Leading a multi-office strike team through six months of post-partnership-termination fallout — absorbing 3M+ players onto the global server cluster after the company's China licensing partnership ended, with Taipei as the front line of impact and 75%+ service level sustained throughout.
A surge event of this magnitude on a regional server is normally a multi-quarter capacity exercise. We had six months and one operational mandate: keep service level above 75%, with no major outages, while every team across three offices stayed in lockstep.
Following the company's decision to end a long-standing China licensing partnership, our games became unavailable on local infrastructure in mainland China. Millions of players seeking continued access migrated overnight to the closest accessible global servers — and Taipei, geographically and linguistically the closest, became the front line of impact.
Existing server clusters were provisioned for the established Taipei playerbase. Concurrent-user peaks risked queue saturation, login storms, and instance instability. Capacity needed to expand and stay stable simultaneously.
Several decisions affected the adjacent Korean server — from matchmaking pools to community-tooling rollout. Any change made in Taipei needed to be visible to and coordinated with the Korea office before it shipped.
Sudden mixing of two large player populations with different language norms, expectations, and dispute patterns created chat-channel toxicity, in-game griefing reports, and forum flame wars — all of which ate into CSAT and increased ticket volume.
The migrating cohort brought with it a more developed cheating ecosystem: third-party tools, real-money trading (RMT), and account sharing. Detection thresholds calibrated for the original playerbase were no longer fit for purpose; false negatives surged within weeks.
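The recalibration itself is conceptually simple even if the production system wasn't: re-label a sample of the new population, then move the flagging threshold until the detector hits a target recall again. A minimal sketch in Python, with every name and number hypothetical rather than drawn from the actual anti-cheat stack:

```python
# Illustrative only: recalibrating a score threshold against a freshly
# labeled sample so detection recall recovers after the population shift.
import math

def recalibrate_threshold(scores, labels, target_recall=0.95):
    """Pick the highest threshold that still catches >= target_recall
    of the known-cheater accounts in the labeled sample."""
    cheater_scores = sorted(
        (s for s, is_cheater in zip(scores, labels) if is_cheater),
        reverse=True,
    )
    if not cheater_scores:
        raise ValueError("sample contains no labeled cheaters")
    # We must catch ceil(target_recall * n) cheaters; the threshold is
    # the score of the last one we are required to catch.
    must_catch = math.ceil(target_recall * len(cheater_scores))
    return cheater_scores[must_catch - 1]

# Stale pre-surge threshold vs. one recalibrated on post-surge data:
sample_scores = [0.97, 0.91, 0.88, 0.74, 0.62, 0.55, 0.31, 0.12]
sample_labels = [True, True, True, True, True, False, False, False]
print(recalibrate_threshold(sample_scores, sample_labels))  # 0.62
```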
Every operational decision — bans, channel splits, server-name choices, official statements — sat against an active external news cycle on the partnership wind-down. External comms needed alignment between Live Ops, HQ Communications, PR, and regional leadership before going public.
HQ engineering was already on a planned roadmap. Surge response required out-of-cycle server provisioning, ticket-tooling adjustments, and content/data updates — pulled in alongside scheduled work without breaking quarterly OKRs.
As Strike Team Lead under Live Operations, I owned end-to-end coordination of the surge response, acting as the connective tissue between four stakeholder groups across three offices. The mandate was simple; the execution wasn't: hold service level, hold the line on community health, and ship every fix without violating the cross-region governance every team cared about.
Four stakeholder groups, four different reasons for being at the table. The strike team's job was to keep them aligned in real time without any one group feeling overruled.
Every operational decision — server provisioning, channel splits, comms tone, ban-wave timing — landed first on the Taipei team. They bore the daily reality of what we shipped. Their input wasn't optional; it was the calibration loop for everything else.
Decisions about matchmaking pools, anti-cheat coverage, and tooling rollouts had cross-region effects on the Korean server. Korea's leadership joined the weekly sync and any incident bridge where their region was implicated — preventing the kind of "we shipped, sorry didn't tell you" failure that erodes regional trust.
At HQ, the Community Lead handled forum and social-channel sentiment; the PR Lead owned external statements and political-sensitivity review; the IT Lead handled outage triage and incident escalation. Together they were the real-time crisis cell, and the reason a Taipei-only decision never went out without HQ alignment.
Real-time content adjustments, data expansion, and out-of-cycle server provisioning all routed through HQ Engineering. The Dev Lead helped sequence emergency work alongside the existing roadmap — converting "we need it now" into "here's the realistic landing date" without dropping either commitment.
Splitting the response into five parallel workstreams gave each stakeholder group a clear lane while still holding the whole picture together. Each pillar had a named owner, a measurable target, and a daily-update cadence into the strike-team standup.
Server capacity: Provision new instances and shards ahead of demand, not in response to it (a capacity-projection sketch follows this list).
Matchmaking and channels: Channel and queue design that reduces high-friction encounters between the two populations.
Community moderation: Reduce flashpoints in forums, chat, and social channels without erasing voices.
Anti-cheat: Recalibrate detection for the new attack surface and ship bans faster.
Official communications: Faster, clearer, and more visible from official channels, so rumor cycles lost oxygen.
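The capacity pillar's "ahead of demand" stance reduces to a forecasting loop: extrapolate the recent peak-concurrent-user trend, add headroom for login storms, and provision before the projection crosses current capacity. A minimal sketch under those assumptions; the shard size, headroom factor, and all figures are hypothetical, not the real cluster's numbers:

```python
# Illustrative sketch of "provision ahead of demand": extrapolate the
# recent peak-CCU trend, add headroom, and size the shard fleet before
# the projection crosses current capacity. Numbers are hypothetical.
import math

CCU_PER_SHARD = 20_000   # hypothetical capacity of one game shard
HEADROOM = 1.25          # 25% buffer for login storms and queue spikes

def shards_needed(daily_peak_ccu, lookahead_days=7):
    """Linear projection of peak CCU, translated into a shard count."""
    if len(daily_peak_ccu) < 2:
        raise ValueError("need at least two data points to fit a trend")
    days = len(daily_peak_ccu) - 1
    daily_growth = (daily_peak_ccu[-1] - daily_peak_ccu[0]) / days
    projected_peak = daily_peak_ccu[-1] + daily_growth * lookahead_days
    return math.ceil(projected_peak * HEADROOM / CCU_PER_SHARD)

# Five days of (hypothetical) peak CCU during the surge:
peaks = [310_000, 360_000, 425_000, 480_000, 540_000]
print(shards_needed(peaks))  # 59 shards for next week's projected peak
```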
Five pillars times four stakeholder groups doesn't survive on goodwill. We ran an explicit cadence that made silence impossible — every group either heard from the strike team daily or had a standing forum to surface issues.
Daily standup: CS, Live Ops, regional moderators. Surface incidents, set the day's priorities, agree on escalations.
Weekly cross-region sync: Taipei + Korea + HQ Community/PR/IT/Dev. Risk-register review, decisions log, sign-off on the next week's outbound comms.
Leadership report: Capacity trajectory, SLA performance, top-three risks with mitigation status. Escalation lane for resourcing decisions.
Incident bridge: Cross-office channel for live incidents. Severity-tagged, with a predictable response time per severity level (a sketch of the severity mapping follows this list).
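For the incident bridge, "predictable response time per severity level" just means each tag carries an explicit acknowledgement deadline that can be checked mechanically. A minimal sketch; the tags, minute values, and Incident shape are hypothetical, not the team's actual tooling:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical per-severity acknowledgement targets for bridge posts.
RESPONSE_TARGETS = {
    "SEV1": timedelta(minutes=15),  # outage, login storm
    "SEV2": timedelta(hours=1),     # degraded service
    "SEV3": timedelta(hours=24),    # cosmetic or single-player impact
}

@dataclass
class Incident:
    severity: str
    opened_at: datetime
    acknowledged_at: Optional[datetime] = None

    def breached(self, now: datetime) -> bool:
        """True once the incident goes unacknowledged past its target."""
        deadline = self.opened_at + RESPONSE_TARGETS[self.severity]
        return (self.acknowledged_at or now) > deadline

inc = Incident("SEV1", opened_at=datetime(2021, 3, 1, 3, 0))
print(inc.breached(now=datetime(2021, 3, 1, 3, 20)))  # True: 20 min > 15
```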
Quantitative targets met, but the more durable wins were structural — patterns and playbooks the company kept using long after the surge subsided.
Service level held at 75%+ across the surge window. Login queues were managed without any major outage escalating to executive teams. CSAT dipped briefly but returned to baseline before the surge concluded.
The Korea office became a more active, more trusting partner in subsequent regional decisions. The "Taipei ships first, tells Korea later" failure mode that had previously caused friction stopped recurring.
The five-pillar / four-stakeholder structure became a reference template — re-used for later live-event surges, esports tournaments, and the early innings of the next major regional event.
Pre-approved language packs and the daily-update rhythm became the default cadence for high-visibility regional events afterward. Time from incident to official statement dropped meaningfully.