Building an enterprise-grade multilingual smart knowledge retrieval system from scratch — bridging Traditional Chinese, Simplified Chinese, English, and Korean with NLP, vector search, and an API-first architecture.
After 18 months, every quantitative target was met or exceeded. The bigger story, however, was qualitative: cross-regional teams stopped reinventing the wheel.
Rapid expansion into APAC and EMEA had created a sprawling, fragmented internal knowledge base. CS, ops, and tech teams across regions were drowning in language barriers and tagging chaos.
The legacy system supported only exact keyword matching, with no tolerance for typos, homophones, or vague concepts. A single retrieval took over 5 minutes on average and required multiple keyword combinations.
Taiwanese, Chinese, Korean, and English-speaking teams couldn’t search documents written by one another. A problem solved in Taiwan would be re-solved in Korea three months later — knowledge trapped in language cages.
The old architecture required extensive rewriting every time a new language was added. With European business expansion (FR / DE / ES) on the horizon, the existing system would not survive.
Knowledge was scattered across Wiki, Confluence, Jira, PDFs, spreadsheets — with conflicting metadata, garbled titles, and outdated documents misleading users every day.
As Lead PM and Product Owner, I owned every critical decision from business case through MVP launch — translating fragmented business pain into a technical roadmap and a coherent execution plan.
Authored the comprehensive business proposal and ROI model (time-saved × labor-cost × team-size). Defended the case to senior leadership over multiple rounds, securing budget and headcount.
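The ROI model above is a simple product of three factors. A minimal sketch of that arithmetic follows; the function name, units, and input numbers are hypothetical placeholders and not figures from the project.

```python
def time_savings_value(
    hours_saved_per_person: float,
    hourly_labor_cost: float,
    team_size: int,
) -> float:
    """Value of search time saved: time-saved x labor-cost x team-size."""
    if min(hours_saved_per_person, hourly_labor_cost, team_size) < 0:
        raise ValueError("inputs must be non-negative")
    return hours_saved_per_person * hourly_labor_cost * team_size


# Hypothetical placeholder inputs, chosen only to show the calculation.
example = time_savings_value(
    hours_saved_per_person=100.0,  # assumed hours saved per person per year
    hourly_labor_cost=30.0,        # assumed fully loaded hourly cost
    team_size=50,                  # assumed number of affected staff
)
print(example)  # 150000.0
```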
Defined a brand-new Metadata Schema as the cornerstone of search quality. Coordinated CS, ops, and engineering to unify tagging logic, naming conventions, and unstructured-data cleansing standards.
Designed the core Fuzzy Logic strategy (character tolerance, synonym handling), multi-language weighting configuration, and detailed RESTful API contracts to enable future expansion.
Built a high-performing Scrum squad from scratch: backend, data science, frontend, and a localization team covering 4 languages, along with the communication rituals that kept them in sync.
We adopted an API-First, modular philosophy. Every design decision balanced current performance against future expansion — so adding a new language wouldn’t require rewriting the core.
Elasticsearch + vector database for semantic search and intelligent ranking.
TC/SC conversion, word segmentation, and synonym mapping. New languages plug in here.
RESTful, standardized contracts so any external system can integrate search capabilities.
The indexing layer normalizes content during ingestion, and search-time queries are converted the same way before retrieval.
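One way to picture the "new languages plug in here" idea is a small analyzer registry, where the core search code looks up a per-language function and never changes when a language is added. This is only a sketch under assumptions: the names `register_analyzer` and `analyze` are hypothetical, and the per-character Chinese split is a stand-in for real word segmentation.

```python
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]

# Registry of language code -> analyzer function (hypothetical design).
_ANALYZERS: Dict[str, Analyzer] = {}


def register_analyzer(lang: str) -> Callable[[Analyzer], Analyzer]:
    """Decorator that plugs an analyzer in for one language code."""
    def decorator(fn: Analyzer) -> Analyzer:
        _ANALYZERS[lang] = fn
        return fn
    return decorator


@register_analyzer("en")
def analyze_en(text: str) -> List[str]:
    # Lowercase and split on whitespace.
    return text.casefold().split()


@register_analyzer("zh")
def analyze_zh(text: str) -> List[str]:
    # Naive per-character split, standing in for real word segmentation.
    return [ch for ch in text if not ch.isspace()]


def analyze(lang: str, text: str) -> List[str]:
    """Core entry point; it stays unchanged when a language is added."""
    if lang not in _ANALYZERS:
        raise ValueError(f"no analyzer registered for {lang!r}")
    return _ANALYZERS[lang](text)
```

Adding French, for example, would then mean registering one more function under `"fr"` rather than editing `analyze`.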
Making fuzzy search both smart and accurate in CJK languages is its own discipline. Set the fault tolerance too loose and users drown in noise; too tight and the old problem returns.
Senior staff across language teams co-authored an internal vocabulary including company-specific terminology — so “WoW”, “World of Warcraft”, and the Traditional/Simplified Chinese variants all resolve to the same token.
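The shared vocabulary can be thought of as a lookup from every known variant to one canonical token. The sketch below assumes a plain in-memory dictionary; the token name `world_of_warcraft` and the function `canonical_token` are illustrative, not the project's actual schema.

```python
# Variant -> canonical token. Keys are stored casefolded with single spaces.
SYNONYMS = {
    "wow": "world_of_warcraft",
    "world of warcraft": "world_of_warcraft",
    "魔獸世界": "world_of_warcraft",  # Traditional Chinese
    "魔兽世界": "world_of_warcraft",  # Simplified Chinese
}


def canonical_token(term: str) -> str:
    """Map a term to its canonical token; unknown terms pass through normalized."""
    key = " ".join(term.casefold().split())  # ignore case and extra spaces
    return SYNONYMS.get(key, key)


for variant in ["WoW", "World of  Warcraft", "魔獸世界", "魔兽世界"]:
    print(variant, "->", canonical_token(variant))
```

Applying the same function at indexing time and at query time is what lets all four variants land on the same token.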
Multi-level ranking strategy: exact title match (weight 10) > fuzzy title (5) > exact body (3) > fuzzy body (1). Dynamically tuned by document freshness and click-through rate.
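A minimal sketch of this tiered scoring is shown here. The four base weights come from the text; how freshness and click-through rate adjust them is not specified, so the half-life decay, the `ctr_boost` factor, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Base weights as stated in the ranking strategy.
MATCH_WEIGHTS = {
    "exact_title": 10,
    "fuzzy_title": 5,
    "exact_body": 3,
    "fuzzy_body": 1,
}


@dataclass
class Hit:
    doc_id: str
    match_type: str            # one of the MATCH_WEIGHTS keys
    age_days: float            # time since the document was last updated
    click_through_rate: float  # between 0.0 and 1.0


def score(hit: Hit, half_life_days: float = 180.0, ctr_boost: float = 0.5) -> float:
    """Base weight scaled by assumed freshness and CTR multipliers."""
    base = MATCH_WEIGHTS[hit.match_type]
    # Assumed decay: freshness halves every half_life_days, floored at 0.5x.
    freshness = 0.5 + 0.5 * (0.5 ** (hit.age_days / half_life_days))
    # Assumed boost: a CTR of 1.0 adds ctr_boost (50% by default).
    popularity = 1.0 + ctr_boost * hit.click_through_rate
    return base * freshness * popularity


def rank(hits: List[Hit]) -> List[Hit]:
    """Return hits ordered from highest to lowest score."""
    return sorted(hits, key=score, reverse=True)
```

With these particular multipliers a popular, fresh body match can approach but not overtake a title match one tier above it; whether tiers should ever be allowed to cross is a tuning decision the text does not settle.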
Traditional and Simplified Chinese indexes are both created in parallel during ingestion, and queries are uniformly normalized at search time, so a user typing 「滑鼠」 seamlessly finds documents tagged 「鼠标」.
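The key property is that documents and queries pass through the same normalization. The sketch below simplifies the design to a single normalized form instead of two parallel indexes, and uses tiny hand-written mapping tables as stand-ins for a real conversion dictionary; `PHRASE_MAP`, `CHAR_MAP`, and `NormalizedIndex` are hypothetical names. Note that 「滑鼠」 to 「鼠标」 is a regional vocabulary difference, so it needs a phrase-level mapping rather than character conversion alone.

```python
from typing import Dict, List

# Phrase-level regional vocabulary (Taiwan term -> Mainland term).
PHRASE_MAP = {"滑鼠": "鼠标"}

# Character-level Traditional -> Simplified table (tiny illustrative subset).
CHAR_MAP = str.maketrans({"無": "无", "線": "线", "設": "设"})


def normalize(text: str) -> str:
    """Convert text to one canonical form: phrases first, then characters."""
    for source, target in PHRASE_MAP.items():
        text = text.replace(source, target)
    return text.translate(CHAR_MAP)


class NormalizedIndex:
    """Toy substring index that normalizes on both ingestion and search."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = normalize(text)  # normalize at ingestion

    def search(self, query: str) -> List[str]:
        needle = normalize(query)  # same normalization at search time
        return [doc_id for doc_id, text in self._docs.items() if needle in text]


index = NormalizedIndex()
index.add("doc-sc", "无线鼠标设置指南")
index.add("doc-tc", "無線滑鼠設定指南")
print(index.search("滑鼠"))  # ['doc-sc', 'doc-tc']
```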
Even the most sophisticated search engine produces garbage if the underlying data is inconsistent. Resolving the data layer was as much of the work as building the engine.
Two-week Sprint cadence within an 18-month project, divided into three clearly scoped phases — each with its own success criteria.
Stand up the core search engine for TC, SC, and English. Implement basic 2-character fault tolerance. Replace the legacy system without disrupting business continuity.
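If "2-character fault tolerance" is read as a maximum edit distance of 2, which is an assumption since the text does not define it, the check can be sketched with a standard Levenshtein distance. The function names here are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def within_tolerance(query: str, term: str, max_edits: int = 2) -> bool:
    """True if the query is at most max_edits edits away from the term."""
    if abs(len(query) - len(term)) > max_edits:
        return False  # the length gap alone already exceeds the budget
    return edit_distance(query, term) <= max_edits


print(within_tolerance("serch", "search"))    # True
print(within_tolerance("kitten", "sitting"))  # False
```

A fixed budget of two edits is generous for short CJK terms, where two characters can be the entire word, so a production setup would likely scale the tolerance with term length.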
Integrate the Korean morphological analyzer and optimize cross-lingual search. Acceptance criterion: 80% adoption by the Korean team — validating the multi-language architecture.
Expose the RESTful API for downstream systems (CRM, ticketing). Fully document the API spec to prepare for European-language expansion.
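To make the idea of a standardized contract concrete, here is a sketch of what the request and response shapes might look like. Every field name, default, and language code below is hypothetical; the actual API spec is not reproduced in this write-up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SearchRequest:
    query: str
    # Language codes are open-ended, so a new language needs no schema change.
    languages: List[str] = field(
        default_factory=lambda: ["zh-Hant", "zh-Hans", "en", "ko"]
    )
    fuzziness: int = 2
    page: int = 1
    page_size: int = 10


@dataclass
class SearchHit:
    doc_id: str
    title: str
    language: str
    score: float


@dataclass
class SearchResponse:
    total: int
    hits: List[SearchHit]


def parse_request(body: str) -> SearchRequest:
    """Validate a JSON request body and build a SearchRequest."""
    data = json.loads(body)
    query = data.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    return SearchRequest(**data)


def render_response(response: SearchResponse) -> str:
    """Serialize a response to JSON, keeping CJK characters readable."""
    return json.dumps(asdict(response), ensure_ascii=False)
```

A downstream CRM or ticketing system would only need to produce and consume these JSON shapes, independent of how the engine is implemented behind them.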
A successful project depends 70% on team collaboration efficiency. We adopted a Squad model — lean but cross-functional — with rituals designed for clarity, not theater.
PM (Me) · Tech Lead · 2 Backend Engineers · 1 Frontend Engineer · 1 Data Engineer · part-time Linguistic Specialists from each language operations team. Cross-functional pairing built shared ownership across disciplines.
Stakeholders (CS Manager, Ops Manager, IT Manager) tested the latest build live and contributed “Bad Cases” — failed searches that became next sprint’s optimization targets. A positive feedback loop.
Sync on yesterday/today/blockers. Brief enough that it stayed valuable; structured enough that no one fell out of step.
Mid-project, we pre-emptively invited European IT teams (future API consumers) to walk through the API spec. This forward-looking move turned them into allies and prevented expensive rework later.
The headline numbers were the easy part. The real win was changing how cross-regional teams treated knowledge.
Best practices from Taiwan got adopted by the Korean team within a sprint. Chinese technical documents became retrievable by European and American engineers. Cross-regional collaboration efficiency improved noticeably.
The “Search-as-a-Service” pattern became the internal reference architecture for downstream systems integrating search, laying the foundation for future GenAI and knowledge-graph initiatives.
The European team integrated the French module in 2 weeks after handover (vs. 2 months estimated). The API-First philosophy paid for itself the moment real expansion happened.
Balancing technical feasibility against business value, optimizing inside resource constraints, motivating a team under pressure — and seeing what is possible when 12 people pull toward one common goal.