Korean NLP foundation · Verified

NSD-Korean-NLP-Foundation

Unified Korean text processing, linguistic analysis, and discourse structure stack.

Morphological, syntactic, discourse, marker, dictionary, and structural analysis are all built in-house and run as a single system with no external dependencies. Accuracy on Korean free-response automatic scoring improves by +6.16%p over the baseline (komoran track), with a further +1.08%p on the khaiii track. A large-scale evaluation across 28 domains and about 1.075M sentences is in progress.

Mini-dashboard · indicators only

Verified signals. Trends, comparisons, and status only; some absolute figures are withheld.

+6.16%p

Accuracy lift

GT 500 · 5-run identical

~15×

Throughput ratio

22 req/s → 330 req/s · concurrency 16

765 / 765

Bit-identical cases

100% pass · multi-set

Throughput · before vs after (req/s)

prior stack: 22 · unified: 330

200-request sweep · concurrency 16 · workers=4 · unified container replaces multi-container ensemble
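A sweep of this kind can be approximated with a small harness. The following is a minimal sketch, not the harness used for the figures above: `measure_throughput` and `fake_request` are hypothetical names, and `fake_request` stands in for a real HTTP call to the analyzer so the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(send_request, total_requests=200, concurrency=16):
    """Issue total_requests calls with at most `concurrency` in flight; return req/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces every call to finish before the timer stops.
        results = list(pool.map(lambda _: send_request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


def fake_request():
    # Hypothetical stand-in for one HTTP request to the analyzer endpoint.
    time.sleep(0.001)
    return b"ok"


print(f"{measure_throughput(fake_request):.0f} req/s")
```

Replacing `fake_request` with a function that posts one document to the service under test would give a comparable req/s figure, assuming the client machine is not itself the bottleneck.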

Worker scaling · req/s at concurrency 16

w1: 50 · w2: 143 · w4: 330 · w8: 651

Single container scales roughly linearly with worker count · w4 is the production default (9.6 GiB memory)
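The scaling claim can be checked directly from the four data points. The sketch below computes speedup relative to one worker and per-worker efficiency; the function name `scaling_table` is illustrative, and the req/s values are the ones reported in the chart.

```python
# Worker count -> measured req/s at concurrency 16 (values from the chart).
measured = {1: 50, 2: 143, 4: 330, 8: 651}


def scaling_table(rates):
    """Return (workers, req/s, speedup vs. 1 worker, per-worker efficiency) rows."""
    base = rates[1]
    rows = []
    for workers in sorted(rates):
        speedup = rates[workers] / base
        rows.append((workers, rates[workers], speedup, speedup / workers))
    return rows


for workers, rate, speedup, efficiency in scaling_table(measured):
    print(f"w{workers}: {rate} req/s, speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Against the single-worker baseline the speedups are 2.86×, 6.60×, and 13.02×, so efficiency stays above 1.0 at every step. Each doubling from w2 onward yields roughly 2× (2.31× for w2 to w4, 1.97× for w4 to w8).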

Bit-identical validation

100%

765 / 765 cases

224 + 229 + 312 across 3 sets

HTTP-response-level identity · 5 runs
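One way to check identity at the HTTP-response level is to hash the raw response body for each case in each run and compare digests. This is a minimal sketch under that assumption; `digest`, `count_identical`, and the toy data are hypothetical and do not describe the actual validation tooling.

```python
import hashlib


def digest(body: bytes) -> str:
    """SHA-256 of the raw response bytes; any single-bit difference changes it."""
    return hashlib.sha256(body).hexdigest()


def count_identical(runs):
    """runs: list of dicts mapping case id -> response bytes, one dict per run.

    Returns (identical_cases, total_cases), comparing every run to the first.
    """
    reference = runs[0]
    identical = 0
    for case_id, body in reference.items():
        expected = digest(body)
        if all(case_id in run and digest(run[case_id]) == expected for run in runs[1:]):
            identical += 1
    return identical, len(reference)


# Toy data: two cases, three runs, all byte-identical.
toy_runs = [{"case-1": b'{"score": 3}', "case-2": b'{"score": 5}'} for _ in range(3)]
print(count_identical(toy_runs))  # (2, 2)
```

Applied to the three case sets (224 + 229 + 312) over 5 runs, a full pass would correspond to a result of `(765, 765)`.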

Components in production

All in-house · zero external analyzer dependencies

Included NSD products

  • NSD-Tokenizer-v2.2
  • NSD-Korean-Syntax-Analyzer
  • NSD-Connector-Analyzer-v2
  • NSD-Discourse-Analyzer
  • NSD-Unified-Discourse-BSNC
  • NSD-Korean-IntDic

Verified internally · 2026-04 · large-scale evaluation across 28 domains / ~1.075M sentences in progress.