On-prem LLM serving
NSD-LLM-Serving
On-prem multimodal and reasoning LLM serving infrastructure, with operational stability verified by measurement.
Multimodal and reasoning models are served entirely on in-house infrastructure, with no calls to external LLMs. In a 9-hour soak run there were 0 hard 5xx errors, and all 1,711 transient errors were recovered by client retry. Compared with the prior generation: +11.9% throughput, −11.0% p50 latency, and a +34 percentage point improvement in prefix cache hit rate.
Mini-dashboard · indicators only
Verified signals. Trends, comparisons, and status only; some absolute figures are withheld.
0 / 9h
Hard 5xx errors
1,711 transients · 100% client-retry recovery
4.98 r/s
Sustained throughput
single instance · N=150 · C=50
78.4%
Prefix cache hit
+34 pp vs prior generation
8.14 s
p50 latency
−11.0% vs prior generation
Throughput · before vs after (relative)
+11.9% sustained throughput vs prior generation · same hardware (RTX A6000 ×2)
p50 latency · before vs after (relative · lower is better)
−11.0% p50 latency · 8.14 s current
Prefix cache hit rate
78.4%
+34 pp vs prior
Warm load profile · sustained benchmark
Hard 5xx errors · 9-hour window
Flat at zero · 1,711 transient errors auto-recovered by client retry
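The client-retry recovery noted above can be illustrated with a minimal sketch. This is not the NSD client: `TransientError`, `call_with_retry`, and `make_flaky_sender` are hypothetical names, and the retry policy (exponential backoff with full jitter, five attempts) is an assumption, since the page does not say which errors count as transient or how retries are scheduled.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors the client treats as retryable."""


def call_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is exhausted.

    Only TransientError is retried; any other exception propagates at once,
    which is how a "hard" failure would surface in this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: report the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


def make_flaky_sender(failures):
    """Build a stand-in sender that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def send():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated transient failure")
        return "ok"

    return send, state
```

In this sketch a request counts as recovered when some attempt returns before the attempt budget runs out. A request that exhausts its attempts re-raises, so it would be counted as a failure and not hidden.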
Included NSD products
- NSD-LLM-Multimodal-Serve-v1
- NSD-LLM-Reasoning-14B-Serve-v1
- NSD-LLM-Reasoning-32B-Serve-v1
- NSD-AgenticRAG-V1
9-hour soak · N=150 sustained-load benchmark · concurrency 50.
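As a rough illustration of how a fixed-concurrency run such as N=150, C=50 might be measured, the sketch below issues `total` requests with at most `concurrency` in flight and reports throughput and p50 latency. It is an assumption-laden example, not the harness behind the figures on this page: `fake_request` and `run_benchmark` are hypothetical, and the simulated request simply sleeps.

```python
import asyncio
import statistics
import time


async def fake_request(i):
    """Stand-in for one model call; swap in a real client call to use this."""
    await asyncio.sleep(0.01)
    return i


async def run_benchmark(request_fn, total=150, concurrency=50):
    """Run `total` requests with at most `concurrency` in flight at once."""
    semaphore = asyncio.Semaphore(concurrency)
    latencies = []

    async def one(i):
        async with semaphore:
            # Timing starts after the slot is acquired, so queue wait
            # on the client side is excluded from the latency sample.
            start = time.perf_counter()
            await request_fn(i)
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one(i) for i in range(total)))
    wall = time.perf_counter() - wall_start

    return {
        "completed": len(latencies),
        "throughput_rps": len(latencies) / wall,  # requests per second
        "p50_s": statistics.median(latencies),    # median latency in seconds
    }


if __name__ == "__main__":
    print(asyncio.run(run_benchmark(fake_request)))
```

Throughput here is completed requests divided by wall-clock time for the whole run, and p50 is the median of per-request latencies. Whether the published figures use the same definitions is not stated on the page.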