TRACER is an open-source routing system that learns from an LLM’s own production traces to replace a large share of future classification calls with lightweight ML surrogates. It deploys a surrogate only when a parity gate shows that the surrogate agrees with the teacher LLM above a user-defined quality threshold, and it generates artifacts that make the routing boundary inspectable. In experiments, TRACER achieved 83–100% surrogate coverage on a 77-class intent benchmark, fully replaced the teacher on a 150-class benchmark, and correctly refused deployment on a natural language inference (NLI) task, where the representation was not reliable enough.
Many LLM classification calls in production are overkill. For tasks like intent detection, moderation, tagging, or routing, TRACER learns which requests can be safely offloaded to a lightweight ML model trained on the LLM’s own outputs.
You keep the hard cases on the LLM, set a target quality bar, and offload the easy traffic.
On well-suited workloads, this can remove 90% or more of LLM calls.
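
To make the gate-then-route idea concrete, here is a minimal sketch in Python. It is not TRACER's actual API: the names `Trace`, `GateResult`, `parity_gate`, and `route` are hypothetical, and using a single confidence threshold as the routing boundary is an assumption made for illustration. Given held-out traces where both the surrogate's prediction and the teacher's label are known, the gate looks for the lowest confidence threshold at which surrogate/teacher agreement still meets the target, and refuses deployment (returns `None`) if no threshold qualifies.

```python
from typing import NamedTuple, Optional, Sequence


class Trace(NamedTuple):
    """One held-out request (hypothetical structure, for illustration only)."""
    surrogate_label: str
    confidence: float      # surrogate's confidence in its own prediction
    teacher_label: str     # label the LLM produced for the same request


class GateResult(NamedTuple):
    threshold: float       # route to the surrogate when confidence >= threshold
    coverage: float        # fraction of held-out traffic the surrogate would handle
    agreement: float       # surrogate/teacher agreement on that covered fraction


def parity_gate(traces: Sequence[Trace],
                target_agreement: float) -> Optional[GateResult]:
    """Return the widest-coverage threshold meeting the target, or None to refuse."""
    if not traces:
        return None
    ordered = sorted(traces, key=lambda t: t.confidence, reverse=True)
    best: Optional[GateResult] = None
    matches = 0
    for i, trace in enumerate(ordered, start=1):
        matches += trace.surrogate_label == trace.teacher_label
        # Evaluate only where the confidence value changes, so ties stay together.
        if i < len(ordered) and ordered[i].confidence == trace.confidence:
            continue
        agreement = matches / i
        if agreement >= target_agreement:
            # Later qualifying thresholds are lower, so they cover more traffic.
            best = GateResult(trace.confidence, i / len(ordered), agreement)
    return best


def route(confidence: float, gate: Optional[GateResult]) -> str:
    """Send a request to the surrogate only if the gate passed and it is confident."""
    if gate is not None and confidence >= gate.threshold:
        return "surrogate"
    return "teacher"


held_out = [
    Trace("refund", 0.95, "refund"),
    Trace("cancel", 0.90, "cancel"),
    Trace("refund", 0.80, "refund"),
    Trace("cancel", 0.60, "refund"),   # surrogate disagrees with the teacher
    Trace("refund", 0.50, "refund"),
]
gate = parity_gate(held_out, target_agreement=0.95)
```

With this toy data, the gate settles on a threshold that excludes the low-confidence disagreement, so confident requests go to the surrogate and the rest stay on the LLM. A real system would also need enough held-out traces for the agreement estimate to be trustworthy, which this sketch does not check.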