The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus
Abstract
λ-RLM replaces unbounded recursive code generation with a typed functional runtime based on λ-calculus, providing formal guarantees and improved efficiency for long-context reasoning tasks.
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL) in which the model generates arbitrary control code, making execution difficult to verify, predict, and analyse. We introduce λ-RLM, a framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in λ-calculus. It executes a compact library of pre-verified combinators and uses neural inference only on bounded leaf subproblems, turning recursive reasoning into a structured functional program with explicit control flow. We show that λ-RLM admits formal guarantees absent from standard RLMs, including termination, closed-form cost bounds, controlled accuracy scaling with recursion depth, and an optimal partition rule under a simple cost model. Empirically, across four long-context reasoning tasks and nine base models, λ-RLM outperforms standard RLM in 29 of 36 model-task comparisons, improves average accuracy by up to +21.9 points across model tiers, and reduces latency by up to 4.1x. These results show that typed symbolic control yields a more reliable and efficient foundation for long-context reasoning than open-ended recursive code generation. The complete implementation of λ-RLM is open-sourced for the community at: https://github.com/lambda-calculus-LLM/lambda-RLM.
Community
🎰 Why your 405B model is losing to an 8B model (and how 1930s math fixed it).
The AI industry has a "Context Rot" problem. 🥕
As prompts get longer, we usually try to fix them with more RAM or massive parameter counts. But "stochastic control", i.e., letting an LLM write its own arbitrary code to manage its memory, is inherently unreliable. It leads to non-termination, malformed outputs, and unpredictable costs.
We need better logic 🧠
We introduced $\lambda$-RLM, a framework that replaces messy, open-ended recursive code generation with a typed functional runtime grounded in $\lambda$-Calculus.
The "David vs. Goliath" Results:
The Flex: An 8B model using $\lambda$-RLM actually beats the accuracy of a 405B model on long-context tasks. 🥳
The Match: Our scaffolded 8B model matches the performance of a 70B model while being 3.1x faster. 🎅
The Speed: Across the board, we saw latency reductions of up to 4.1x. 🎀
The Gains: Average accuracy improved by up to +21.9 points on "weak" model tiers.
How it Works (The Math):
1️⃣ Instead of a "hallucination-prone" REPL loop, we use a fixed library of pre-verified combinators like SPLIT, MAP, and REDUCE.
2️⃣ We used the Y-combinator to "tie the knot" of recursion symbolically. This ensures:
3️⃣ Guaranteed Termination: No more infinite loops.
4️⃣ Predictable Cost: We proved the optimal partition for AI reasoning is exactly k^*= check the paper 😜
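The combinator pipeline above can be sketched in a few lines of Python. This is an illustrative toy only: the `SPLIT`/`MAP`/`REDUCE` names follow the post, but the function signatures and the `llm_leaf` stub are our assumptions, not the paper's actual API.

```python
# Toy sketch of a λ-RLM-style combinator runtime (illustrative only;
# llm_leaf is a hypothetical stand-in for a bounded LLM call).

def split(text, k):
    """SPLIT: partition the input into k roughly equal chunks."""
    n = max(1, len(text) // k)
    return [text[i:i + n] for i in range(0, len(text), n)]

def llm_leaf(chunk):
    """Stand-in for a single bounded LLM call on a leaf subproblem."""
    return f"summary({len(chunk)} chars)"

def reduce_answers(parts):
    """REDUCE: compose leaf answers into one result."""
    return " | ".join(parts)

def solve(text, k=4, max_len=8):
    """Recurse via SPLIT/MAP until chunks fit the leaf budget.
    Each recursive call gets a strictly smaller input, so the
    recursion terminates by structural induction on input length."""
    if len(text) <= max_len:
        return llm_leaf(text)
    return reduce_answers([solve(c, k, max_len) for c in split(text, k)])
```

Because the only loop is recursion on shrinking inputs through a fixed set of combinators, termination and cost can be read off the structure rather than hoped for, which is the point of the typed runtime.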
The future of reliable AI isn't just "bigger parameters." It’s providing models with high-integrity, verifiable environments.
Is there a reason for using lambda calculus for this? It seems to me that this is massively over-engineered. Is this not equivalent to a few lines of Python, or am I missing something? Sure, the lambda stuff is fancy, but surely the point is to move the control flow outside of the model, regardless of code methodology. So it's a trade-off between strict deterministic control flow vs LLM-generated "anything goes" control flow? If so, lambda calculus is an irrelevant red herring that just makes this more complex than it needs to be?
First things first: it's not complex at all; I'm not sure what your definition of complexity is. Lambda calculus is used here to make the flow interpretable. As researchers, we always care to know how a black box works, and lambda calculus is a way to make that clear.
The practical runtime is indeed a small Python program: split, map, reduce, call the LLM on leaves. We make no claim otherwise. But "a few lines of Python" is a description, not an analysis. You cannot prove, by looking at the Python, that a script terminates on all inputs, that its cost satisfies a closed-form recurrence, or that its accuracy degrades at a provable rate. Lambda calculus is not the implementation; it is the proof language (see formal verification theory). Termination follows by structural induction on the rank of the fixed-point term, which Python's semantics do not support. The cost bound T(n) = k·T(n/k) + C(k) falls out of the recursive structure of the combinator, because the lambda term makes that structure explicit and uniform. The accuracy bound factorizes cleanly into leaf and composition terms, because each combinator has a typed signature that permits compositional reasoning. The closed-form optimal k* comes from differentiating the cost recurrence, which we can write down only because the term makes the recursion formally transparent. The relationship is identical to that between quicksort and its complexity analysis: quicksort is also "just a few lines of Python," but nobody derives its O(n log n) bound by staring at the Python; they use recurrence relations derived from the algorithm's recursive structure, expressed formally. Lambda calculus is to λ-RLM what Big-O notation is to algorithms: you do not need it to run the code, you need it to know what the code will do before you run it. Without it, the claim "this system terminates in bounded cost with predictable accuracy" is a hope; with it, it is a theorem.
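The quicksort analogy can be made concrete. Under a toy per-node cost model C(k) = a + b·k (our illustrative assumption, not the paper's actual cost function), the recurrence T(n) = k·T(n/k) + C(k) can be evaluated numerically and scanned for the cost-minimising branching factor:

```python
# Evaluate the cost recurrence T(n) = k*T(n/k) + C(k) under an
# illustrative per-node cost C(k) = a + b*k (a toy model of ours,
# not the paper's), then scan for the cost-minimising branching factor.

def T(n, k, a=1.0, b=0.5, leaf=1.0):
    """Total cost of a k-ary recursion tree over input size n."""
    if n <= 1:
        return leaf                      # base case: one leaf LLM call
    return k * T(n / k, k, a, b, leaf) + (a + b * k)

def best_k(n, ks=range(2, 17)):
    """Branching factor k* that minimises total cost for input size n."""
    return min(ks, key=lambda k: T(n, k))
```

Under these toy constants, `best_k` picks an interior optimum: very small k makes the tree deep (many composition steps), very large k makes each node expensive, and the minimum sits in between, which is exactly the trade-off the closed-form k* in the paper captures.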
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Recursive Models for Long-Horizon Reasoning (2026)
- Turn: A Language for Agentic Computation (2026)
- ConvexBench: Can LLMs Recognize Convex Functions? (2026)
- Draft-Conditioned Constrained Decoding for Structured Generation in LLMs (2026)
- Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning (2026)
- NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models (2026)
- Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning (2026)
