Joshua Butler
joshb556
AI & ML interests
None yet
Organizations
None yet
To read
- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers • Paper • 2507.04404 • Published • 22
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float • Paper • 2504.11651 • Published • 31
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone • Paper • 2505.12781 • Published • 2
- A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • Published • 261
Models (0): none public yet
Datasets (0): none public yet