Convergent Intelligence PRO

reaperdoesntknow

https://www.convergentintel.com

AI & ML interests

About Us

Mission: Convergent Intelligence advances original research in discrepancy calculus, adaptive systems, and applied AI, translating those insights into client controls, playbooks, and leadership-ready briefs.

Recent Activity

updated a model about 5 hours ago

reaperdoesntknow/TopologicalQwen

published a model about 5 hours ago

reaperdoesntknow/TopologicalQwen

reacted to their post with 👍 about 7 hours ago

We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families, namely Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM), total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars: (1) FP32 precision preservation, with experiments demonstrating a 5,810× single-operation error ratio and a 23,225× compounding error ratio for FP16 at network depth; (2) sparse cognitive architectures in which 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD; (3) developmental curriculum training progressing from language to logic to transfer to depth; (4) continuous belt-fed data ingestion that eliminates truncation waste; (5) hardware-native optimization for AMD Zen 4 via AOCL, OpenMP, and NUMA-aware allocation; (6) self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy; and (7) open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency. We argue that transformers were designed for GPU hardware rather than for mathematical optimality, and that architectures designed for geometric correctness (metric-space attention, triangle-inequality enforcement, sparse expert routing) naturally favor CPU execution. For sub-2B-parameter models, CPU training produces more capable models at a fraction of the cost.
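The abstract does not say how the 5,810× and 23,225× figures were measured. The NumPy sketch below is a rough, independent illustration of why FP16 error grows with depth. It is not a reproduction of the authors' experiments. It compares FP16 and FP32 against a float64 reference on two assumed workloads: a long running sum, and a deep chain of `tanh(W @ x)` layers. The helper names `accumulate` and `deep_chain_error` are hypothetical, as are the chosen sizes.

```python
import numpy as np


def accumulate(step: float, n: int, dtype) -> float:
    """Add `step` to a running total n times, rounding to `dtype` after each add."""
    acc = dtype(0)
    s = dtype(step)
    for _ in range(n):
        acc = dtype(acc + s)
    return float(acc)


def deep_chain_error(depth: int, width: int, dtype, seed: int = 0) -> float:
    """Relative error of `depth` tanh(W @ x) layers run in `dtype` vs. float64."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((width, width)) / np.sqrt(width)  # roughly unit gain
    x_ref = rng.standard_normal(width)
    w_low, x_low = w.astype(dtype), x_ref.astype(dtype)
    for _ in range(depth):
        x_ref = np.tanh(w @ x_ref)      # float64 reference path
        x_low = np.tanh(w_low @ x_low)  # reduced-precision path
    diff = x_low.astype(np.float64) - x_ref
    return float(np.linalg.norm(diff) / np.linalg.norm(x_ref))


if __name__ == "__main__":
    exact = 1000.0  # 0.1 added 10,000 times
    for dt in (np.float16, np.float32):
        total = accumulate(0.1, 10_000, dt)
        print(f"{dt.__name__}: sum={total}, rel_err={abs(total - exact) / exact:.2e}, "
              f"depth-32 rel_err={deep_chain_error(32, 64, dt):.2e}")
```

In the running sum, the FP16 total stalls at 256. At that magnitude adjacent FP16 values are 0.25 apart, which is more than twice the 0.1 step, so each addition rounds back to the same value. FP32 keeps accumulating with a far smaller relative error.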
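The abstract gives no implementation details for "belt-fed" ingestion. One standard way to avoid truncation waste is sequence packing. Tokenized documents are concatenated into one stream with an end-of-sequence separator, then sliced into fixed-length windows. Leftover tokens carry into the next window instead of being discarded. The Python sketch below shows that technique, on the assumption that it resembles what the authors mean. `belt_feed` and `eos_id` are hypothetical names.

```python
from typing import Iterable, Iterator, List


def belt_feed(docs: Iterable[List[int]], window: int, eos_id: int) -> Iterator[List[int]]:
    """Pack tokenized documents into fixed-length windows without truncation.

    Documents are joined end to end, each followed by `eos_id`. Tokens that do
    not fit in the current window roll over into the next one, so nothing is
    dropped; only the final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buffer: List[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        # Emit every full window currently in the buffer; keep the remainder.
        while len(buffer) >= window:
            yield buffer[:window]
            buffer = buffer[window:]
    if buffer:
        yield buffer  # trailing partial window


if __name__ == "__main__":
    docs = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10, 11, 12, 13]]
    for w in belt_feed(docs, window=4, eos_id=0):
        print(w)
```

With these inputs the generator yields `[1, 2, 3, 4]`, `[5, 0, 6, 7]`, `[0, 8, 9, 10]`, and `[11, 12, 13, 0]`. Every token appears exactly once. Packed windows can mix documents, and whether attention is masked at EOS boundaries is a separate design choice that the abstract does not address.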
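The abstract does not explain how a temperature is derived from the L2-star discrepancy. The sketch below therefore computes only the discrepancy itself, using Warnock's closed-form expression for N points in [0, 1]^d. Lower values mean the points cover the unit cube more evenly. The function name `l2_star_discrepancy` is hypothetical, and any mapping to a training temperature is left out because the source does not describe one.

```python
import numpy as np


def l2_star_discrepancy(points) -> float:
    """L2-star discrepancy of N points in [0, 1]^d (Warnock's formula).

    D^2 = 3^-d
          - (2^(1-d) / N) * sum_i prod_k (1 - x_ik^2)
          + (1 / N^2) * sum_i sum_j prod_k (1 - max(x_ik, x_jk))
    """
    x = np.asarray(points, dtype=np.float64)
    if x.ndim != 2:
        raise ValueError("points must have shape (N, d)")
    n, d = x.shape
    term1 = 3.0 ** (-d)
    term2 = (2.0 ** (1 - d) / n) * np.prod(1.0 - x**2, axis=1).sum()
    pair_max = np.maximum(x[:, None, :], x[None, :, :])  # shape (N, N, d)
    term3 = np.prod(1.0 - pair_max, axis=2).sum() / n**2
    # Clamp tiny negative values caused by floating-point cancellation.
    return float(np.sqrt(max(term1 - term2 + term3, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 64
    grid = ((np.arange(n) + 0.5) / n).reshape(-1, 1)  # centered 1-D grid
    random_pts = rng.random((n, 1))
    print("centered grid:", l2_star_discrepancy(grid))
    print("random points:", l2_star_discrepancy(random_pts))
```

For a centered one-dimensional grid of N points, the squared discrepancy equals 1/(12N²). A single point at 0.5 therefore gives 1/12, which makes a convenient sanity check. The pairwise term costs O(N²·d) memory, so large point sets would need a chunked loop.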



reaperdoesntknow/TopologicalQwen: Text Generation • 2B • Updated about 5 hours ago


posted an update about 12 hours ago
