Pretrained models for the paper 2Mamba2Furious: Linear in Complexity, Competitive in Accuracy (https://arxiv.org/abs/2602.17363)
Gabriel Mongaras
gmongaras
AI & ML interests
None yet
Recent Activity
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__mamba
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__softmax
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__squared__sm_norm__A_mask_type_neg_softplus__in_conv_k_2__att2
Stable Diffusion 3 Checkpoints
Collection of checkpoints from the stable diffusion 3 model I am training (https://github.com/gmongaras/Stable-Diffusion-3-From-Scratch)
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_13batchsize_stage3
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_40batchsize_stage2
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_140batchsize_stage1
- gmongaras/datav3_attempt4_8GPU_SoftFlash_RoPE2dV2_2AccSteps_stage2
Cosine Attention (Cottention)
Models for the paper Cottention: Linear Transformers With Cosine Attention https://arxiv.org/abs/2409.18747
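The collection title describes replacing softmax attention with cosine similarity to get linear complexity. A minimal sketch of that idea, not the paper's actual implementation: l2-normalize queries and keys so their dot product is a cosine similarity, then use associativity to contract keys with values first. The function name `cosine_attention` and all shapes here are illustrative assumptions.

```python
import numpy as np

def cosine_attention(Q, K, V, eps=1e-6):
    # Sketch only (not the paper's code): l2-normalize Q and K so Q @ K.T
    # gives cosine similarities, then reassociate as Q @ (K.T @ V) so the
    # cost is O(n * d * d_v) in sequence length n instead of O(n^2).
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    KV = Kn.T @ V   # (d, d_v): one pass over the sequence
    return Qn @ KV  # (n, d_v)

# Tiny demo with random tensors
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = cosine_attention(Q, K, V)
```

By associativity, this produces the same result as the quadratic form `(Qn @ Kn.T) @ V`, which is what makes the cosine (softmax-free) formulation linear in sequence length.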
Squad Models
Models trained on the SQuAD dataset
Subtitle Data
Stuff I'm going to read
- LTX-2: Efficient Joint Audio-Visual Foundation Model (Paper • 2601.03233 • Published • 154)
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (Paper • 2601.07832 • Published • 52)
- Motion Attribution for Video Generation (Paper • 2601.08828 • Published • 71)
- Post-LayerNorm Is Back: Stable, Expressive, and Deep (Paper • 2601.19895 • Published • 23)
Reddit Models
Some terrible Reddit models I am training just to see what happens. Never again will I hear "As an AI language model"
BERT_512