Pretrained models for the paper 2Mamba2Furious: Linear in Complexity, Competitive in Accuracy (https://arxiv.org/abs/2602.17363)
Gabriel Mongaras
gmongaras
AI & ML interests
None yet
Recent Activity
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__mamba
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__softmax
- published a model about 5 hours ago: gmongaras/medium_8192sl_gpu_64bs__squared__sm_norm__A_mask_type_neg_softplus__in_conv_k_2__att2
Stable Diffusion 3 Checkpoints
Collection of checkpoints from the stable diffusion 3 model I am training (https://github.com/gmongaras/Stable-Diffusion-3-From-Scratch)
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_13batchsize_stage3
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_40batchsize_stage2
- gmongaras/datav3_attempt5_8GPU_SoftFlash_RoPE2d_2AccSteps_140batchsize_stage1
- gmongaras/datav3_attempt4_8GPU_SoftFlash_RoPE2dV2_2AccSteps_stage2
Cosine Attention (Cottention)
Models for the paper Cottention: Linear Transformers With Cosine Attention https://arxiv.org/abs/2409.18747
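The collection title describes replacing softmax attention with cosine similarity to get linear complexity. A minimal sketch of that idea, not the paper's actual implementation: l2-normalize queries and keys so their dot product is a cosine similarity, then use associativity to contract keys with values first. The function name `cosine_attention` and all shapes here are illustrative assumptions.

```python
import numpy as np

def cosine_attention(Q, K, V, eps=1e-6):
    # Sketch only (not the paper's code): l2-normalize Q and K so Q @ K.T
    # gives cosine similarities, then reassociate as Q @ (K.T @ V) so the
    # cost is O(n * d * d_v) in sequence length n instead of O(n^2).
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    KV = Kn.T @ V   # (d, d_v): one pass over the sequence
    return Qn @ KV  # (n, d_v)

# Tiny demo with random tensors
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = cosine_attention(Q, K, V)
```

By associativity, this produces the same result as the quadratic form `(Qn @ Kn.T) @ V`, which is what makes the cosine (softmax-free) formulation linear in sequence length.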
Squad Models
Models trained on the SQuAD dataset
Subtitle Data
Stuff I'm going to read
- LTX-2: Efficient Joint Audio-Visual Foundation Model (Paper • 2601.03233 • Published • 154)
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (Paper • 2601.07832 • Published • 52)
- Motion Attribution for Video Generation (Paper • 2601.08828 • Published • 71)
- Post-LayerNorm Is Back: Stable, Expressive, and Deep (Paper • 2601.19895 • Published • 23)
Reddit Models
Some terrible Reddit models I am training just to see what happens. Never again will I hear "As an AI language model"
BERT_512