# raga

Named after Mahoraga from the anime (his ability is to adapt to anything).

A tiny, spicy ModernBERT classifier for text-risk signals. Made by @PotatoOff.

Potato did not write a README, so this one appeared by magic!
## What does it classify?

Probably text / account-behavior risk labels, inferred from the eval table:

- `transactional_spam`: spammy transactional or promo-style content
- `extractive_presence`: likely copy/extraction/presence-pattern signal
- `engagement_automation`: botty engagement / automated interaction signal
- `account_farming`: account-growth or farming behavior signal

Exact label semantics depend on the training data.
## Model

- Base: answerdotai/ModernBERT-base
- Type: ModernBERT sequence classifier
- Context: up to 8,192 tokens
- Best for: classification, moderation-ish filters, long-text scoring
## Eval snapshot

| Label | F1 | Precision | Recall | Notes |
|---|---|---|---|---|
| `transactional_spam` | 0.94 | 0.89 | 0.99 | 🟢 Excellent |
| `extractive_presence` | 0.84 | 0.73 | 0.99 | 🟢 Great recall |
| `engagement_automation` | 0.65 | 0.53 | 0.85 | 🟡 Precision weak |
| `account_farming` | 0.62 | 0.61 | 0.63 | 🟡 Hardest label |
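As a sanity check, each reported F1 should be the harmonic mean of its precision and recall. A quick dependency-free script (values copied from the table above) confirms the rows are internally consistent:

```python
# Verify that each reported F1 is the harmonic mean of precision and recall.
rows = {
    "transactional_spam": (0.94, 0.89, 0.99),
    "extractive_presence": (0.84, 0.73, 0.99),
    "engagement_automation": (0.65, 0.53, 0.85),
    "account_farming": (0.62, 0.61, 0.63),
}

for label, (f1, p, r) in rows.items():
    computed = 2 * p * r / (p + r)
    # Reported numbers are rounded to two decimals, so compare at that precision.
    assert round(computed, 2) == f1, (label, computed)
    print(f"{label}: reported F1={f1}, recomputed={computed:.4f}")
```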
## Install

```bash
pip install -U "transformers>=4.48.0" torch
```

Optional GPU speedup:

```bash
pip install flash-attn
```
## Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "WeReCooking/raga"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
    device_map="auto" if torch.cuda.is_available() else None,
    # attn_implementation="flash_attention_2",  # optional, if installed
)
model.eval()

text = "paste text to classify here"
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=getattr(model.config, "max_position_embeddings", 8192),
)
inputs.pop("token_type_ids", None)  # ModernBERT does not use token_type_ids
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits[0].float()

id2label = {int(k): v for k, v in model.config.id2label.items()}

# Multi-label heads score each label independently (sigmoid);
# single-label heads produce one distribution over all labels (softmax).
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"
scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)

for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{id2label.get(i, str(i))}: {score:.4f}")
```
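The sigmoid-vs-softmax branch matters: softmax forces labels to compete (scores sum to 1), while independent sigmoids let several labels fire at once. A dependency-free sketch of the difference, using made-up logits:

```python
import math

logits = [2.0, -1.0, 0.5]  # hypothetical logits for three labels

# Softmax: a single-label distribution; scores always sum to 1.
exps = [math.exp(x) for x in logits]
softmax = [e / sum(exps) for e in exps]

# Sigmoid: each label scored independently; scores need not sum to 1.
sigmoid = [1 / (1 + math.exp(-x)) for x in logits]

print("softmax:", [round(s, 3) for s in softmax])  # sums to 1.0
print("sigmoid:", [round(s, 3) for s in sigmoid])  # two labels exceed 0.5 here
```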
## Notes

- Use a threshold of 0.50 for multi-label output as a starting point, then tune per label.
- `transactional_spam` looks strong.
- `engagement_automation` and `account_farming` probably need calibration before serious use.
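Per-label tuning can be as simple as sweeping candidate cutoffs on a held-out set and keeping the one that maximizes F1. A minimal pure-Python sketch; the scores and labels below are made up for illustration:

```python
def best_threshold(scores, labels):
    """Sweep cutoffs and return the (threshold, f1) pair maximizing F1."""
    best = (0.5, 0.0)
    for t in [i / 100 for i in range(5, 100, 5)]:
        preds = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best

# Hypothetical held-out scores and gold labels for one label.
scores = [0.92, 0.81, 0.74, 0.55, 0.48, 0.33, 0.21, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
t, f1 = best_threshold(scores, labels)
print(f"best threshold={t:.2f}, F1={f1:.3f}")
```

Run this once per label on validation data; the weaker labels will typically land on different cutoffs than the strong ones.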