raga - named after Mahoraga from anime (his ability is to adapt to anything)

A tiny spicy ModernBERT classifier for text-risk signals - Made by @PotatoOff

Potato did not write a README, so this appeared by magic!

What does it classify?

Probably text / account-behavior risk labels, inferred from the eval table:

  • transactional_spam β€” spammy transactional or promo-style content
  • extractive_presence β€” likely copy/extraction/presence-pattern signal
  • engagement_automation β€” botty engagement / automated interaction signal
  • account_farming β€” account-growth or farming behavior signal

Exact label semantics depend on the training data.

Model

  • Base: answerdotai/ModernBERT-base
  • Type: ModernBERT sequence classifier
  • Size: ~0.1B params (F32 safetensors)
  • Context: up to 8,192 tokens
  • Best for: classification, moderation-ish filters, long text scoring

Eval snapshot

Label                  F1    Precision  Recall  Notes
transactional_spam     0.94  0.89       0.99    🟢 Excellent
extractive_presence    0.84  0.73       0.99    🟢 Great recall
engagement_automation  0.65  0.53       0.85    🟡 Weak precision
account_farming        0.62  0.61       0.63    🟡 Hardest label
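The F1 column follows from the usual formula F1 = 2PR / (P + R). A quick sanity check that the table is internally consistent:

```python
# Recompute F1 from the reported precision/recall pairs: F1 = 2PR / (P + R).
rows = {
    "transactional_spam": (0.89, 0.99),
    "extractive_presence": (0.73, 0.99),
    "engagement_automation": (0.53, 0.85),
    "account_farming": (0.61, 0.63),
}
for label, (p, r) in rows.items():
    print(f"{label}: F1 = {2 * p * r / (p + r):.2f}")
```

Each printed value matches the F1 column above.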

Install

pip install -U "transformers>=4.48.0" torch

Optional GPU speedup:

pip install flash-attn
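If you want to request flash attention only when the package is actually installed, a small guard like this works (a sketch, not part of the original card):

```python
import importlib.util

# Pick flash attention only if the flash_attn package is importable;
# otherwise fall back to the default attention implementation.
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else None
print(attn_impl)
```

Pass the result as `attn_implementation=attn_impl` to `from_pretrained` instead of hard-coding the string.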

Inference

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "WeReCooking/raga"

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else None,
    device_map="auto" if torch.cuda.is_available() else None,
    # attn_implementation="flash_attention_2",  # optional, if installed
)

text = "paste text to classify here"

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=getattr(model.config, "max_position_embeddings", 8192),
)

# ModernBERT does not need token_type_ids
inputs.pop("token_type_ids", None)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits[0].float()

id2label = {int(k): v for k, v in model.config.id2label.items()}
multi = getattr(model.config, "problem_type", None) == "multi_label_classification"

scores = torch.sigmoid(logits) if multi else torch.softmax(logits, dim=-1)

for i, score in sorted(enumerate(scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{id2label.get(i, str(i))}: {score:.4f}")

Notes

For multi-label output, start with a 0.50 threshold on each label, then tune per label. transactional_spam looks strong out of the box; engagement_automation and account_farming have weak precision and probably need calibration before serious use.
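Per-label thresholding can be as simple as the sketch below; the thresholds and scores here are made-up illustrative numbers, not tuned values:

```python
labels = ["transactional_spam", "extractive_presence",
          "engagement_automation", "account_farming"]

# Illustrative per-label thresholds: stricter where precision is weak.
thresholds = {"transactional_spam": 0.50, "extractive_presence": 0.50,
              "engagement_automation": 0.70, "account_farming": 0.70}

# Example sigmoid scores for one text (made-up numbers).
scores = {"transactional_spam": 0.91, "extractive_presence": 0.42,
          "engagement_automation": 0.73, "account_farming": 0.12}

fired = [label for label in labels if scores[label] >= thresholds[label]]
print(fired)  # -> ['transactional_spam', 'engagement_automation']
```

In practice, fit the thresholds per label on held-out data rather than picking them by hand.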
