---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
  results:
  - task:
      type: text-classification
      name: AI Text Detection
    dataset:
      name: stealthcode/ai-detection
      type: stealthcode/ai-detection
    metrics:
    - type: auroc
      value: 0.9985
    - type: f1
      value: 0.9812
    - type: accuracy
      value: 0.9814
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary detection of AI-generated vs. human-written text, trained on ~2.7M English samples (`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs Human)
- **Head:** single logit trained with `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** 8× RTX 5090 (multi-GPU, bf16)
- **Final decision threshold:** **0.8697** (max-F1 on the calibration set)

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature-scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---

## Metrics (test set, n = 279,241)

Using threshold **0.8697**:

| Metric                 | Value  |
| ---------------------- | ------ |
| AUROC                  | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1                     | 0.9812 |
| Accuracy               | 0.9814 |
| Precision (AI)         | 0.9902 |
| Recall (AI)            | 0.9724 |
| Precision (Human)      | 0.9728 |
| Recall (Human)         | 0.9904 |

Confusion matrix (test):

- **True Negatives (Human → Human):** 138,276
- **False Positives (Human → AI):** 1,345
- **False Negatives (AI → Human):** 3,859
- **True Positives (AI → AI):** 135,761
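
These counts are consistent with the metric table above; a quick plain-Python sanity check (all numbers copied from this card):

```python
# Confusion-matrix counts copied from this card (test set)
tn, fp, fn, tp = 138_276, 1_345, 3_859, 135_761

total = tn + fp + fn + tp            # 279,241 test samples
accuracy = (tp + tn) / total
precision_ai = tp / (tp + fp)
recall_ai = tp / (tp + fn)
f1_ai = 2 * tp / (2 * tp + fp + fn)
precision_human = tn / (tn + fn)
recall_human = tn / (tn + fp)

print(round(accuracy, 4), round(f1_ai, 4))          # 0.9814 0.9812
print(round(precision_ai, 4), round(recall_ai, 4))  # 0.9902 0.9724
```

Accuracy, F1, and the per-class precision/recall in the table all reproduce to four decimals from these four counts.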

### Calibration

- **Method:** temperature scaling
- **Temperature (T):** 1.4437
- **Calibration set:** held-out calibration split (`predictions_calib.csv`)
- **Test ECE:** 0.0075 → 0.0116 (raw → calibrated)
- **Test Brier:** 0.0157 → 0.0156 (raw → calibrated)
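
For reference, a temperature like this can be fit by minimizing the binary NLL on held-out logits. A minimal NumPy sketch on synthetic data (the `fit_temperature` helper and the 1.5 scale factor are illustrative assumptions, not this model's training code):

```python
import numpy as np

def fit_temperature(logits, labels, grid=np.linspace(0.5, 3.0, 251)):
    """Pick the T that minimizes binary NLL of sigmoid(logits / T)."""
    def nll(t):
        p = 1.0 / (1.0 + np.exp(-logits / t))
        p = np.clip(p, 1e-7, 1.0 - 1e-7)
        return -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return min(grid, key=nll)

# Synthetic overconfident classifier: true log-odds scaled up by 1.5,
# so the fitted T should land near the 1.5 used here.
rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, size=20_000)                       # true log-odds
labels = (rng.random(20_000) < 1.0 / (1.0 + np.exp(-z))).astype(float)
T = fit_temperature(1.5 * z, labels)
```

In this repo, the fitted value is stored in `calibration.json` and applied at inference time as shown in the Usage section below.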

---

## Plots

### Training & validation

- Learning curves:

  

- Eval metrics over time:

  

### Validation set

- ROC:

  

- Precision–Recall:

  

- Calibration curve:

  

- F1 vs threshold:

  

### Test set

- ROC:

  

- Precision–Recall:

  

- Calibration curve:

  

- Confusion matrix:

  

---

## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

### Inference with threshold

```python
# Load the deployment threshold chosen on the calibration set
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```

### Optional: apply temperature scaling to logits

```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```

---

## Notes

- The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
- **LoRA config:**
  - `r=32`, `alpha=128`, `dropout=0.0`
  - Target modules: `query_proj`, `key_proj`, `value_proj`
- **Training config:**
  - `bf16=True`
  - `optim="adamw_torch_fused"`
  - `lr_scheduler_type="cosine_with_restarts"`
  - `num_train_epochs=2`
  - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
  - `max_grad_norm=0.5`
- Threshold `0.8697` was chosen as the **max-F1** point on the calibration set.
  You can adjust it if you prefer fewer false positives or fewer false negatives.
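
A max-F1 search of this kind can be sketched as a grid scan over candidate thresholds; the `best_f1_threshold` helper and the toy data below are illustrative (in practice you would read probabilities and labels from `predictions_calib.csv`):

```python
import numpy as np

def best_f1_threshold(probs, labels, thresholds=np.linspace(0.01, 0.99, 99)):
    """Return (threshold, F1) maximizing F1 on held-out probabilities."""
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        preds = probs >= t
        tp = int(np.sum(preds & (labels == 1)))
        fp = int(np.sum(preds & (labels == 0)))
        fn = int(np.sum(~preds & (labels == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy example; replace with calibration-set probabilities and labels.
probs = np.array([0.05, 0.20, 0.55, 0.85, 0.95])
labels = np.array([0, 0, 1, 1, 1])
t, f1 = best_f1_threshold(probs, labels)
print(t, f1)
```

Shifting the threshold above or below the max-F1 point trades recall for precision (or vice versa), which is how you would tune for fewer false positives or fewer false negatives.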