| | --- |
| | { |
| | "library_name": "peft", |
| | "pipeline_tag": "text-classification", |
| | "license": "llama3.2", |
| | "tags": [ |
| | "LoRA", |
| | "QLoRA", |
| | "multi-label", |
| | "text-classification", |
| | "decoder-only", |
| | "peft", |
| | "transformers", |
| | "trl", |
| | "bitsandbytes", |
| | "base_model:adapter:meta-llama/Llama-3.2-1B" |
| | ], |
| | "base_model": "meta-llama/Llama-3.2-1B", |
| | "datasets": ["pietrolesci/amazoncat-13k"] |
| | } |
| | --- |
| | |
| | # Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification |
| |
|
| | > **One‑line summary:** Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for **multi‑label text classification** using **LoRA** adapters, with optional **4‑bit QLoRA** quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones. |
| |
|
| | This model card accompanies the repository **LLM‑Decoder‑Tuning‑Text‑Classification** and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT). |
| |
|
| | > **Note:** This card describes a *training pipeline + example checkpoints*. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time. |
| |
|
| | --- |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | This project provides a **modular training & inference stack** for multi‑label text classification built on top of **Hugging Face Transformers** and **PEFT**. It adapts **decoder‑only** LLMs (tested with `meta-llama/Llama-3.2-1B`) using **LoRA** adapters, and optionally enables **4‑bit quantization** (QLoRA‑style) for reduced memory footprint during training and inference. The repository exposes a **single CLI** for train/eval/predict and a **YAML configuration** to control data paths, model choice, and hyperparameters. |
| |
|
| | - **Developed by:** Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`) |
| | - **Model type:** Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification |
| | - **Language(s):** English (evaluated on AmazonCat‑13K subset) |
| | - **License:** The **base model** (`meta-llama/Llama-3.2-1B`) is under the **Llama 3.2 Community License**. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms. |
| | - **Finetuned from:** `meta-llama/Llama-3.2-1B` (foundation) |
| |
|
| | ### Model Sources |
| |
|
| | - **Repository:** https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification |
| | - **Model (Hub placeholder):** https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification |
| | - **Background reading:** |
| | - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021) |
| | - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023) |
| | - PEFT documentation (Hugging Face) |
| |
|
| | --- |
| |
|
| | ## Uses |
| |
|
| | ### Direct Use |
| |
|
| | - **Multi‑label text classification** on English corpora (e.g., product tagging, topic tagging, content routing). |
| | - Inference via: |
| | - Provided **CLI** (`python -m llm_cls.cli predict --config ...`) producing JSONL predictions. |
| | - Hugging Face pipelines with base model + LoRA adapter loaded (see “How to Get Started”). |
| |
|
| | ### Downstream Use |
| |
|
| | - **Domain transfer:** Re‑train on your domain labels by pointing the YAML to your CSVs. |
| | - **Backbone swap:** Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders). |
| |
|
| | ### Out‑of‑Scope Use |
| |
|
| | - Safety‑critical decisions without human oversight. |
| | - Tasks requiring **extreme multi‑label** scaling (e.g., hundreds of thousands of labels) without additional adaptation. |
| | - Non‑English or code‑mixed data without validation. |
| | - Any use that conflicts with the base model’s license and acceptable‑use policies. |
| |
|
| | --- |
| |
|
| | ## Bias, Risks, and Limitations |
| |
|
| | - **Dataset bias:** AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases. |
| | - **Multi‑label long tail:** Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning. |
| | - **Decoder framing:** Treating classification as generation can be sensitive to prompt/format and decoding thresholds. |
| | - **License & usage constraints:** Ensure compliance with the Llama 3.2 Community License for derivatives and deployment. |
| |
|
| | ### Recommendations |
| |
|
| | - Track **micro‑ and macro‑F1** and per‑class metrics. |
| | - Use **threshold tuning** on validation to balance precision/recall per class. |
| | - For memory‑constrained environments, prefer **4‑bit + LoRA**; on platforms without `bitsandbytes` support, disable 4‑bit and train in standard precision. |
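
The threshold‑tuning recommendation can be sketched as a small grid search over per‑label thresholds on validation data. This is a minimal illustration, not the repository's implementation: the function names, the 0.05 grid step, the tie‑breaking rule, and the toy data are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one label; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(
    probs: List[List[float]],
    targets: List[List[int]],
    grid: Optional[List[float]] = None,
) -> List[float]:
    """Pick, for each label column, the threshold that maximizes validation F1."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95 (assumed grid)
    best = []
    for j in range(len(probs[0])):
        col_p = [row[j] for row in probs]
        col_t = [row[j] for row in targets]
        scored = [(f1_binary(col_t, [int(p >= th) for p in col_p]), th) for th in grid]
        top = max(score for score, _ in scored)
        # Several thresholds can tie; prefer the one closest to the default 0.5.
        ties = [th for score, th in scored if score == top]
        best.append(min(ties, key=lambda th: abs(th - 0.5)))
    return best


# Toy validation set: 4 samples, 2 labels (hypothetical numbers).
val_probs = [[0.9, 0.2], [0.8, 0.3], [0.4, 0.1], [0.3, 0.35]]
val_targets = [[1, 0], [1, 1], [0, 0], [0, 1]]
thresholds = tune_thresholds(val_probs, val_targets)
print(thresholds)
```

In this toy case the second label is only separated correctly with a threshold below 0.5, which is the typical situation for rare labels whose scores are systematically low.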
| |
|
| | --- |
| |
|
| | ## How to Get Started with the Model |
| |
|
| | Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs. |
| |
|
| | ```python |
| | from transformers import AutoTokenizer, AutoModelForCausalLM, TextGenerationPipeline |
| | from peft import PeftModel |
| | import torch |
| | |
| | BASE_MODEL = "meta-llama/Llama-3.2-1B" |
| | ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification" # or your own adapter |
| | |
| | tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True) |
| | base = AutoModelForCausalLM.from_pretrained( |
| |     BASE_MODEL, |
| |     torch_dtype=torch.bfloat16, |
| |     device_map="auto", |
| | ) |
| | model = PeftModel.from_pretrained(base, ADAPTER_REPO) |
| | model.eval() |
| | |
| | # Simple prompt format for multi-label classification (adjust to your training format). |
| | labels = ["books","movies_tv","music","pop","literature_fiction","movies","education_reference","rock","used_rental_textbooks","new"] |
| | text = "A thrilling space opera with deep character arcs and rich world-building." |
| | |
| | prompt = ( |
| |     "You are a classifier. Given the text, return a JSON list of applicable labels from this set: " |
| |     + ", ".join(labels) + ".\n" |
| |     + f"Text: {text}\nLabels: " |
| | ) |
| | |
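| | # The model was already placed by device_map="auto", so `device` is not passed to the pipeline. |
| | pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer) |
| | # return_full_text=False returns only the completion, without the prompt. |
| | out = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False) |
| | print(out[0]["generated_text"]) |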
| | ``` |
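
The generated string still has to be turned into a label set. The sketch below assumes the model answers with a JSON list, as the prompt above requests; that output format is an assumption about your training format, and `parse_labels` is a hypothetical helper, not part of the repository.

```python
import json
import re
from typing import List


def parse_labels(generated: str, allowed: List[str]) -> List[str]:
    """Extract the first JSON list in the text and keep only known labels."""
    match = re.search(r"\[.*?\]", generated, flags=re.DOTALL)
    if match is None:
        return []
    try:
        candidates = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []  # malformed output is treated as "no labels"
    allowed_set = set(allowed)
    kept: List[str] = []
    for item in candidates:
        # Drop non-strings, labels outside the allowed set, and duplicates.
        if isinstance(item, str) and item in allowed_set and item not in kept:
            kept.append(item)
    return kept


allowed = ["books", "movies_tv", "music", "literature_fiction"]
completion = ' ["books", "literature_fiction", "sci_fi", "books"]\nText: ...'
print(parse_labels(completion, allowed))
```

Filtering against the allowed set guards against labels the model invents, and returning an empty list on malformed output keeps a batch job from failing on a single bad generation.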
| |
|
| | For **CLI usage**: |
| |
|
| | ```bash |
| | # Train |
| | python -m llm_cls.cli train --config configs/default.yaml |
| | |
| | # Predict |
| | python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Training Details |
| |
|
| | ### Training Data |
| |
|
| | - **Dataset:** AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly. |
| | - **Format:** CSV with at least a text column and one or more label columns (multi‑label). Configure names in `configs/default.yaml`. |
| | - **Splits:** Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits. |
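
As an illustration of the CSV format described above, the following sketch reads a text column plus one 0/1 column per label into multi‑hot targets. The column names (`text`, `books`, `music`) and the one‑column‑per‑label layout are assumptions for the example; use the names configured in `configs/default.yaml`.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_multilabel_csv(
    handle: TextIO, text_col: str, label_cols: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Return texts and multi-hot label rows from a CSV with 0/1 label columns."""
    reader = csv.DictReader(handle)
    texts, targets = [], []
    for row in reader:
        texts.append(row[text_col])
        targets.append([int(row[col]) for col in label_cols])
    return texts, targets


# In-memory stand-in for a real file such as data/train.csv (hypothetical content).
sample = io.StringIO(
    "text,books,music\n"
    '"A space opera, with rich world-building",1,0\n'
    "Greatest hits album,0,1\n"
)
texts, targets = load_multilabel_csv(sample, "text", ["books", "music"])
print(texts)
print(targets)
```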
| |
|
| | ### Training Procedure |
| |
|
| | #### Preprocessing |
| |
|
| | - Tokenization with the base model’s tokenizer. |
| | - Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo). |
| |
|
| | #### Training Hyperparameters (illustrative config) |
| |
|
| | - **Base model:** `meta-llama/Llama-3.2-1B` |
| | - **Problem type:** `multi_label_classification` |
| | - **Precision / quantization:** `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation |
| | - **LoRA:** `r=2`, `alpha=2`, `dropout=0.05` |
| | - **LoRA target modules:** `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]` |
| | - **Batch size:** `4` (with `gradient_accumulation_steps=8`) |
| | - **Max length:** `1024` |
| | - **Optimizer:** 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`) |
| | - **Epochs:** up to `20` with early stopping (`patience=2`) |
| | - **Metric for best model:** `f1_micro` |
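
To give a sense of scale for these settings, the sketch below counts the LoRA adapter parameters implied by `r=2` on the seven target modules and derives the effective batch size. The layer dimensions are assumptions taken from the published Llama‑3.2‑1B configuration and should be checked against the model's `config.json`; any classification head trained alongside the adapters is not counted.

```python
# Assumed Llama-3.2-1B dimensions (verify against the model's config.json).
HIDDEN = 2048        # hidden_size
KV_DIM = 512         # 8 key/value heads x 64 head_dim (grouped-query attention)
INTERMEDIATE = 8192  # MLP intermediate_size
NUM_LAYERS = 16

# Values from the illustrative config above.
R, ALPHA = 2, 2
BATCH_SIZE, GRAD_ACCUM = 4, 8

# module name -> (in_features, out_features)
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """Each adapted linear layer adds A (r x in) and B (out x r) matrices."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(R, TARGET_SHAPES, NUM_LAYERS)
scaling = ALPHA / R                       # LoRA update is scaled by alpha / r
effective_batch = BATCH_SIZE * GRAD_ACCUM  # samples per optimizer step

print(f"LoRA adapter parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
print(f"Effective batch size: {effective_batch}")
```

Under these assumptions the adapters add roughly 1.4 million trainable parameters, a small fraction of the base model, which is what makes training on a single laptop GPU practical.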
| |
|
| | #### Speeds, Sizes, Times (example run) |
| |
|
| | - **Device:** NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM) |
| | - **Runtime:** ~1,310 seconds for the shortest run (4 epochs); the 5‑ and 6‑epoch runs took ~1,640 and ~1,984 seconds |
| | - **Throughput:** ≈0.784 steps/s (≈24.9 samples/s) during training, averaged over the 4 runs |
| | - **Artifacts:** Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/` |
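
The reported throughput figures can be cross‑checked against the per‑run table in the Evaluation section. The sketch below uses only numbers from that table. The interpretation in the closing note is an assumption, not something the source confirms.

```python
# Per-run values copied from the per-run metrics table.
runs = [
    {"run": 1, "epochs": 4, "time_s": 1309.6, "steps_per_s": 0.962, "samples_per_s": 30.543},
    {"run": 2, "epochs": 5, "time_s": 1640.3, "steps_per_s": 0.768, "samples_per_s": 24.385},
    {"run": 3, "epochs": 6, "time_s": 1984.2, "steps_per_s": 0.635, "samples_per_s": 20.159},
    {"run": 4, "epochs": 5, "time_s": 1639.3, "steps_per_s": 0.769, "samples_per_s": 24.401},
]

for r in runs:
    # Samples processed per optimizer step, as implied by the two throughput columns.
    r["samples_per_step"] = r["samples_per_s"] / r["steps_per_s"]
    # Step count implied by the reported steps/s and wall-clock time.
    r["implied_steps"] = r["steps_per_s"] * r["time_s"]
    print(r["run"], r["epochs"], round(r["samples_per_step"], 2), round(r["implied_steps"]))

mean_steps_per_s = sum(r["steps_per_s"] for r in runs) / len(runs)
mean_samples_per_s = sum(r["samples_per_s"] for r in runs) / len(runs)
print(round(mean_steps_per_s, 3), round(mean_samples_per_s, 1))
```

Every run implies about 32 samples per step, matching a batch size of 4 with 8 gradient‑accumulation steps. The implied step count is also nearly identical across runs despite the different epoch counts. One possible reading, stated here as an assumption, is that the logged throughput is normalized against the planned training schedule rather than the epochs actually completed before early stopping, so wall‑clock time is the safer basis for comparing runs.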
| |
|
| | --- |
| |
|
| | ## Evaluation |
| |
|
| | ### Testing Data, Factors & Metrics |
| |
|
| | - **Testing data:** Held‑out split from AmazonCat‑13K (example subset). |
| | - **Factors:** Evaluate both **micro‑F1** (overall) and **macro‑F1** (per‑class average) to reflect long‑tail performance. |
| | - **Metrics:** `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s). |
| |
|
| | ### Metrics |
| |
|
| | - **Best overall (micro-F1):** **0.830** at **5 epochs** |
| | - **Best minority‑class sensitivity (macro-F1):** **0.752** at **6 epochs** |
| | - **Average across 4 runs:** micro‑F1 **0.824**, macro‑F1 **0.741**, eval loss **0.161** |
| | - **Throughput (mean over 4 runs):** train ≈ **0.784 steps/s** (**24.9 samples/s**); eval time ≈ **34.0 s** per run. |
| |
|
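| | > Interpretation: the best **micro‑F1** (0.830) comes from a **5‑epoch** run and the best **macro‑F1** (0.752) from the **6‑epoch** run, hinting at slightly better coverage of minority classes with a small micro‑F1 trade‑off (0.826). The two 5‑epoch runs differ by 0.014 micro‑F1 (0.830 vs. 0.816), so these gaps are comparable to run‑to‑run variation and should be read as indicative. |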
| |
|
| | --- |
| | ### 📈 Per‑run metrics |
| | | Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) | |
| | |---:|---:|---:|---:|---:|---:|---:|---:|---:|---:| |
| | | 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 | |
| | | 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 | |
| | | 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 | |
| | | 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 | |
| |
|
| | <sub>*F1(micro)* aggregates decisions over all samples; *F1(macro)* averages per‑class F1 equally, highlighting minority‑class performance.</sub> |
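
The difference between the two averages can be seen on a toy example. The sketch below is a plain‑Python illustration with made‑up predictions (one frequent label, one rare label that the classifier always misses); it is not the evaluation code used for the table.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Rows are samples, columns are labels; entries are 0/1."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1_from_counts(*(sum(c[i] for c in counts) for i in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro


# Label 0 is frequent and always correct; label 1 is rare and always missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1={micro:.3f}  macro-F1={macro:.3f}")
```

A single missed rare label barely moves micro‑F1 but halves macro‑F1 here, which is why the two metrics are reported side by side.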
| |
|
| | ### Results (example) |
| |
|
| | - **Best micro‑F1:** `0.830` at 5 epochs |
| | - **Best macro‑F1:** `0.752` at 6 epochs |
| | - **Average across 4 runs:** micro‑F1 `0.824`, macro‑F1 `0.741`, eval loss `0.161` |
| |
|
| | #### Summary |
| |
|
| | Decoder‑only LLMs with **LoRA** adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off. |
| |
|
| | --- |
| |
|
| | ## Model Examination |
| |
|
| | - Inspect confidence/threshold curves per label to tune decision thresholds. |
| | - Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation. |
| |
|
| | --- |
| |
|
| | ## Environmental Impact |
| |
|
| | - **Hardware Type:** Single laptop GPU (RTX 3080 Ti Laptop, 16 GB) |
| | - **Hours used (example run):** ~0.36 hours for the shortest (4‑epoch) run; the 5‑ and 6‑epoch runs took ~0.46 and ~0.55 hours |
| |
|
| | --- |
| |
|
| | ## Technical Specifications |
| |
|
| | ### Model Architecture and Objective |
| |
|
| | - **Architecture:** Decoder‑only Transformer (Llama 3.2 class), adapted via **LoRA**. |
| | - **Objective:** Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions. |
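
The sigmoid/thresholding step can be sketched as follows. This shows the standard multi‑label formulation (independent sigmoid per label, binary cross‑entropy on the logits), which is also what the Transformers `multi_label_classification` problem type uses; whether the repository computes it exactly this way is an assumption. The logits, label names, and the 0.5 threshold are illustrative.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits: List[float], targets: List[int]) -> float:
    """Mean binary cross-entropy, written in the numerically stable form
    max(x, 0) - x*y + log(1 + exp(-|x|))."""
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def decide(logits: List[float], names: List[str], threshold: float = 0.5) -> List[str]:
    """Each label is decided independently, so any number of labels can fire."""
    return [name for name, x in zip(names, logits) if sigmoid(x) >= threshold]


names = ["books", "music", "movies"]
logits = [2.0, -1.0, 0.0]   # one score per label for a single example
targets = [1, 0, 1]

loss = bce_with_logits(logits, targets)
predicted = decide(logits, names)
print(round(loss, 4), predicted)
```

Unlike softmax classification, the label probabilities here do not sum to one, which is what allows several labels to be predicted for the same text.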
| |
|
| | ### Compute Infrastructure |
| |
|
| | #### Hardware |
| |
|
| | - Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM. |
| |
|
| | #### Software |
| |
|
| | - Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit. |
| |
|
| | --- |
| |
|
| | ## Citation |
| |
|
| | If you use this work, please consider citing the following: |
| |
|
| | **BibTeX:** |
| |
|
| | ```bibtex |
| | @article{Hu2021LoRA, |
| | title={LoRA: Low-Rank Adaptation of Large Language Models}, |
| | author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen}, |
| | journal={arXiv preprint arXiv:2106.09685}, |
| | year={2021} |
| | } |
| | |
| | @article{Dettmers2023QLoRA, |
| | title={QLoRA: Efficient Finetuning of Quantized LLMs}, |
| | author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer}, |
| | journal={arXiv preprint arXiv:2305.14314}, |
| | year={2023} |
| | } |
| | ``` |
| |
|
| | **APA:** |
| |
|
| | - Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). *LoRA: Low‑Rank Adaptation of Large Language Models*. arXiv:2106.09685. |
| | - Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). *QLoRA: Efficient Finetuning of Quantized LLMs*. arXiv:2305.14314. |
| |
|
| | --- |
| |
|
| | ## Glossary |
| |
|
| | - **LoRA:** Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently. |
| | - **QLoRA (4‑bit):** Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters. |
| | - **Micro‑/Macro‑F1:** Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes). |
| |
|
| | --- |
| |
|
| | ## More Information |
| |
|
| | - The repo ships a minimal CLI (`llm_cls/cli.py`) and example YAML config (`configs/default.yaml`) to reproduce results. |
| | - For non‑Linux environments or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision. |
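
One way to apply the fallback described above is to check for `bitsandbytes` before enabling 4‑bit. The sketch below only tests whether the package is importable and returns override values for the `use_4bit` and `optim_8bit_when_4bit` keys mentioned in this card. The helper names are hypothetical, and where these keys sit inside the YAML file is an assumption to verify against `configs/default.yaml`.

```python
import importlib.util


def bitsandbytes_installed() -> bool:
    """True if the bitsandbytes package can be found (does not check GPU support)."""
    return importlib.util.find_spec("bitsandbytes") is not None


def resolve_quantization(requested_4bit: bool) -> dict:
    """Fall back to standard precision when 4-bit is requested but unavailable."""
    use_4bit = requested_4bit and bitsandbytes_installed()
    return {
        "use_4bit": use_4bit,
        # The 8-bit optimizer only makes sense when the 4-bit path is active.
        "optim_8bit_when_4bit": use_4bit,
    }


overrides = resolve_quantization(requested_4bit=True)
print(overrides)
```

The check is deliberately conservative: an installed package does not guarantee a compatible GPU, so a failed model load should still be handled by disabling 4‑bit manually.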
| |
|
| | --- |
| |
|
| | ## Model Card Authors |
| |
|
| | - **Author/Maintainer:** Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`) |
| |
|
| | ## Model Card Contact |
| |
|
| | - Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`. |
| |
|