	---
	# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
	# Doc / guide: https://huggingface.co/docs/hub/model-cards
	# datasets:
	# - "pietrolesci/amazoncat-13k"
	# language:
	# - en
	# library_name: transformers
	# license: other # Base model (Meta Llama 3.2) is under the Llama 3.2 Community License
	# pipeline_tag: text-classification
	# tags:
	# - multi-label
	# - LoRA
	# - QLoRA
	# - bitsandbytes
	# - decoder-only
	# - llama-3.2-1b
	# - peft
	# - text-classification
	# - adapter:meta-llama/Llama-3.2-1B"
	# base_model: meta-llama/Llama-3.2-1B

	library_name: peft
	pipeline_tag: text-classification
	license: llama3.2
	tags:
	- LoRA
	- QLoRA
	- multi-label
	- text-classification
	- decoder-only
	- peft
	- transformers
	- trl
	- bitsandbytes
	- base_model:adapter:meta-llama/Llama-3.2-1B
	base_model: meta-llama/Llama-3.2-1B
	datasets:
	- pietrolesci/amazoncat-13k
	---

	# Model Card for Amirhossein75/LLM-Decoder-Tuning-Text-Classification

	> One‑line summary: Decoder‑only LLMs (e.g., Llama‑3.2‑1B) fine‑tuned for multi‑label text classification using LoRA adapters, with optional 4‑bit QLoRA quantization for memory‑efficient training and inference. A clean CLI and YAML config make it easy to reproduce results and swap backbones.

	This model card accompanies the repository LLM‑Decoder‑Tuning‑Text‑Classification and documents a practical recipe for using decoder‑only LLMs as strong multi‑label classifiers with parameter‑efficient fine‑tuning (PEFT).

	> Note: This card describes a training pipeline + example checkpoints. If you push a specific checkpoint to the Hub, please fill in exact dataset splits, metrics, and license at upload time.

	---

	## Model Details

	### Model Description

	This project provides a modular training & inference stack for multi‑label text classification built on top of Hugging Face Transformers and PEFT. It adapts decoder‑only LLMs (tested with `meta-llama/Llama-3.2-1B`) using LoRA adapters, and optionally enables 4‑bit quantization (QLoRA‑style) for reduced memory footprint during training and inference. The repository exposes a single CLI for train/eval/predict and a YAML configuration to control data paths, model choice, and hyperparameters.

	- Developed by: Amirhossein Yousefi (GitHub: `amirhossein-yousefi`; Hugging Face: `Amirhossein75`)
	- Model type: Decoder‑only causal LM with PEFT (LoRA) for multi‑label classification
	- Language(s): English (evaluated on an AmazonCat‑13K subset)
	- License: The base model (`meta-llama/Llama-3.2-1B`) is under the Llama 3.2 Community License. The LoRA adapter you publish should declare its own license and acknowledge base‑model terms.
	- Finetuned from: `meta-llama/Llama-3.2-1B` (foundation)

	### Model Sources

	- Repository: https://github.com/amirhossein-yousefi/LLM-Decoder-Tuning-Text-Classification
	- Model (Hub placeholder): https://huggingface.co/Amirhossein75/LLM-Decoder-Tuning-Text-Classification
	- Background reading:
	  - LoRA: Low‑Rank Adaptation of Large Language Models (Hu et al., 2021)
	  - QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
	  - PEFT documentation (Hugging Face)

	---

	## Uses

	### Direct Use

	- Multi‑label text classification on English corpora (e.g., product tagging, topic tagging, content routing).
	- Inference via:
	  - Provided CLI (`python -m llm_cls.cli predict --config ...`) producing JSONL predictions.
	  - Hugging Face pipelines with base model + LoRA adapter loaded (see “How to Get Started with the Model”).

	### Downstream Use

	- Domain transfer: Re‑train on your domain labels by pointing the YAML to your CSVs.
	- Backbone swap: Replace `model.model_name` in the config to try other decoders or encoders (set `use_4bit=false` for encoders).

	### Out‑of‑Scope Use

	- Safety‑critical decisions without human oversight.
	- Tasks requiring extreme multilabel scaling (e.g., hundreds of thousands of labels) without additional adaptation.
	- Non‑English or code‑mixed data without validation.
	- Any use that conflicts with the base model’s license and acceptable‑use policies.

	---

	## Bias, Risks, and Limitations

	- Dataset bias: AmazonCat‑13K originates from product data; labels and text reflect marketplace distributions and may encode demographic or topical biases.
	- Multi‑label long tail: Minority classes are harder; macro‑F1 often trails micro‑F1. Consider class weighting, augmentation, or threshold tuning.
	- Decoder framing: Treating classification as generation can be sensitive to prompt/format and decoding thresholds.
	- License & usage constraints: Ensure compliance with the Llama 3.2 Community License for derivatives and deployment.
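The class-weighting suggestion above can be sketched as follows. This is a minimal illustration, assuming multi-hot targets and the common negative-to-positive ratio heuristic used as `pos_weight` in a binary cross-entropy loss; the helper name `positive_class_weights` and the toy data are hypothetical and not taken from the repository.

```python
def positive_class_weights(label_rows, num_labels):
    """Return the negative/positive count ratio per label (a common BCE pos_weight heuristic)."""
    positives = [0] * num_labels
    for row in label_rows:
        for j, y in enumerate(row):
            positives[j] += y
    n = len(label_rows)
    # Labels with no positive examples fall back to 1.0 to avoid dividing by zero.
    return [(n - p) / p if p > 0 else 1.0 for p in positives]


# Toy multi-hot targets: 4 samples, 3 labels (made-up data).
Y = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 0]]
weights = positive_class_weights(Y, 3)
print(weights)
```

Rare labels receive larger weights, so their missed positives cost more during training. Very large ratios can destabilize training, so clipping them is a reasonable precaution.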

	### Recommendations

	- Track micro‑ and macro‑F1 and per‑class metrics.
	- Use threshold tuning on validation to balance precision/recall per class.
	- For memory‑constrained environments, prefer 4‑bit + LoRA; on platforms without `bitsandbytes` support, disable 4‑bit.
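Per-class threshold tuning, as recommended above, could look roughly like the sketch below. It assumes you already have per-label probabilities and multi-hot targets for a validation split; `tune_thresholds`, the threshold grid, and the toy numbers are illustrative choices, not the repository's implementation.

```python
def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(probs, targets, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """For each label, pick the grid threshold with the highest validation F1."""
    num_labels = len(targets[0])
    best = []
    for j in range(num_labels):
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            tp = fp = fn = 0
            for p_row, y_row in zip(probs, targets):
                predicted = p_row[j] >= t
                if predicted and y_row[j]:
                    tp += 1
                elif predicted:
                    fp += 1
                elif y_row[j]:
                    fn += 1
            score = f1(tp, fp, fn)
            if score > best_f1:  # ties keep the lowest threshold
                best_t, best_f1 = t, score
        best.append(best_t)
    return best


# Toy validation data: label 1 is systematically under-confident.
val_probs = [[0.90, 0.35], [0.80, 0.32], [0.25, 0.12], [0.15, 0.05]]
val_targets = [[1, 1], [1, 1], [0, 0], [0, 0]]
thresholds = tune_thresholds(val_probs, val_targets)
print(thresholds)
```

In this toy case a fixed 0.5 cut-off would miss every positive for the second label, while the tuned threshold recovers them. Thresholds should be chosen on validation data only and then frozen for the test split.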

	---

	## How to Get Started with the Model

	Below is an example of loading a base Llama model with a LoRA adapter for classification‑style inference. Replace `BASE_MODEL` and `ADAPTER_REPO` with your IDs.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
	from peft import PeftModel
	import torch

	BASE_MODEL = "meta-llama/Llama-3.2-1B"
	ADAPTER_REPO = "Amirhossein75/LLM-Decoder-Tuning-Text-Classification"  # or your own adapter

	tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
	base = AutoModelForCausalLM.from_pretrained(
	    BASE_MODEL,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",  # accelerate decides where the weights live
	)
	model = PeftModel.from_pretrained(base, ADAPTER_REPO)
	model.eval()

	# Simple prompt format for multi-label classification (adjust to your training format).
	labels = ["books", "movies_tv", "music", "pop", "literature_fiction", "movies",
	          "education_reference", "rock", "used_rental_textbooks", "new"]
	text = "A thrilling space opera with deep character arcs and rich world-building."

	prompt = (
	    "You are a classifier. Given the text, return a JSON list of applicable labels from this set: "
	    + ", ".join(labels) + ".\n"
	    + f"Text: {text}\nLabels: "
	)

	# The model was already placed by device_map="auto", so no `device` argument is passed here.
	pipe = TextGenerationPipeline(model=model, tokenizer=tokenizer)
	out = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
	print(out[0]["generated_text"])  # generated continuation only, without the prompt
	```
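The pipeline returns free-form text, so the output still has to be mapped back to the label set. The helper below is a hedged sketch of one way to do that: it assumes the model emits either a JSON list or comma-separated label names, which depends on the format used during fine-tuning. `parse_labels` is a hypothetical name, not part of the repository.

```python
import json
import re


def parse_labels(generated, allowed, prompt=None):
    """Map generated text to a de-duplicated list of known labels."""
    if prompt and generated.startswith(prompt):
        generated = generated[len(prompt):]  # drop the echoed prompt, if present
    allowed_set = set(allowed)
    candidates = []
    match = re.search(r"\[.*?\]", generated, flags=re.DOTALL)
    if match:
        try:
            candidates = [str(x) for x in json.loads(match.group(0))]
        except json.JSONDecodeError:
            candidates = []
    if not candidates:
        # Fallback: treat the first line as a comma-separated list.
        stripped = generated.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        candidates = [c.strip().strip("\"'[]") for c in first_line.split(",")]
    seen, result = set(), []
    for c in candidates:
        if c in allowed_set and c not in seen:
            seen.add(c)
            result.append(c)
    return result


allowed = ["books", "movies_tv", "music", "literature_fiction", "movies"]
print(parse_labels('["books", "literature_fiction", "sci_fi"]', allowed))
print(parse_labels("books, movies\nText: another item", allowed))
```

Filtering against the allowed set discards any label the model invents, which matters because greedy decoding gives no guarantee that the output stays inside the label vocabulary.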

	For CLI usage:

	```bash
	# Train
	python -m llm_cls.cli train --config configs/default.yaml

	# Predict
	python -m llm_cls.cli predict --config configs/default.yaml --input_csv data/test.csv --output_jsonl preds.jsonl
	```

	---

	## Training Details

	### Training Data

	- Dataset: AmazonCat‑13K (example subset; top‑10 categories for illustration). If you use the full dataset, update CSV paths and label columns accordingly.
	- Format: CSV with at least a text column and one or more label columns (multi‑label). Configure names in `configs/default.yaml`.
	- Splits: Train / Validation / (Optional) Test; sample scripts are provided to create CSV splits.
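As an illustration of the CSV format described above, the sketch below reads a file into texts and multi-hot targets. The layout is an assumption: one `text` column plus one 0/1 column per label. The actual column names are whatever you configure in `configs/default.yaml`, and `load_multilabel_csv` is a hypothetical helper.

```python
import csv
import io


def load_multilabel_csv(handle, text_col, label_cols):
    """Read rows into (texts, multi-hot targets); label columns are assumed to hold 0/1."""
    texts, targets = [], []
    for row in csv.DictReader(handle):
        texts.append(row[text_col])
        targets.append([int(row[c]) for c in label_cols])
    return texts, targets


# In-memory stand-in for a CSV file (made-up rows).
sample = io.StringIO(
    "text,books,music\n"
    "A space opera novel,1,0\n"
    "Remastered rock album with liner notes,1,1\n"
)
texts, targets = load_multilabel_csv(sample, "text", ["books", "music"])
print(texts)
print(targets)
```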

	### Training Procedure

	#### Preprocessing

	- Tokenization with the base model’s tokenizer.
	- Optional script to prepare AmazonCat‑13K CSVs (see `split_amazon_13k_data.py` in the repo).

	#### Training Hyperparameters (illustrative config)

	- Base model: `meta-llama/Llama-3.2-1B`
	- Problem type: `multi_label_classification`
	- Precision / quantization: `use_4bit: true` (QLoRA‑style); `torch_dtype: bfloat16` for computation
	- LoRA: `r=2`, `alpha=2`, `dropout=0.05`
	- LoRA target modules: `["q_proj","k_proj","v_proj","o_proj","gate_proj","down_proj","up_proj"]`
	- Batch size: `4` (with `gradient_accumulation_steps=8`)
	- Max length: `1024`
	- Optimizer: 8‑bit optimizer when quantized (`optim_8bit_when_4bit: true`)
	- Epochs: up to `20` with early stopping (`patience=2`)
	- Metric for best model: `f1_micro`
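To give a sense of scale for the LoRA settings above, the following sketch counts the adapter parameters that `r=2` adds across the listed target modules. The layer shapes are assumptions taken from the published Llama‑3.2‑1B configuration (hidden size 2048, intermediate size 8192, 16 layers, 8 key/value heads of dimension 64); verify them for any other backbone. A classification head or other trainable parameters are not counted.

```python
# Assumed Llama-3.2-1B shapes; check the model config before reusing these numbers.
HIDDEN, INTERMEDIATE, LAYERS = 2048, 8192, 16
KV_DIM = 8 * 64  # 8 key/value heads x head_dim 64 (grouped-query attention)

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, layers=LAYERS):
    """LoRA adds A (rank x in) and B (out x rank) per module: rank * (in + out) parameters."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * layers


rank, alpha = 2, 2
adapter_params = lora_param_count(rank)
scaling = alpha / rank  # factor applied to the low-rank update
print(adapter_params, scaling)
```

Under these assumptions the adapters hold about 1.4 million parameters, a small fraction of the roughly 1.2 billion in the frozen backbone, and `alpha=2` with `r=2` leaves the low-rank update unscaled.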

	#### Speeds, Sizes, Times (example run)

	- Device: NVIDIA GeForce RTX 3080 Ti Laptop GPU (16 GB VRAM)
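	- Runtime: ~1,310 seconds for the shortest run (run 1, 4 epochs); the best micro‑F1 run (run 2, 5 epochs) took ~1,640 seconds
	- Throughput: ≈0.784 steps/s (≈24.9 samples/s) during training, averaged over the 4 runs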
	- Artifacts: Reproducible outputs under `outputs/<model_name>/<dataset_name>/run_<i>/`
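The batch settings and the reported throughput can be cross-checked with a little arithmetic. The sketch below uses the run 1 figures from the per-run metrics table in the Evaluation section and assumes a single GPU, so the effective batch size is simply batch size times gradient accumulation steps.

```python
per_device_batch = 4
grad_accum_steps = 8
effective_batch = per_device_batch * grad_accum_steps  # samples per optimizer step

# Run 1 figures from the per-run metrics table.
steps_per_s, samples_per_s, train_time_s = 0.962, 30.543, 1309.6

samples_per_step = samples_per_s / steps_per_s  # should sit close to the effective batch
optimizer_steps = steps_per_s * train_time_s    # approximate number of optimizer steps
train_hours = train_time_s / 3600

print(effective_batch, round(samples_per_step, 2))
print(round(optimizer_steps), round(train_hours, 2))
```

The ratio of samples/s to steps/s lands just under 32, consistent with an effective batch of 32 and an occasional smaller final batch, and the run 1 training time corresponds to the ~0.36 hours quoted under Environmental Impact.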

	---

	## Evaluation

	### Testing Data, Factors & Metrics

	- Testing data: Held‑out split from AmazonCat‑13K (example subset).
	- Factors: Evaluate both micro‑F1 (overall) and macro‑F1 (per‑class average) to reflect long‑tail performance.
	- Metrics: `f1_micro`, `f1_macro`, eval loss, throughput (steps/s, samples/s).

	### Metrics

	- Best overall (micro-F1): 0.830 at 5 epochs
	- Best minority‑class sensitivity (macro-F1): 0.752 at 6 epochs
	- Average across 4 runs: micro‑F1 0.824, macro‑F1 0.741, eval loss 0.161
	- Throughput (average across runs): train ≈ 0.784 steps/s (≈ 24.9 samples/s); eval time ≈ 34.0 s per run.

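	> Interpretation: the best micro‑F1 (0.830) came from a 5‑epoch run (run 2) and the best macro‑F1 (0.752) from the 6‑epoch run (run 3), hinting at slightly better coverage of minority classes with longer training. The second 5‑epoch run (run 4) scored below the 4‑epoch run on both metrics, so run‑to‑run variation is comparable in size to the differences between epoch counts.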

	---
	### 📈 Per‑run metrics
	| Run | Epochs | Train Loss | Eval Loss | F1 (micro) | F1 (macro) | Train Time (s) | Train steps/s | Train samples/s | Eval Time (s) |
	|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
	| 1 | 4 | 1.400 | 0.157 | 0.824 | 0.738 | 1309.6 | 0.962 | 30.543 | 33.6 |
	| 2 | 5 | 1.220 | 0.159 | 0.830 | 0.743 | 1640.3 | 0.768 | 24.385 | 34.0 |
	| 3 | 6 | 1.063 | 0.162 | 0.826 | 0.752 | 1984.2 | 0.635 | 20.159 | 34.4 |
	| 4 | 5 | 1.265 | 0.165 | 0.816 | 0.729 | 1639.3 | 0.769 | 24.401 | 34.0 |

	<sub>F1(micro) aggregates decisions over all samples; F1(macro) averages per‑class F1 equally, highlighting minority‑class performance.</sub>
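The difference between the two averages is easiest to see on a small example. The sketch below computes both from multi-hot targets and predictions in plain Python; it treats F1 as 0 for a label with no true positives, false positives, or false negatives, which is one common convention and may differ from the repository's metric code.

```python
def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for multi-hot targets and predictions."""
    num_labels = len(y_true[0])
    tp, fp, fn = [0] * num_labels, [0] * num_labels, [0] * num_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    micro = f1(sum(tp), sum(fp), sum(fn))  # pool all decisions, then score
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(num_labels)) / num_labels
    return micro, macro


# Label 0 is frequent and always right; label 1 is rare and always missed.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))
```

Here the frequent label dominates the pooled counts, so micro‑F1 stays high while macro‑F1 drops to 0.5 because the missed rare label counts as much as the frequent one.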

	### Results (example)

	- Best micro‑F1: `0.830` at 5 epochs
	- Best macro‑F1: `0.752` at 6 epochs
	- Average across 4 runs: micro‑F1 `0.824`, macro‑F1 `0.741`, eval loss `0.161`
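The summary figures above can be reproduced from the per-run table. The sketch below re-enters the four rows by hand and recomputes the averages and the best runs; it only restates numbers already reported in this card.

```python
# Rows copied from the per-run metrics table.
runs = [
    {"run": 1, "epochs": 4, "eval_loss": 0.157, "f1_micro": 0.824, "f1_macro": 0.738},
    {"run": 2, "epochs": 5, "eval_loss": 0.159, "f1_micro": 0.830, "f1_macro": 0.743},
    {"run": 3, "epochs": 6, "eval_loss": 0.162, "f1_micro": 0.826, "f1_macro": 0.752},
    {"run": 4, "epochs": 5, "eval_loss": 0.165, "f1_micro": 0.816, "f1_macro": 0.729},
]


def mean_of(key):
    return sum(r[key] for r in runs) / len(runs)


averages = {key: mean_of(key) for key in ("f1_micro", "f1_macro", "eval_loss")}
best_micro = max(runs, key=lambda r: r["f1_micro"])
best_macro = max(runs, key=lambda r: r["f1_macro"])

for key, value in averages.items():
    print(f"{key}: {value:.4f}")
print("best micro-F1: run", best_micro["run"], "with", best_micro["epochs"], "epochs")
print("best macro-F1: run", best_macro["run"], "with", best_macro["epochs"], "epochs")
```

The unrounded means are close to 0.824, 0.7405, and 0.1608, which match the reported 0.824, 0.741, and 0.161 after rounding to three decimals.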

	#### Summary

	Decoder‑only LLMs with LoRA adapters provide competitive multi‑label performance with small memory/compute budgets. Slightly longer training (5–6 epochs) can improve macro‑F1, capturing more minority labels with minimal micro‑F1 trade‑off.

	---

	## Model Examination

	- Inspect confidence/threshold curves per label to tune decision thresholds.
	- Use error analysis on false negatives for long‑tail labels; consider reweighting or augmentation.
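A simple starting point for the false-negative analysis mentioned above is to count misses per label and rank them. The sketch below assumes multi-hot targets and predictions; the function name and toy data are hypothetical.

```python
from collections import Counter


def false_negatives_by_label(y_true, y_pred, label_names):
    """Count, per label, how often a true label was not predicted; most-missed first."""
    counts = Counter()
    for t_row, p_row in zip(y_true, y_pred):
        for name, t, p in zip(label_names, t_row, p_row):
            if t == 1 and p == 0:
                counts[name] += 1
    return counts.most_common()


names = ["books", "music", "rock"]
y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 1]]
missed = false_negatives_by_label(y_true, y_pred, names)
print(missed)
```

Labels at the top of this ranking are natural candidates for lower decision thresholds, reweighting, or augmentation.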

	---

	## Environmental Impact

	- Hardware Type: Single laptop GPU (RTX 3080 Ti Laptop, 16 GB)
	- Hours used (example run): ~0.36 hours

	---

	## Technical Specifications

	### Model Architecture and Objective

	- Architecture: Decoder‑only Transformer (Llama 3.2 class), adapted via LoRA.
	- Objective: Multi‑label classification formulated as conditional generation with sigmoid/thresholding for label decisions.
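The sigmoid/thresholding step named in the objective can be illustrated in a few lines. This sketch assumes the model yields one raw score (logit) per label and uses the standard binary cross-entropy on logits as the training loss; the card does not show the repository's exact head or loss, so treat both as assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, label_names, threshold=0.5):
    """Each label is decided independently: keep it if sigmoid(logit) >= threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, in a numerically stable form."""
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


names = ["books", "music", "rock"]
logits = [2.0, -1.0, 0.0]
print(predict_labels(logits, names))
print(round(bce_with_logits(logits, [1, 0, 1]), 4))
```

Unlike a softmax, the per-label sigmoid lets any number of labels be active at once, which is what makes the setup multi-label rather than multi-class.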

	### Compute Infrastructure

	#### Hardware

	- Laptop with NVIDIA GeForce RTX 3080 Ti (laptop) GPU, 16 GB VRAM.

	#### Software

	- Python, PyTorch, Hugging Face Transformers, PEFT, (optional) bitsandbytes for 4‑bit.

	---

	## Citation

	If you use this work, please consider citing the following:

	BibTeX:

	```bibtex
	@article{Hu2021LoRA,
	title={LoRA: Low-Rank Adaptation of Large Language Models},
	author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
	journal={arXiv preprint arXiv:2106.09685},
	year={2021}
	}

	@article{Dettmers2023QLoRA,
	title={QLoRA: Efficient Finetuning of Quantized LLMs},
	author={Tim Dettmers and Artidoro Pagnoni and Ari Holtzman and Luke Zettlemoyer},
	journal={arXiv preprint arXiv:2305.14314},
	year={2023}
	}
	```

	APA:

	- Hu, E. J., Shen, Y., Wallis, P., Allen‑Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low‑Rank Adaptation of Large Language Models. arXiv:2106.09685.
	- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.

	---

	## Glossary

	- LoRA: Low‑Rank Adaptation; injects small trainable matrices into a frozen backbone to adapt it efficiently.
	- QLoRA (4‑bit): Finetuning with the backbone quantized to 4‑bit precision, training only LoRA adapters.
	- Micro‑/Macro‑F1: Micro aggregates over all instances; Macro averages over classes equally (sensitive to minority classes).

	---

	## More Information

	- The repo ships a minimal CLI (`llm_cls/cli.py`) and example YAML config (`configs/default.yaml`) to reproduce results.
	- For non‑Linux environments or if `bitsandbytes` is unavailable, disable 4‑bit and train in standard precision.

	---

	## Model Card Authors

	- Author/Maintainer: Amirhossein Yousefi (`amirhossein-yousefi` / `Amirhossein75`)

	## Model Card Contact

	- Open an issue in the GitHub repository or contact the Hugging Face user `Amirhossein75`.