Nepali Automatic Speech Recognition (ASR)
Overview
Fine-tuning and inference for Nepali language speech recognition using Wav2Vec2 and Whisper models.
Model Details
| Property | Value |
|---|---|
| Model ID | Saugat212/ASR_MODEL |
| Base Model | facebook/wav2vec2-base |
| Architecture | wav2vec2 |
| Parameters | 0.3B |
| Language | Nepali |
Purpose
- Convert Nepali speech audio to text
- Fine-tune Wav2Vec2 on Nepali datasets
- Evaluate ASR performance using WER metric
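Word error rate (WER) is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a dependency-free illustration of that definition, not the evaluation code used in this repository; the function name `word_error_rate` and the example sentences are assumptions made for illustration. In practice, the `jiwer` and `evaluate` packages listed under Requirements provide this metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of three reference words.
print(word_error_rate("म घर जान्छु", "म घर जान्छ"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.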
Contents
| File | Description |
|---|---|
| `whisper_transcription.ipynb` | Whisper model for Nepali speech-to-text transcription |
| `wav2vec2_finetuning.ipynb` | Wav2Vec2 fine-tuning recipe for Nepali ASR |
| `wav2vec2_finetune.py` | Python script for Wav2Vec2 fine-tuning |
| `finetune.py` | ASR fine-tuning script |
| `Dataset/` | Training datasets (CSV files with audio paths and transcriptions) |
| `Phase 1/Finetuning/` | Phase 1 training data, checkpoints, and inference notebooks |
Usage
Load Model
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "Saugat212/ASR_MODEL"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
```
Inference
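```python
import torch
import torchaudio

# Load audio; waveform has shape (channels, samples)
waveform, sample_rate = torchaudio.load("audio.wav")

# Downmix to mono and resample to the rate the feature extractor expects
target_rate = processor.feature_extractor.sampling_rate
waveform = waveform.mean(dim=0)
if sample_rate != target_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, target_rate)

# Process
inputs = processor(waveform.numpy(), sampling_rate=target_rate, return_tensors="pt")

# Infer
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```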
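The `argmax` step picks one token id per audio frame, and `batch_decode` then applies greedy CTC decoding: roughly, it merges consecutive repeats and drops the blank token (for Wav2Vec2 CTC models, the padding token plays this role). The sketch below illustrates that collapse rule on plain Python lists. The helper `ctc_greedy_collapse`, the toy vocabulary, and the frame ids are all hypothetical; the real vocabulary comes from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive duplicate ids, then remove the blank id."""
    merged = [token_id for token_id, _ in groupby(ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Toy vocabulary for illustration only (id 0 acts as the CTC blank).
toy_vocab = {0: "<pad>", 1: "न", 2: "म"}

# Hypothetical per-frame argmax output.
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 0, 1]
tokens = ctc_greedy_collapse(frame_ids)
print("".join(toy_vocab[t] for t in tokens))
```

Note that a blank between two identical ids keeps them as separate tokens, which is how CTC represents doubled characters.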
Models Available
- Wav2Vec2: `Saugat212/ASR_MODEL` - Fine-tuned Nepali ASR
- Whisper: OpenAI Whisper for alternative transcription
Dataset
- Located in `Dataset/`
- Contains `final_transcriptions.csv` with audio paths and transcriptions
- Cleaned data in `cleaned_data.csv`
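A CSV manifest of this kind can be read into (audio path, transcription) pairs with the standard library. The sketch below is only an illustration: the column names `path` and `transcription`, the helper `parse_manifest`, and the sample rows are assumptions, since the actual headers of the CSV files are not documented here.

```python
import csv
import io


def parse_manifest(handle, audio_col="path", text_col="transcription"):
    """Return (audio_path, transcription) pairs, skipping rows with empty text."""
    pairs = []
    for row in csv.DictReader(handle):
        text = (row.get(text_col) or "").strip()
        if text:
            pairs.append((row[audio_col], text))
    return pairs


# In-memory stand-in for a manifest file (hypothetical rows). With a real
# file, pass open(csv_path, newline="", encoding="utf-8") instead.
sample = io.StringIO("path,transcription\nclip_001.wav,नमस्ते\nclip_002.wav,\n")
pairs = parse_manifest(sample)
print(pairs)
```

Opening the file with `encoding="utf-8"` matters for Devanagari text, because the platform default encoding may differ.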
Requirements
- transformers
- torchaudio
- datasets
- evaluate
- jiwer
Fine-tuning
See `wav2vec2_finetuning.ipynb` for the complete fine-tuning pipeline.
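One step that CTC fine-tuning recipes for Wav2Vec2 commonly include is building a character-level vocabulary from the training transcriptions. The sketch below shows that step under stated assumptions: it is not taken from the notebook, the helper `build_ctc_vocab` and the example sentences are hypothetical, and the special tokens `[PAD]`, `[UNK]`, and `|` (as the word delimiter) follow a widely used convention rather than anything confirmed for this model.

```python
def build_ctc_vocab(transcriptions):
    """Map special tokens and every character in the transcriptions to an id."""
    # Spaces are represented by the "|" word-delimiter token instead.
    chars = sorted(set("".join(transcriptions)) - {" ", "|"})
    tokens = ["[PAD]", "[UNK]", "|"] + chars
    return {token: idx for idx, token in enumerate(tokens)}


# Hypothetical transcriptions for illustration.
vocab = build_ctc_vocab(["नमस्ते संसार", "म घर"])
print(len(vocab))
```

A dictionary like this is typically saved as a JSON file and passed to `Wav2Vec2CTCTokenizer` together with the matching `pad_token`, `unk_token`, and `word_delimiter_token` arguments.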