Nepali Automatic Speech Recognition (ASR)

Overview

This repository contains fine-tuning and inference code for Nepali speech recognition using Wav2Vec2 and Whisper models.

Model Details

| Property | Value |
|---|---|
| Model ID | Saugat212/ASR_MODEL |
| Base Model | facebook/wav2vec2-base |
| Architecture | wav2vec2 |
| Parameters | 0.3B |
| Language | Nepali |

Purpose

  • Convert Nepali speech audio to text
  • Fine-tune Wav2Vec2 on Nepali datasets
  • Evaluate ASR performance using the word error rate (WER) metric
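
WER is the word-level edit distance between the reference and the predicted transcription, divided by the number of reference words. The snippet below is a minimal pure-Python sketch of that definition, for illustration only; the function name word_error_rate is hypothetical, and libraries such as jiwer or evaluate (both listed under Requirements) are the usual choice in practice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words
print(word_error_rate("म नेपाली भाषा बोल्छु", "म नेपाली भासा बोल्छु"))  # 0.25
```

Words are split on whitespace, so punctuation attached to a word counts as part of that word.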

Contents

| File | Description |
|---|---|
| whisper_transcription.ipynb | Whisper model for Nepali speech-to-text transcription |
| wav2vec2_finetuning.ipynb | Wav2Vec2 fine-tuning recipe for Nepali ASR |
| wav2vec2_finetune.py | Python script for Wav2Vec2 fine-tuning |
| finetune.py | ASR fine-tuning script |
| Dataset/ | Training datasets (CSV files with audio paths and transcriptions) |
| Phase 1/Finetuning/ | Phase 1 training data, checkpoints, and inference notebooks |

Usage

Load Model

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "Saugat212/ASR_MODEL"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)

Inference

import torchaudio
import torch

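# Load audio; torchaudio returns a (channels, samples) tensor
waveform, sample_rate = torchaudio.load("audio.wav")

# Mix down to mono and resample to the rate the feature extractor expects
target_rate = processor.feature_extractor.sampling_rate
waveform = waveform.mean(dim=0)
if sample_rate != target_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, target_rate)

# Process
input_values = processor(waveform.numpy(), sampling_rate=target_rate, return_tensors="pt").input_values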

# Infer
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
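
The argmax and decode steps above amount to greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The following is a minimal sketch of that collapse rule on plain integer ids, independent of the model; the function name, the sample ids, and the choice of 0 as the blank id are assumptions for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then remove the blank id."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Hypothetical per-frame argmax ids; 0 is assumed to be the CTC blank token
frames = [0, 5, 5, 0, 5, 7, 7, 0, 0, 3]
print(ctc_greedy_collapse(frames))  # [5, 5, 7, 3]
```

The blank between the two runs of 5 is what lets a genuinely repeated token survive the merge step.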

Models Available

  • Wav2Vec2: Saugat212/ASR_MODEL - Fine-tuned Nepali ASR
  • Whisper: OpenAI Whisper for alternative transcription

Dataset

  • Located in Dataset/
  • Contains final_transcriptions.csv with audio paths and transcriptions
  • Cleaned data in cleaned_data.csv
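
As a rough sketch of how such a CSV could be read into (audio path, transcription) pairs, the snippet below uses only the standard library on an in-memory sample. The column names path and transcription, the sample rows, and the helper load_pairs are all hypothetical; the actual headers in final_transcriptions.csv may differ.

```python
import csv
import io

# Hypothetical sample; the real column names and rows may differ
sample_csv = io.StringIO(
    "path,transcription\n"
    "audio/0001.wav,नमस्ते\n"
    "audio/0002.wav,\n"
    "audio/0003.wav,तपाईंलाई कस्तो छ\n"
)


def load_pairs(handle, path_col="path", text_col="transcription"):
    """Return (audio_path, transcription) pairs, skipping rows with empty text."""
    pairs = []
    for row in csv.DictReader(handle):
        text = (row.get(text_col) or "").strip()
        if text:
            pairs.append((row[path_col], text))
    return pairs


pairs = load_pairs(sample_csv)
print(len(pairs))  # 2, the row with an empty transcription is skipped
```

To read a file on disk instead, pass a handle opened with open(csv_path, newline="", encoding="utf-8").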

Requirements

  • torch
  • transformers
  • torchaudio
  • datasets
  • evaluate
  • jiwer

Fine-tuning

See wav2vec2_finetuning.ipynb for the complete fine-tuning pipeline.
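
CTC fine-tuning recipes for Wav2Vec2 commonly begin by building a character-level vocabulary from the training transcriptions. Whether the notebook does exactly this is an assumption; the sketch below only illustrates the general idea, and the helper build_ctc_vocab and the special tokens "|", "[UNK]", and "[PAD]" follow a common convention and are not taken from this repository.

```python
def build_ctc_vocab(transcriptions):
    """Map each distinct character to an id and append CTC special tokens."""
    # Spaces are represented by the word delimiter "|", so drop both from the character set
    chars = sorted(set("".join(transcriptions)) - {" ", "|"})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    vocab["|"] = len(vocab)      # word delimiter (assumed convention)
    vocab["[UNK]"] = len(vocab)  # unknown character
    vocab["[PAD]"] = len(vocab)  # padding, typically reused as the CTC blank
    return vocab


# Hypothetical transcriptions, for illustration only
vocab = build_ctc_vocab(["नमस्ते", "नमस्ते नेपाल"])
print(vocab)
```

Devanagari vowel signs and the virama are separate code points, so each receives its own vocabulary entry.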
