LTZ E1 (base)
A ModernBERT-based masked language model pretrained on Luxembourgish, following the Ettin recipe (see https://huggingface.co/jhu-clsp/ettin-encoder-150m).
Model Details
- Architecture: ModernBERT (encoder)
- Size: base
- Vocabulary: 50,368 tokens (BPE, GPTNeoXTokenizerFast)
- Context length: 1,024 tokens
- Language: Luxembourgish (lb/ltz)
- License: CC BY-SA 4.0
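The 1,024-token context length includes the automatically prepended [CLS] token. For longer documents, one option is to split the token ids into windows before special tokens are added. The sketch below assumes [CLS] is the only special token added per sequence (as the tokenizer notes describe) and uses a hypothetical helper, `chunk_ids`, with placeholder ids. With the real tokenizer, `tokenizer(text, truncation=True, max_length=1024)` truncates instead of chunking.

```python
from typing import List

MAX_LEN = 1024  # context length stated in the model card


def chunk_ids(ids: List[int], cls_id: int, max_len: int = MAX_LEN) -> List[List[int]]:
    """Split token ids (without special tokens) into windows that each fit
    max_len once a [CLS] id is prepended.

    Assumption: [CLS] is the only special token added per sequence.
    Hypothetical helper, not part of transformers.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and one token")
    body = max_len - 1  # one slot per window is reserved for [CLS]
    return [[cls_id] + ids[i:i + body] for i in range(0, len(ids), body)]


# Toy example with placeholder ids and a tiny window
print(chunk_ids([5, 6, 7, 8, 9], cls_id=0, max_len=3))
# → [[0, 5, 6], [0, 7, 8], [0, 9]]
```

If the tokenizer also appended a trailing [SEP], one more slot per window would need to be reserved.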
Usage
Requires transformers>=4.48.0.
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-base")
model = AutoModelForMaskedLM.from_pretrained("instilux/ltz-e1-base")

# "Wéi spéit [MASK] et?" (literally "How late [MASK] it?")
inputs = tokenizer("Wéi spéit [MASK] et?", return_tensors="pt")
# Position of the [MASK] token within the single input sequence
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    outputs = model(**inputs)

# Five highest raw logits (not probabilities) at the [MASK] position
top_tokens = outputs.logits[0, mask_pos].topk(5)
for token_id, score in zip(top_tokens.indices[0], top_tokens.values[0]):
    token = tokenizer.decode(token_id)
    print(f"{token:15s} {score:.3f}")
```
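The `topk` call in the usage example ranks raw logits at the [MASK] position. To read the scores as probabilities, apply a softmax over the full vocabulary row first (in PyTorch, `outputs.logits[0, mask_pos].softmax(dim=-1)` before `topk`). Below is a minimal pure-Python sketch of that arithmetic; `top_k_probs` is a hypothetical helper, not part of transformers.

```python
import math
from typing import List, Tuple


def top_k_probs(logits: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Softmax over one vocabulary row of logits, then return the k most
    likely (token_id, probability) pairs. Hypothetical helper."""
    m = max(logits)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy logits for a three-token vocabulary
print([(i, round(p, 3)) for i, p in top_k_probs([2.0, 1.0, 0.0], k=2)])
# → [(0, 0.665), (1, 0.245)]
```

Softmax is monotonic, so the top-k token ids are the same whether you rank logits or probabilities; only the printed scores change.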
Tokenizer Notes
The tokenizer is BPE-based (GPTNeoXTokenizerFast) with BERT-style special tokens ([CLS], [SEP], [MASK], [PAD]). A [CLS] token is prepended automatically (add_bos_token: true).
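Because [CLS] is prepended, every position in `input_ids` is shifted one place to the right relative to the raw BPE ids, so locating [MASK] by comparing ids against `tokenizer.mask_token_id`, as the usage example does, avoids off-by-one errors. The toy sketch below illustrates that offset. The ids are placeholders, and `add_cls` and `mask_positions` are hypothetical helpers; with the real tokenizer, `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())` shows the actual layout.

```python
from typing import List

# Placeholder ids for illustration only; read the real ones from
# tokenizer.cls_token_id and tokenizer.mask_token_id.
CLS_ID = 1
MASK_ID = 4


def add_cls(ids: List[int], cls_id: int = CLS_ID) -> List[int]:
    """Mimic add_bos_token: true by prepending the [CLS] id."""
    return [cls_id] + ids


def mask_positions(ids: List[int], mask_id: int = MASK_ID) -> List[int]:
    """Indices of every [MASK] id in a sequence."""
    return [i for i, t in enumerate(ids) if t == mask_id]


raw = [10, 11, MASK_ID, 12]  # toy sequence standing in for real BPE ids
with_cls = add_cls(raw)
print(mask_positions(raw), mask_positions(with_cls))
# → [2] [3]
```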
Citation
A paper describing this model will be published soon. In the meantime, please cite this repository if you use this model in your work.