Update/Fix incorrect model_max_length to 1024 tokens
Currently, the model_max_length field is set to 1000000000000000019884624838656 tokens, which is incorrect. As a result, when this model is used in a pipeline, one of two things goes wrong. Either automatic truncation cannot be enabled for inputs that exceed the real limit, which raises an error like RuntimeError: The expanded size of the tensor (<SOME NUMBER LARGER THAN 1024>) must match the existing size (1024) at non-singleton dimension 1. Target sizes: [1, <SOME NUMBER LARGER THAN 1024>]. Tensor sizes: [1, 1024], or the stride option cannot be used, since it also relies on a correct model_max_length being provided.
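Until the config is fixed, the limit can be overridden on the client side. The snippet below is a minimal sketch, not the change made in this PR: the helper names `has_usable_max_length` and `load_starpii_tokenizer` are hypothetical, and it assumes the oversized value is the `int(1e30)` sentinel that transformers falls back to when no maximum length is configured.

```python
# Value that transformers uses when a tokenizer has no configured maximum
# length; int(1e30) prints as 1000000000000000019884624838656.
UNSET_MAX_LENGTH = int(1e30)


def has_usable_max_length(model_max_length, sentinel=UNSET_MAX_LENGTH):
    """Return True if the tokenizer limit looks like a real token limit."""
    return model_max_length < sentinel


def load_starpii_tokenizer(max_length=1024):
    """Load the tokenizer and override the limit if it is unset.

    Hypothetical helper: 1024 is the limit proposed in this PR.
    Requires transformers and network access when called.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bigcode/starpii")
    if not has_usable_max_length(tokenizer.model_max_length):
        tokenizer.model_max_length = max_length
    return tokenizer
```

A tokenizer loaded this way can be passed to the pipeline through its tokenizer argument, so that truncation has a real limit to work with.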
Description of the stride option in a token classification pipeline: If stride is provided, the pipeline is applied on all the text. The text is split into chunks of size model_max_length. Works only with fast tokenizers and aggregation_strategy different from NONE. The value of this argument defines the number of overlapping tokens between chunks. In other words, the model will shift forward by tokenizer.model_max_length - stride tokens each step.
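The chunking arithmetic in that description can be sketched in plain Python. This is an illustration only, not the pipeline's own implementation: `chunk_windows` is a hypothetical helper, the stride of 128 is an arbitrary example value, and the sketch ignores the special tokens that a real tokenizer reserves room for in each chunk.

```python
def chunk_windows(n_tokens, max_length=1024, stride=128):
    """Return (start, end) token offsets of overlapping chunks.

    Consecutive chunks overlap by `stride` tokens, so each chunk starts
    `max_length - stride` tokens after the previous one.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride
    windows = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            break
        start += step
    return windows


# A 2000-token input with a 1024-token limit and 128 overlapping tokens
# is covered by three chunks, each starting 896 tokens after the last.
print(chunk_windows(2000))
```

With model_max_length left at its current value, the step becomes astronomically large and the text is never split, which is why the stride option has no effect until the field is corrected.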