Text Generation
Transformers
Safetensors
starcoder2
code
Eval Results (legacy)
text-generation-inference
Instructions to use bigcode/starcoder2-15b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use bigcode/starcoder2-15b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bigcode/starcoder2-15b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-15b")
model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-15b")
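As a quick end-to-end check, the pipeline loaded above can complete a code prompt directly. A minimal sketch, assuming a machine with enough GPU memory for the 15B weights; the prompt, max_new_tokens, dtype, and device settings are illustrative choices, not part of the instructions above:

import torch
from transformers import pipeline

# Build the same text-generation pipeline as above; bfloat16 and device_map="auto"
# are illustrative settings that assume a large enough GPU and an accelerate install.
pipe = pipeline(
    "text-generation",
    model="bigcode/starcoder2-15b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Complete a short code prompt and print the result.
result = pipe("def fibonacci(n):", max_new_tokens=64)
print(result[0]["generated_text"])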
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use bigcode/starcoder2-15b with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "bigcode/starcoder2-15b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "bigcode/starcoder2-15b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/bigcode/starcoder2-15b
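With the vLLM server from the steps above running, the same OpenAI-compatible completions endpoint can also be called from Python. A minimal sketch using the openai client package; the package choice and the placeholder api_key value are assumptions for illustration, while the endpoint and payload mirror the curl example:

# Same request as the curl example above, issued from Python with the OpenAI
# client (pip install openai); api_key is a placeholder, assuming the server
# was started without an API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="bigcode/starcoder2-15b",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)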
- SGLang
How to use bigcode/starcoder2-15b with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bigcode/starcoder2-15b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "bigcode/starcoder2-15b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "bigcode/starcoder2-15b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "bigcode/starcoder2-15b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
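The SGLang server above exposes the same OpenAI-compatible completions route, so it can be called from Python as well. A minimal sketch with the requests package (an illustrative choice; the payload mirrors the curl example above):

# Same request as the curl example above, issued from Python with the
# requests package (an illustrative choice, not part of the page above).
import requests

payload = {
    "model": "bigcode/starcoder2-15b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}
resp = requests.post("http://localhost:30000/v1/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])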
- Docker Model Runner
How to use bigcode/starcoder2-15b with Docker Model Runner:
docker model run hf.co/bigcode/starcoder2-15b
NVIDIA framework and contribution updates. (#3, opened by Criztov)
README.md CHANGED

@@ -30,7 +30,8 @@ tags:
 
 ## Model Summary
 
-StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+The model was trained with [NVIDIA NeMo™ Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) using the [NVIDIA Eos Supercomputer](https://blogs.nvidia.com/blog/eos/) built with [NVIDIA DGX H100](https://www.nvidia.com/en-us/data-center/dgx-h100/) systems.
 
 - **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
 - **Paper:** [Link](https://huggingface.co/datasets/bigcode/the-stack-v2/)

@@ -135,11 +136,11 @@ The model has been trained on source code from 600+ programming languages. The p
 
 ## Hardware
 
-- **GPUs:** 1024
+- **GPUs:** 1024 x H100
 
 ## Software
 
-- **Framework:** [NeMo](https://
+- **Framework:** [NeMo Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
 # License