Quantizations of https://huggingface.co/microsoft/Phi-4-mini-reasoning
Open source inference clients/UIs
Closed source inference clients/UIs
- LM Studio
- More will be added...
From original readme
Phi-4-mini-reasoning is a lightweight open model built upon synthetic data with a focus on high-quality, reasoning dense data further finetuned for more advanced math reasoning capabilities. The model belongs to the Phi-4 model family and supports 128K token context length.
Usage
Tokenizer
Phi-4-mini-reasoning supports a vocabulary size of up to 200064 tokens. The tokenizer files already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
Input Formats
Given the nature of the training data, the Phi-4-mini-instruct model is best suited for prompts using specific formats. Below are the two primary formats:
Chat format
This format is used for general conversation and instructions:
<|system|>Your name is Phi, an AI math expert developed by Microsoft.<|end|><|user|>How to solve 3*x^2+4*x+5=1?<|end|><|assistant|>
Inference with transformers
Phi-4-mini-reasoning has been integrated in the 4.51.3 version of transformers. The current transformers version can be verified with: pip list | grep transformers.
Python 3.8 and 3.10 will work best.
List of required packages:
flash_attn==2.7.4.post1
torch==2.5.1
transformers==4.51.3
accelerate==1.3.0
Phi-4-mini-reasoning is also available in Azure AI Studio
Example
After obtaining the Phi-4-mini-instruct model checkpoints, users can use this sample code for inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
torch.random.manual_seed(0)
model_id = "microsoft/Phi-4-mini-reasoning"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="cuda",
torch_dtype="auto",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{
"role": "user",
"content": "How to solve 3*x^2+4*x+5=1?"
}]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
)
outputs = model.generate(
**inputs.to(model.device),
max_new_tokens=32768,
temperature=0.8,
top_p=0.95,
do_sample=True,
)
outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])
print(outputs[0])
- Downloads last month
- 243
1-bit
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit