AISA-AR-FunctionCall-FT

Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning

AISA-AR-FunctionCall-FT is a fully fine-tuned Arabic function-calling model built on top of FunctionGemma (Gemma 3 270M) and optimized for structured tool invocation in Arabic agentic systems.

The model converts natural Arabic requests into structured executable API calls, enabling reliable integration between language models and external tools.

This model is part of the AISA (Agentic AI Systems Architecture) initiative.

Try the Model in Google Colab

You can run a full inference example using the notebook below.

Open In Colab

The notebook demonstrates:

  • Loading the model
  • Defining tool schemas
  • Generating structured tool calls
  • Parsing function call outputs

Model Overview

Field Value
Model name AISA-AR-FunctionCall-FT
Base model unsloth/functiongemma-270m-it
Architecture Gemma 3 (270M parameters)
Fine-tuning type Full-parameter supervised fine-tuning
Primary task Arabic function calling / tool invocation

The model is designed to translate Arabic natural language requests into structured tool calls following the FunctionGemma tool-calling format.


Key Capabilities

  • Arabic natural language → structured API calls
  • Multi-dialect Arabic understanding
  • Tool selection and argument extraction
  • Structured execution environments

Supported domains:

Domain
Travel
Utilities
Islamic services
Weather
Healthcare
Banking & finance
E-commerce
Government services

Dataset

The model is trained on AISA-AR-FunctionCall — a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:

  • Dataset auditing
  • Schema normalization
  • Enum correction
  • Tool pruning
  • Prompt restructuring
  • Tool sampling

Dataset splits:

Split Samples
Train 41,104
Validation 4,568
Test 5,079

Dataset includes:

  • 5 Arabic dialects
  • 8 real-world domains
  • 27 tool schemas
  • Structured tool-call annotations

Dataset: AISA-Framework/AISA-AR-FunctionCall


Training Methodology

The model was trained using a data-centric fine-tuning pipeline designed to stabilize structured execution.

Key pipeline steps:

  1. Structural dataset auditing
  2. Enum constraint repair
  3. Tool schema normalization
  4. Tool pruning (36 → 27 tools)
  5. Tool sampling to prevent prompt truncation
  6. FunctionGemma-compatible chat serialization
  7. Completion-only supervised fine-tuning

Training configuration:

Parameter Value
Model size 270M
Training type Full fine-tuning
Epochs 2
Effective batch size 32
Learning rate 2e-5
Optimizer 8-bit AdamW
Scheduler Cosine
Precision BF16
Gradient checkpointing Enabled

Evaluation Results

Evaluation was performed on a held-out test set of 5,079 samples.

Clean Positive Evaluation (n = 2,873)

Metric Baseline AISA-AR-FunctionCall-FT
Function Name Accuracy 0.0804 0.6547
Full Tool-Call Match 0.0056 0.3362
Argument Key F1 0.0600 0.5728
Argument Exact Match 0.0422 0.6377
Parse Failure Rate 0.8726 0.0084
Format Validity 0.1274 0.9916
Hallucination Rate 0.0003 0.0226

Key improvement: Parse failure reduced from 87% → <1%

Dialect Performance

Dialect Function Accuracy
MSA 0.761
Gulf 0.697
Egyptian 0.683
Levantine 0.694
Maghrebi 0.616

Fine-tuning significantly reduces dialect disparity compared to the baseline model.


Known Limitations

Remaining errors are primarily semantic, including:

  • Tool selection ambiguity
  • Argument mismatches
  • Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated.


Example Usage

Prompt:

ما حالة الطقس في الرياض اليوم؟

Model output:

<start_function_call>
call:get_weather{
  city:<escape>الرياض<escape>,
  days:1
}
<end_function_call>

The structured call can then be executed by the application runtime.


Intended Use

This model is designed for:

  • Arabic AI assistants
  • Tool-based agents
  • Structured API orchestration
  • Arabic enterprise automation
  • Research on multilingual tool calling

Out-of-Scope Uses

This model is not designed for:

  • General chatbots or open-ended conversation
  • Sensitive decision-making systems
  • Safety-critical deployments without additional validation

Related Models

Model Description
AISA-AR-FunctionCall-Think Reasoning-augmented tool-calling model

AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.

Model collection: AISA-Framework/aisa-arabic-functioncall-datasets-and-models


License

Apache 2.0

Downloads last month
590
Safetensors
Model size
0.3B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AISA-Framework/AISA-AR-FunctionCall-FT

Finetuned
(50)
this model
Adapters
2 models

Dataset used to train AISA-Framework/AISA-AR-FunctionCall-FT

Collection including AISA-Framework/AISA-AR-FunctionCall-FT