Introduction

GUI-Owl 1.5 is the next-generation native GUI agent model family built on Qwen3-VL. It supports multi-platform GUI automation across desktops, mobile devices, browsers, and more. Powered by a scalable hybrid data flywheel, unified agent capability enhancement, and multi-platform environment RL (MRPO), GUI-Owl 1.5 offers a full spectrum of models.

Key highlights:

  • 🏆 State-of-the-art among multi-platform GUI models on OSWorld-Verified, AndroidWorld, Mobile-World, WindowsAA, ScreenSpot-v2, ScreenSpot-Pro, and more.
  • 🔧 Tool & MCP calling: Native support for external tool invocation and MCP server coordination, achieving top performance on OSWorld-MCP and Mobile-World.
  • 🧠 Long-horizon memory: Built-in memory capability without external workflow orchestration, leading all native agent models on MemGUI-Bench.
  • 🤝 Multi-agent ready: Serves both as a standalone end-to-end agent and as specialized roles (planner, executor, verifier, notetaker) within the Mobile-Agent-v3.5 framework.
  • Instruct & Thinking variants: Smaller instruct models for fast inference and edge deployment; larger thinking models for complex tasks requiring planning and reflection.

Performance

End-to-End Online Benchmarks

Model OSWorld-Verified AndroidWorld OSWorld-MCP Mobile-World WindowsAA WebArena VisualWebArena WebVoyager Online-Mind2Web
GUI-Owl-1.5-2B-Instruct 43.5 67.9 33.0 31.3 25.8 - - - -
GUI-Owl-1.5-4B-Instruct 48.2 69.8 31.7 32.3 29.4 - - - -
GUI-Owl-1.5-8B-Instruct 52.3 69.0 41.8 41.8 31.7 45.7 39.4 69.9 41.7
GUI-Owl-1.5-8B-Thinking 52.9 71.6 38.8 33.3 35.1 46.7 40.8 78.1 48.6
GUI-Owl-1.5-32B-Instruct 56.5 69.4 47.6 46.8 44.8 - - - -
GUI-Owl-1.5-32B-Thinking 56.0 68.2 43.8 42.8 44.1 48.4 46.6 82.1 -

Grounding Benchmarks

Please refer to the technical report for detailed results on ScreenSpot-v2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, and more.

Usage

Please refer to our cookbook.

Deploy

We recommand deploy GUI-Owl-1.5 through vllm

This script has been validated on an A100 with 96 GB of VRAM.

PIXEL_ARGS='{"size": {"longest_edge": 3072000, "shortest_edge": 65536}}'
IMAGE_LIMIT_ARGS='image=5'
MP_SIZE=1

vllm serve $CKPT \
    --max-model-len 32768 \
    --mm-processor-kwargs "$PIXEL_ARGS" \
    --limit-mm-per-prompt "$IMAGE_LIMIT_ARGS" \
    --tensor-parallel-size $MP_SIZE \
    --allowed-local-media-path '/' \
    --port 4243 \

Citation

If you find this model useful, please cite our paper:

@article{MobileAgentv3.5,
  title={Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents},
  author={Haiyang Xu, Xi Zhang, Haowei Liu, Junyang Wang, Zhaozai Zhu, Shengjie Zhou, Xuhao Hu, Feiyu Gao, Junjie Cao, Zihua Wang, Zhiyuan Chen, Jitong Liao, Qi Zheng, Jiahui Zeng, Ze Xu, Shuai Bai, Junyang Lin, Jingren Zhou, Ming Yan},
  journal={arXiv preprint arXiv:2602.16855},
  year={2026}
}
Downloads last month
277
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mPLUG/GUI-Owl-1.5-8B-Instruct

Quantizations
2 models

Paper for mPLUG/GUI-Owl-1.5-8B-Instruct