MegaFlow: Zero-Shot Large Displacement Optical Flow

Dingxi Zhang · Fangjinhua Wang · Marc Pollefeys · Haofei Xu

ETH Zurich · Microsoft · University of Tübingen, Tübingen AI Center



MegaFlow is a simple, powerful, and unified model for zero-shot large displacement optical flow and point tracking.

MegaFlow leverages pre-trained Vision Transformer features to naturally capture extreme motion, followed by lightweight iterative refinement for sub-pixel accuracy. It achieves state-of-the-art zero-shot performance on major optical flow benchmarks (Sintel, KITTI, Spring) and delivers highly competitive zero-shot generalization on long-range point tracking benchmarks.

Highlights

  • 🏆 State-of-the-art zero-shot performance on Sintel, KITTI, and Spring
  • 🎯 Designed for large displacement optical flow
  • 📹 Flexible temporal window — processes any number of frames at once
  • 🔄 Single backbone for both optical flow and long-range point tracking

Available Models

Model ID                 Task            Description
megaflow-flow            Optical flow    Full training curriculum (default)
megaflow-chairs-things   Optical flow    Trained on FlyingThings + FlyingChairs only
megaflow-track           Point tracking  Fine-tuned on Kubric

Quick Start

Installation

pip install git+https://github.com/cvg/megaflow.git

Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.

Optical Flow

import torch
from megaflow import MegaFlow

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns flow for consecutive pairs: (0→1, 1→2, ...)
        # Shape: [1, T-1, 2, H, W]
        flow = model(video, num_reg_refine=8)["flow_preds"][-1]
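The returned flow can be inspected visually with the standard optical flow color wheel (hue = direction, saturation = magnitude). The helper below is an illustrative NumPy sketch, not part of the megaflow package; it assumes channel 0 holds horizontal and channel 1 vertical displacement, which is the common convention but not confirmed by this card.

```python
import numpy as np

def flow_to_color(flow):
    """Map a [2, H, W] flow field to an RGB image via the HSV color wheel.

    Hue encodes direction, saturation encodes normalized magnitude.
    Assumes channel 0 = x displacement, channel 1 = y displacement.
    """
    u, v = flow[0], flow[1]
    mag = np.sqrt(u ** 2 + v ** 2)
    ang = np.arctan2(v, u)                 # direction in [-pi, pi]
    hue = (ang + np.pi) / (2 * np.pi)      # normalized to [0, 1]
    sat = mag / (mag.max() + 1e-8)         # normalized magnitude
    val = np.ones_like(mag)
    # Minimal vectorized HSV -> RGB conversion.
    i = np.floor(hue * 6).astype(int) % 6
    f = hue * 6 - np.floor(hue * 6)
    p = val * (1 - sat)
    q = val * (1 - f * sat)
    t = val * (1 - (1 - f) * sat)
    rgb = np.choose(i[None], [
        np.stack([val, t, p]), np.stack([q, val, p]), np.stack([p, val, t]),
        np.stack([p, q, val]), np.stack([t, p, val]), np.stack([val, p, q]),
    ])
    return np.moveaxis(rgb, 0, -1)  # [H, W, 3], values in [0, 1]
```

To visualize MegaFlow's output, move one pair's prediction to CPU first, e.g. `flow_to_color(flow[0, 0].float().cpu().numpy())`.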

Point Tracking

import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d
import torch.nn.functional as F  # noqa: F401 (optional, for downstream sampling)

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns dense offsets from frame 0 to each frame t
        flows_e = model.forward_track(video, num_reg_refine=8)["flow_final"]

# Convert offsets to absolute coordinates
_, T, _, H, W = video.shape
grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
tracks = flows_e + grid_xy  # [1, T, 2, H, W]
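The dense track volume can be read out at arbitrary query pixels by plain integer indexing. The helper below is an illustrative NumPy sketch (not a megaflow utility), assuming the `[T, 2, H, W]` absolute-coordinate layout produced above:

```python
import numpy as np

def sample_tracks(tracks, queries):
    """Extract sparse trajectories from a dense track volume.

    tracks:  [T, 2, H, W] absolute (x, y) coordinates per frame.
    queries: [N, 2] integer (x, y) pixel positions in frame 0.
    Returns: [N, T, 2] trajectory of each query point over time.
    """
    xs, ys = queries[:, 0], queries[:, 1]
    # Advanced indexing on the last two dims gives [T, 2, N].
    traj = tracks[:, :, ys, xs]
    return np.transpose(traj, (2, 0, 1))
```

For a regular grid of queries (as in `demo_track.py --grid_size 8`), the query array can be built with `np.meshgrid` and flattened before sampling.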

Demo Scripts

# Clone the repo and run demos
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Optical flow on a video
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

# Dense point tracking
python demo_track.py --input assets/apple.mp4 --grid_size 8

# Gradio web UI
python demo_gradio.py

Or try the Colab notebook directly in the browser.

Citation

@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}