ReactVLA: Fast and Lightweight Reactive Robot Manipulation via Improved Mean-Flow Action Generation

¹Shanghai Jiao Tong University, ²Universität Hamburg
ReactVLA comparison of VLA models across model size, inference latency, and task success rate.

ReactVLA achieves the highest average LIBERO success rate among the compared methods (88.0%) while keeping a lightweight footprint (0.39B parameters) and low inference latency (18.3 ms), enabling real-time reactive robotic control.

Abstract

While diffusion-based VLA policies excel at modeling expressive and multimodal action distributions, their reliance on iterative sampling introduces substantial inference latency, limiting their applicability to reactive closed-loop robot manipulation. To address this challenge, we propose ReactVLA, a low-latency and lightweight Vision-Language-Action framework for reactive robot manipulation. ReactVLA integrates two complementary components: (1) an improved Mean Flow (iMF) action generator that reduces expensive multi-step diffusion sampling to a one-to-two-step generation process, and (2) Attention Residuals (AttnRes), which replace uniform residual accumulation with dynamic depth-wise feature routing to better preserve multimodal representations.
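
The one-to-two-step generation in (1) can be illustrated with the standard mean-flow sampling rule, which jumps from time t to an earlier time r using an average velocity: z_r = z_t - (t - r) * u(z_t, r, t). The sketch below is a minimal illustration under assumptions that this page does not state: noise sits at t = 1 and actions at t = 0, `toy_mean_velocity` is a hypothetical analytic stand-in for the learned network, and the action-chunk shape (8 x 7) is arbitrary. It is not the authors' iMF implementation.

```python
import numpy as np

def toy_mean_velocity(z, r, t, target):
    """Hypothetical stand-in for a learned average-velocity network u(z_t, r, t)."""
    # Assumption: a single target action and the linear path z_t = (1 - t) * a + t * eps.
    # Paths are then straight lines, so the average velocity over [r, t] is
    # (z_t - a) / t and does not depend on r.
    return (z - target) / t

def mean_flow_sample(mean_velocity, noise, num_steps=1):
    """Transport noise at t = 1 to an action at t = 0 in `num_steps` jumps."""
    times = np.linspace(1.0, 0.0, num_steps + 1)
    z = noise
    for t, r in zip(times[:-1], times[1:]):
        # One network call per jump: z_r = z_t - (t - r) * u(z_t, r, t)
        z = z - (t - r) * mean_velocity(z, r, t)
    return z

rng = np.random.default_rng(0)
target_chunk = rng.uniform(-1.0, 1.0, size=(8, 7))  # assumed: 8 future steps, 7-DoF actions
noise = rng.standard_normal(target_chunk.shape)

def u(z, r, t):
    return toy_mean_velocity(z, r, t, target_chunk)

one_step = mean_flow_sample(u, noise, num_steps=1)
two_step = mean_flow_sample(u, noise, num_steps=2)
print(np.abs(one_step - target_chunk).max(), np.abs(two_step - target_chunk).max())
```

Because the toy average velocity is exact, one jump and two jumps both land on the target. With a learned network the second jump may reduce approximation error, at the cost of one extra forward pass.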

We evaluate ReactVLA on large-scale simulation benchmarks (LIBERO and RoboIMI) and on real-world robotic manipulation tasks. On LIBERO, ReactVLA achieves the highest average success rate among the compared baselines, including SmolVLA and π0. On challenging precision manipulation tasks, ReactVLA achieves up to a 1.65× improvement in task performance while running more than 4× faster at inference than leading VLA models. Policy latency stays at or below 38.6 ms in every evaluated setting, enabling real-time reactive control for physical robot deployment.

Method Overview

Overview of the ReactVLA framework. Multimodal observations (visual inputs, language instructions, and proprioceptive states) are processed by an Attention Residual Transformer backbone with dynamic depth-wise feature routing. The proposed improved Mean-Flow (iMF) action head directly transports noisy actions toward the target action distribution through one-to-few-step generation, enabling ultra-low latency reactive robot control.
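
This page does not spell out the AttnRes formulation, so the following is only one plausible reading of "dynamic depth-wise feature routing": instead of summing all earlier layer outputs with equal weight, each token takes a softmax-weighted mixture over depth. The function names, the single per-layer `query` vector, and the tensor shapes are all hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis):
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def uniform_residual(layer_outputs):
    """Standard residual stream: every earlier output contributes with weight 1."""
    return layer_outputs.sum(axis=0)

def attention_residual(layer_outputs, query):
    """Mix earlier layer outputs with per-token attention weights over depth.

    layer_outputs: (depth, tokens, dim) outputs of the preceding layers
    query:         (dim,) hypothetical learned query for the current layer
    """
    depth, tokens, dim = layer_outputs.shape
    scores = layer_outputs @ query / np.sqrt(dim)   # (depth, tokens)
    weights = softmax(scores, axis=0)               # normalise over depth, per token
    mixed = (weights[:, :, None] * layer_outputs).sum(axis=0)  # (tokens, dim)
    return mixed, weights

rng = np.random.default_rng(0)
outputs = rng.standard_normal((4, 6, 16))  # assumed: 4 earlier layers, 6 tokens, width 16
query = rng.standard_normal(16)
mixed, weights = attention_residual(outputs, query)
print(mixed.shape, weights.sum(axis=0))
```

With a zero query the weights are uniform and the mixture reduces to the plain residual sum divided by the depth, which makes the uniform case a special instance of the routed one.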

Experimental Results

LIBERO Benchmark

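| Method | # Params | Spatial | Object | Goal | Long | Avg. | Latency (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Diffusion Policy | 0.46B | 78.3 | 92.5 | 68.3 | 50.5 | 72.4 | 178.82 |
| OpenVLA | 6.74B | 84.7 | 88.4 | 79.2 | 53.7 | 76.5 | 115.0 |
| π0 (Paligemma-3B) | 5.97B | 87.0 | 63.0 | 89.0 | 48.0 | 71.8 | 94.3 |
| π0 | 4.03B | 90.0 | 86.0 | 95.0 | 73.0 | 86.0 | 93.4 |
| SmolVLA | 0.45B | 90.0 | 96.0 | 92.0 | 71.0 | 87.3 | 74.1 |
| ReactVLA (Ours) | 0.39B | 93.0 | 95.0 | 92.0 | 72.0 | 88.0 | 18.3 |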
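
As a quick consistency check on the LIBERO numbers, the short script below recomputes each method's average from its four suite scores and derives the latency ratio relative to ReactVLA. All values are copied from the table above. The 0.051 tolerance is an assumption that the reported averages are rounded to one decimal place.

```python
# (suite scores [Spatial, Object, Goal, Long], reported average, latency in ms)
libero = {
    "Diffusion Policy": ([78.3, 92.5, 68.3, 50.5], 72.4, 178.82),
    "OpenVLA": ([84.7, 88.4, 79.2, 53.7], 76.5, 115.0),
    "π0 (Paligemma-3B)": ([87.0, 63.0, 89.0, 48.0], 71.8, 94.3),
    "π0": ([90.0, 86.0, 95.0, 73.0], 86.0, 93.4),
    "SmolVLA": ([90.0, 96.0, 92.0, 71.0], 87.3, 74.1),
    "ReactVLA": ([93.0, 95.0, 92.0, 72.0], 88.0, 18.3),
}

ours_latency_ms = libero["ReactVLA"][2]
speedups = {}
for name, (scores, reported_avg, latency_ms) in libero.items():
    mean = sum(scores) / len(scores)
    # Reported averages should match the recomputed mean up to rounding.
    assert abs(mean - reported_avg) <= 0.051, name
    speedups[name] = latency_ms / ours_latency_ms
    print(f"{name}: mean={mean:.2f}, latency ratio vs ReactVLA={speedups[name]:.2f}x")
```

Every baseline in the table has a latency ratio above 4, with SmolVLA the closest at roughly 4.05.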

RoboIMI Simulation

| Method | Peg-in-Socket Reward | Object Transfer Reward | Latency (ms) |
| --- | --- | --- | --- |
| ACT | 289.60 | 115.64 | 4.5 |
| Diffusion Policy | 1019.39 | 319.20 | 398.0 |
| ReactVLA (Ours) | 1513.56 | 526.22 | 15.1 |
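
The reward ratios implied by the RoboIMI table can be worked out directly, as in the snippet below. The page does not say which comparison the abstract's "up to 1.65×" refers to. The Object Transfer ratio against Diffusion Policy is one candidate, and that attribution is an assumption.

```python
# (Peg-in-Socket reward, Object Transfer reward), copied from the RoboIMI table above.
rewards = {
    "ACT": (289.60, 115.64),
    "Diffusion Policy": (1019.39, 319.20),
    "ReactVLA": (1513.56, 526.22),
}

ours_peg, ours_transfer = rewards["ReactVLA"]
ratios = {
    name: (ours_peg / peg, ours_transfer / transfer)
    for name, (peg, transfer) in rewards.items()
    if name != "ReactVLA"
}
for name, (peg_ratio, transfer_ratio) in ratios.items():
    print(f"ReactVLA vs {name}: peg {peg_ratio:.2f}x, transfer {transfer_ratio:.2f}x")
```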

Real-World Experiments (Diana 7 Robot)

| Method | Orange P&P (Success) | Block Stack (Success) | Latency (ms) |
| --- | --- | --- | --- |
| SmolVLA | 95% (19/20) | 75% (15/20) | 82.2 |
| ReactVLA (Ours) | 95% (19/20) | 90% (18/20) | 38.6 |
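
To relate these latencies to reactive control, the sketch below converts each real-world latency into an upper bound on the policy update rate. It assumes one inference per control step and ignores sensing, communication, and action-chunk execution, none of which are described on this page. The helper name `max_policy_rate_hz` is hypothetical.

```python
def max_policy_rate_hz(latency_ms: float) -> float:
    """Upper bound on policy updates per second if inference is the only cost."""
    return 1000.0 / latency_ms

# Latencies copied from the real-world table above.
real_world_latency_ms = {"SmolVLA": 82.2, "ReactVLA": 38.6}
rates = {name: max_policy_rate_hz(ms) for name, ms in real_world_latency_ms.items()}
for name, hz in rates.items():
    print(f"{name}: at most {hz:.1f} policy updates per second")
```

Under these assumptions ReactVLA could re-plan roughly 26 times per second against roughly 12 for SmolVLA, so it can react to a disturbance about twice as often.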

Qualitative Results

Qualitative rollout comparison on LIBERO benchmark.

LIBERO Benchmark. ReactVLA generates smoother trajectories than SmolVLA and makes more task progress within the same wall-clock time.

Qualitative rollout comparison on RoboIMI dual-arm tasks.

RoboIMI Dual-Arm Tasks. ReactVLA coordinates both robotic arms more smoothly than Diffusion Policy during peg-in-socket insertion and block handover.

Real-world robot experiments on Diana 7.

Real-World Deployment. Orange Pick-and-Place and Block Stack tasks on the Diana 7 robotic arm with real-time reactive control.

BibTeX

@article{guo2025reactvla,
  title={ReactVLA: Fast and Lightweight Reactive Robot Manipulation via Improved Mean-Flow Action Generation},
  author={Guo, Yanzhao and Chen, Wenkai and Zhang, Jianwei},
  journal={arXiv preprint},
  year={2025}
}