VLA Client Skill: An End-to-End Solution for Direct Control of Robotic Arms by Large Language Models

A ROS2 skill package that provides vision-language-action (VLA) closed-loop control, letting LLMs command robotic arms directly for complex operations while avoiding the latency of traditional motion planning.

Tags: VLA, vision-language-action, robot control, ROS2, robotic arm, end-to-end learning, large language models, Robonix

Published 2026-06-14 11:44 | Recent activity 2026-06-14 11:50 | Estimated read 5 min

Section 01

VLA Client Skill: Guide to the End-to-End Solution for Direct Robotic Arm Control by LLMs

Project Core

vla_client_rbnx is a ROS2 skill package that implements vision-language-action (VLA) closed-loop control. It lets LLMs command a robotic arm directly to perform complex operations, avoiding the latency of traditional motion planning.

Project Source

Original author/maintainer: lhw2002426
Source platform: GitHub
Release time: June 14, 2026
Original link: https://github.com/lhw2002426/vla_client_rbnx

Section 02

Project Background: Paradigm Shift in Robot Control

Traditional robot manipulation separates perception, planning, and execution. A grasping task, for example, chains YOLO detection → geometric calculation → MoveIt planning; each step takes 3-5 seconds, and the delays add up, slowing complex multi-step tasks.

VLA models introduce an end-to-end control paradigm: they generate action sequences directly from visual input and natural-language instructions, without intermediate representations, which makes it practical for LLMs to invoke robot skills.

Section 03

System Architecture and Core Methods

Closed-Loop Control Pipeline

Data flow: Camera topic → observe → VLA server → safety filter → /arm/pos_cmd → piper_ctl → robotic arm (joint state feedback closed loop)
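The data flow above can be sketched as a fixed-rate loop. This is a minimal illustration under stated assumptions, not the package's actual code: `run_closed_loop` and its callable parameters (`observe`, `infer`, `safety_filter`, `publish`) are hypothetical names, and the 7-value action layout (6 joints + 1 gripper) is assumed. The stages are passed in as plain callables so the sketch runs without ROS2.

```python
import time
from typing import Callable, List, Sequence

Action = List[float]  # assumed layout: 6 joint targets + 1 gripper value


def run_closed_loop(
    observe: Callable[[], dict],
    infer: Callable[[dict, str], Action],
    safety_filter: Callable[[Action, Sequence[float]], Action],
    publish: Callable[[Action], None],
    instruction: str,
    steps: int,
    rate_hz: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `steps` control cycles at a fixed rate; return the number of overruns."""
    period = 1.0 / rate_hz
    overruns = 0
    next_tick = clock()
    for _ in range(steps):
        obs = observe()                           # camera frames + joint feedback
        raw = infer(obs, instruction)             # VLA server proposes an action
        safe = safety_filter(raw, obs["joints"])  # clamp before it reaches hardware
        publish(safe)                             # e.g. onto /arm/pos_cmd
        next_tick += period
        remaining = next_tick - clock()
        if remaining > 0:
            sleep(remaining)
        else:
            # The cycle took longer than one period: count it and
            # resynchronise rather than bursting commands to catch up.
            overruns += 1
            next_tick = clock()
    return overruns
```

Because each cycle re-reads the joint state before asking for the next action, the model always plans from the measured pose, which is what closes the loop. In a real rclpy node the same cycle would more likely live in a timer callback than in a blocking loop.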

Key Design

Bypassing MoveIt: Commands are sent directly to piper_ctl, achieving a 10 Hz control frequency
Multi-source vision: Global camera (scene understanding) + wrist camera (fine manipulation), both resized to 256x256
Proprioception: Joint states (/arm/joint_states_single) + end-effector pose (/arm/end_pose)
Service discovery: By default, Atlas discovers VLA servers dynamically; a direct connection is also supported for debugging
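As an illustration of how the two camera views and the arm state might be bundled into one observation, here is a minimal sketch. The function names, the dictionary keys, and the nearest-neighbour resize are assumptions made for this example; the source only states that both images are resized to 256x256. Images are modelled as nested lists of grayscale values so the sketch has no dependencies.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # rows of grayscale pixels, a stand-in for a camera frame

TARGET_SIZE = 256  # both views are resized to 256x256


def resize_nearest(img: Image, size: int = TARGET_SIZE) -> Image:
    """Nearest-neighbour resize of a 2D image to size x size."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def build_observation(
    global_img: Image,
    wrist_img: Image,
    joint_positions: Sequence[float],
    end_pose: Sequence[float],
) -> Dict[str, object]:
    """Bundle both camera views and the arm state (hypothetical key names)."""
    return {
        "global_image": resize_nearest(global_img),  # scene understanding
        "wrist_image": resize_nearest(wrist_img),    # fine manipulation
        "joints": list(joint_positions),             # from /arm/joint_states_single
        "end_pose": list(end_pose),                  # from /arm/end_pose
    }
```

Resizing both views to one shape lets the model consume them as a uniform batch regardless of each camera's native resolution. A production client would more plausibly use an image library such as OpenCV for the resize; the pure-Python version here only shows the index mapping.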

Section 04

Safety Mechanisms: Guarantees After Bypassing MoveIt

Built-in safety filters:

Joint limits: Each of the 6 joint angles is limited to ±2.618 radians (±150 degrees)
Rate limit: Per-step joint change ≤ 0.1 radians
Gripper range: Opening value restricted to 0.0-1.0
Emergency reset: The manipulation/reset service calls MoveIt to return the arm to its parking position
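The three numeric limits listed above can be expressed as a small clamping function. This is a sketch under assumptions rather than the package's implementation: the name `safety_filter`, the 7-value action layout (6 joints + 1 gripper), the order of the checks (rate limit first, then joint limit), and the rejection of non-finite values are all choices made for this example.

```python
import math
from typing import List, Sequence

JOINT_LIMIT_RAD = 2.618   # ±150 degrees per joint
MAX_STEP_RAD = 0.1        # largest allowed joint change in one control step
GRIPPER_MIN, GRIPPER_MAX = 0.0, 1.0


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def safety_filter(action: Sequence[float], current_joints: Sequence[float]) -> List[float]:
    """Return a copy of `action` that respects the joint, rate, and gripper limits.

    `action` is assumed to hold 6 joint targets followed by 1 gripper value.
    """
    if len(action) != 7 or len(current_joints) != 6:
        raise ValueError("expected 7 action values and 6 current joint values")
    if not all(math.isfinite(v) for v in list(action) + list(current_joints)):
        raise ValueError("non-finite value in action or joint state")

    safe: List[float] = []
    for target, current in zip(action[:6], current_joints):
        # Rate limit: stay within MAX_STEP_RAD of the measured position.
        stepped = clamp(target, current - MAX_STEP_RAD, current + MAX_STEP_RAD)
        # Joint limit: never command an angle outside ±JOINT_LIMIT_RAD.
        safe.append(clamp(stepped, -JOINT_LIMIT_RAD, JOINT_LIMIT_RAD))
    safe.append(clamp(action[6], GRIPPER_MIN, GRIPPER_MAX))
    return safe
```

For example, with all joints at 0.0, a requested jump of 1.0 radian on the first joint is cut to 0.1 radian for that step. At 10 Hz, a 0.1-radian step cap corresponds to at most about 1 radian per second per joint. Note that clamping of this kind bounds the commanded values only; it does not check for collisions with the environment.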

Section 05

Comparison and Effect Evidence

Feature	Traditional (pick_skill_rbnx)	VLA solution (vla_client_rbnx)
Method	YOLO + geometry + MoveIt	End-to-end VLA model
Time per step	3-5 seconds	100 ms (10 Hz)
Safety mechanism	MoveIt collision detection	Built-in safety filter
Applicable scenarios	Simple grasping	Complex language-guided operations

Section 06

LLM Call Interfaces and Deployment Dependencies

LLM Interfaces

robonix/skill/vla/driver: Skill lifecycle management
robonix/skill/vla/execute: Accepts a natural-language instruction (e.g., "Put the red block into the blue box") and executes it automatically

Deployment Dependencies

ROS2 Humble + rclpy
vla_server_rbnx (GPU inference)
OrbbecSDK_rbnx (camera stream)
piper_ctl_rbnx (execute commands)

Section 07

Technical Significance and Future Outlook

Significance

Paradigm shift: From layered architecture to end-to-end learning, opening up the channel from natural language to physical actions
Developers: No need to write complex kinematics code
Users: Interact with robots in everyday language

Challenges

Open problems remain in VLA model generalization, safety, and hardware portability. Even so, the project is an important step toward general-purpose robot assistants.
