Section 01
VLM-Agent: A New Paradigm for GUI Automation Using VLM+LLM and Go-Python gRPC Architecture
VLM-Agent is a visual automation framework that combines a vision-language model (VLM) with a large language model (LLM). It uses a gRPC architecture in which a Go client talks to a Python inference server. Traditional GUI automation depends on an application's underlying interface structure; VLM-Agent instead lets the AI "see" the screen as a human would, understand the interface, and carry out operations on it.