Section 01
[Introduction] Key Findings on LLMs' Ability to Understand HMSC Formal Semantics
The study evaluates Gemini-3, GPT-5.4, and Qwen-3.6 on their understanding of the formal semantics of High-level Message Sequence Charts (HMSC), the foundation of UML sequence diagrams. Overall accuracy is only 52%, and performance is weakest on complex semantic reasoning tasks such as abstract composition and trace analysis. The results indicate that current LLMs have only a limited grasp of strict formal semantics.