SCAle-BEM: A Multi-Modal Automatic Building Energy Modeling Framework Based on Multi-Agent Large Language Models

SCAle-BEM is an innovative multi-modal framework that leverages vision-language models (VLMs) and large language models (LLMs) to automatically extract information from various visual inputs such as architectural drawings, sketches, and floor plans, and generate building energy models. It ensures model accuracy through self-consistency and cross-consistency verification mechanisms.

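Tags: building energy modeling, large language models, vision-language models, multi-modal AI, agents, automated modeling, sustainable buildings, energy efficiency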
Published 2026-05-31 03:44 | Recent activity 2026-05-31 03:52 | Estimated read 6 min

Section 01

Introduction to SCAle-BEM Framework: Multi-Modal AI-Driven Automatic Building Energy Modeling

SCAle-BEM is a multi-modal automated building energy modeling framework developed by Gang Jiang and Jianli Chen. It combines vision-language models (VLMs) and large language models (LLMs) to extract information from visual inputs such as architectural drawings, sketches, and floor plans, and then generates energy models from that information. Self-consistency and cross-consistency verification are used to improve accuracy. The code is open source on GitHub (https://github.com/Gangjiang1/SCAle-BEM) and targets scenarios such as architectural design evaluation and existing building analysis.

Section 02

Background and Motivation: Pain Points of Traditional BEM and Opportunities from AI Technology

Traditional building energy modeling (BEM) relies on professionals entering a large number of parameters by hand, which is time-consuming and error-prone. Existing automated BEM workflows mostly depend on text inputs and cannot effectively process visual material such as drawings and sketches. Yet this material carries much of the design intent and geometric information, which makes it an important data source for BEM. SCAle-BEM aims to close this gap by integrating multi-modal AI technologies.

Section 03

Core Components and Workflow of the Framework

The SCAle-BEM framework consists of four stages:

  1. Visual Interpreter: Uses VLMs to identify building shapes and extract dimensions, supporting three reasoning modes: baseline, self-consistency, and cross-consistency;
  2. Intent Abstractor: Uses LLMs to abstract design intent from visual interpretation results and generate an intermediate representation (IR) including geometry, structure, etc.;
  3. Physical Reviewer: Checks the IR based on physical rules and corrects invalid outputs via a reflection mechanism;
  4. Building Model Generator: Receives the IR to generate building model components, supporting various geometric types, building information, and HVAC systems.
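
The four stages above can be sketched as a chain of plain functions. This is only an illustrative outline, not the repository's actual API: every function name, the `IntermediateRepresentation` fields, and the hard-coded VLM output are assumptions made so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class IntermediateRepresentation:
    """Hypothetical IR; the real framework's schema may differ."""
    shape: str
    width_m: float
    depth_m: float
    height_m: float
    review_notes: list = field(default_factory=list)


def visual_interpreter(image_path: str) -> dict:
    # Stand-in for a VLM call. Returns fixed values (with a deliberately
    # invalid negative height) so the sketch needs no network access.
    return {"shape": "rectangle", "width_m": 20.0, "depth_m": 10.0, "height_m": -3.0}


def intent_abstractor(observation: dict) -> IntermediateRepresentation:
    # Stand-in for the LLM step that turns raw observations into an IR.
    return IntermediateRepresentation(**observation)


def physical_reviewer(ir: IntermediateRepresentation) -> IntermediateRepresentation:
    # Rule check: dimensions must be positive. A real reviewer would ask the
    # LLM to reflect and correct; here a sign flip is a placeholder fix.
    for name in ("width_m", "depth_m", "height_m"):
        value = getattr(ir, name)
        if value <= 0:
            ir.review_notes.append(f"{name} must be positive, got {value}")
            setattr(ir, name, abs(value))
    return ir


def building_model_generator(ir: IntermediateRepresentation) -> dict:
    # Stand-in for model generation: derive two simple quantities from the IR.
    floor_area = ir.width_m * ir.depth_m
    return {"floor_area_m2": floor_area, "volume_m3": floor_area * ir.height_m}


def run_pipeline(image_path: str):
    observation = visual_interpreter(image_path)
    ir = physical_reviewer(intent_abstractor(observation))
    return building_model_generator(ir), ir


model, ir = run_pipeline("sketch.png")
print(model)
print(ir.review_notes)
```

Keeping the IR as the only object passed between stages means the reviewer can reject or repair it before any model components are generated.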

Section 04

Technical Highlights: Multi-Modal Support and Consistency Verification

The core innovations of SCAle-BEM include:

  1. Multi-modal input support: Handles visual inputs such as hand-drawn sketches, floor plans, real drawings, and 3D images;
  2. Self-consistency and cross-consistency mechanisms: Reduces uncertainty through multi-inference aggregation and cross-model verification;
  3. Reflection-based LLM verification: The physical reviewer enables the model to self-check compliance with physical laws;
  4. Agent collaboration workflow: Each component has a clear division of labor, forming a complete automated pipeline.
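
To make the two consistency mechanisms concrete, here is a minimal sketch. The summary does not say how SCAle-BEM aggregates results, so the rules used here (majority vote for the shape, median for a dimension, and a 10% relative tolerance between models) are assumptions, as are the function names and the sample readings.

```python
from collections import Counter
from statistics import median


def self_consistent(samples):
    """Aggregate repeated readings from ONE model.

    Assumed rule: majority vote for the categorical shape,
    median for the numeric width.
    """
    shape = Counter(s["shape"] for s in samples).most_common(1)[0][0]
    width = median(s["width_m"] for s in samples)
    return {"shape": shape, "width_m": width}


def cross_consistent(per_model, rel_tol=0.1):
    """Compare aggregated readings from SEVERAL models.

    Returns (agree, consensus). Models agree when they report the same
    shape and every width lies within rel_tol of the median width.
    """
    results = list(per_model.values())
    shapes = {r["shape"] for r in results}
    mid = median(r["width_m"] for r in results)
    widths_ok = all(abs(r["width_m"] - mid) <= rel_tol * mid for r in results)
    agree = len(shapes) == 1 and widths_ok
    consensus = {"shape": results[0]["shape"], "width_m": mid} if agree else None
    return agree, consensus


# Three hypothetical readings of the same drawing by one model.
runs = [
    {"shape": "rectangle", "width_m": 20.0},
    {"shape": "rectangle", "width_m": 21.0},
    {"shape": "L-shape", "width_m": 20.0},
]
model_a = self_consistent(runs)
model_b = {"shape": "rectangle", "width_m": 21.0}
print(model_a)
print(cross_consistent({"model_a": model_a, "model_b": model_b}))
```

When the models disagree, the sketch returns no consensus rather than picking a side, which leaves the decision to a later step such as the physical reviewer or a human.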

Section 05

Application Scenarios and Usage Guide

Application scenarios include:

  • Early-stage architectural design evaluation: Quickly generate energy models from sketches to evaluate schemes;
  • Existing building analysis: Extract information from real drawings/photos for renovation analysis;
  • Energy audit: Automate processing of large datasets to improve efficiency;
  • Research and education: Lower the entry barrier for BEM.

Usage: Place input images (PNG/JPG, etc.) into the specified folder; PDFs need to be converted to high-resolution PNGs first.
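
Before running the framework, it can help to check what is in the input folder. The helper below is a small sketch of such a check, not part of SCAle-BEM: the function name, the accepted suffix set, and the folder name in the usage note are assumptions.

```python
from pathlib import Path

# Assumed set of accepted image suffixes; the summary only says "PNG/JPG, etc.".
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def scan_inputs(folder):
    """Split a folder's files into ready-to-use images and PDFs to convert."""
    images, needs_conversion = [], []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images.append(path.name)
        elif suffix == ".pdf":
            # PDFs must be converted to high-resolution PNGs first.
            needs_conversion.append(path.name)
    return images, needs_conversion
```

A call such as `scan_inputs("inputs")`, where `inputs` is a placeholder folder name, returns two lists of file names. Any name in the second list still has to be converted to PNG with a tool of your choice.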

Section 06

Limitations and Notes

As a research prototype, SCAle-BEM has the following limitations:

  1. Model dependency: Results are affected by LLM/VLM selection, image quality, etc.;
  2. Cost considerations: The cross-consistency mode involves many API calls, leading to higher costs;
  3. Experimental nature: The code is suitable for research; full verification is required for production environments.
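
A rough call-count estimate shows why the cross-consistency mode costs more. It assumes every image is sent to every model a fixed number of times, which the summary does not confirm. The function name and the figures (50 images, 3 models, 5 samples) are purely illustrative.

```python
def estimate_api_calls(n_images, n_models, samples_per_model):
    """VLM call count, assuming each image goes to each model samples_per_model times."""
    return n_images * n_models * samples_per_model


baseline = estimate_api_calls(50, n_models=1, samples_per_model=1)
self_consistency = estimate_api_calls(50, n_models=1, samples_per_model=5)
cross_consistency = estimate_api_calls(50, n_models=3, samples_per_model=5)
print(baseline, self_consistency, cross_consistency)  # 50 250 750
```

Under these assumptions the cross-consistency mode needs 15 times as many calls as the baseline, before counting any LLM calls made by later stages.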

Section 07

Summary and Outlook

SCAle-BEM demonstrates the potential of multi-modal AI in building energy modeling and offers a feasible path from unstructured visual data to structured models. As the underlying VLMs and LLMs improve, tools of this kind could play a larger role in architectural design and energy analysis and help optimize building energy performance.