# SCAle-BEM: A Multi-Modal Automatic Building Energy Modeling Framework Based on Multi-Agent Large Language Models

> SCAle-BEM is a multi-modal framework that uses vision-language models (VLMs) and large language models (LLMs) to extract information from visual inputs such as architectural drawings, sketches, and floor plans, and to generate building energy models from that information. It uses self-consistency and cross-consistency checks to improve model accuracy.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-30T19:44:47.000Z
- Last activity: 2026-05-30T19:52:30.575Z
- Popularity: 150.9
- Keywords: building energy modeling, large language models, vision-language models, multi-modal AI, agents, automated modeling, sustainable buildings, energy efficiency
- Page link: https://www.zingnex.cn/en/forum/thread/scale-bem
- Canonical: https://www.zingnex.cn/forum/thread/scale-bem

---

## Introduction to the SCAle-BEM Framework: Multi-Modal AI-Driven Automatic Building Energy Modeling

SCAle-BEM is a multi-modal framework for automated building energy modeling developed by Gang Jiang and Jianli Chen. It combines VLMs and LLMs to extract information from visual inputs such as architectural drawings, sketches, and floor plans, then generates energy models from that information, using self-consistency and cross-consistency checks to improve accuracy. The code is open source on GitHub (https://github.com/Gangjiang1/SCAle-BEM), and the framework targets use cases such as architectural design evaluation and analysis of existing buildings.

## Background and Motivation: Pain Points of Traditional BEM and Opportunities from AI Technology

Traditional building energy modeling (BEM) requires professionals to enter a large number of parameters by hand, which is time-consuming and error-prone. Most existing automated BEM workflows rely on text inputs and cannot effectively process visual material such as drawings and sketches, even though this material carries rich design intent and geometric information and is an important data source for BEM. SCAle-BEM addresses this gap by integrating multi-modal AI techniques.

## Core Components and Workflow of the Framework

The SCAle-BEM framework consists of four stages:
1. Visual Interpreter: Uses VLMs to identify building shapes and extract dimensions, with three reasoning modes: baseline, self-consistency, and cross-consistency;
2. Intent Abstractor: Uses LLMs to abstract the design intent from the visual interpretation and produce an intermediate representation (IR) covering geometry, structure, and related information;
3. Physical Reviewer: Checks the IR against physical rules and corrects invalid outputs through a reflection mechanism;
4. Building Model Generator: Takes the IR and generates the building model components, supporting various geometry types, building information, and HVAC systems.
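The four-stage flow can be pictured as a chain of functions that pass an IR along. The sketch below is not code from the SCAle-BEM repository: the `BuildingIR` fields, the reviewer rules and thresholds, and the stub functions are hypothetical stand-ins for the real VLM and LLM calls, chosen only to show how the IR moves between stages and how reviewer feedback can loop back for a retry (the reflection step).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BuildingIR:
    """Hypothetical intermediate representation; field names are illustrative only."""
    footprint_m: tuple[float, float]  # (length, width) of a rectangular footprint, metres
    num_floors: int
    floor_height_m: float
    window_to_wall_ratio: float


def review_ir(ir: BuildingIR) -> list[str]:
    """Stand-in for the Physical Reviewer: return rule violations (empty list = valid)."""
    errors = []
    if min(ir.footprint_m) <= 0:
        errors.append("footprint dimensions must be positive")
    if ir.num_floors < 1:
        errors.append("building must have at least one floor")
    if not 2.0 <= ir.floor_height_m <= 6.0:  # assumed plausibility range
        errors.append("floor height outside plausible 2-6 m range")
    if not 0.0 <= ir.window_to_wall_ratio < 1.0:
        errors.append("window-to-wall ratio must be in [0, 1)")
    return errors


def run_pipeline(image_path: str,
                 interpret: Callable[[str], dict],
                 abstract: Callable[[dict, list[str]], BuildingIR],
                 generate: Callable[[BuildingIR], str],
                 max_reflections: int = 3) -> str:
    """Chain the four stages, feeding reviewer errors back to the abstractor."""
    observations = interpret(image_path)       # 1. Visual Interpreter (VLM)
    feedback: list[str] = []
    for _ in range(max_reflections + 1):
        ir = abstract(observations, feedback)  # 2. Intent Abstractor (LLM) -> IR
        feedback = review_ir(ir)               # 3. Physical Reviewer
        if not feedback:
            return generate(ir)                # 4. Building Model Generator
    raise ValueError(f"IR still invalid after reflection: {feedback}")


# Usage with deterministic stubs in place of real model calls (all hypothetical).
def fake_interpret(path: str) -> dict:
    return {"length": 20.0, "width": 12.0, "floors": 3}


def fake_abstract(obs: dict, feedback: list[str]) -> BuildingIR:
    # First draft uses an implausible 35 m floor height; the retry after feedback corrects it.
    height = 3.5 if feedback else 35.0
    return BuildingIR((obs["length"], obs["width"]), obs["floors"], height, 0.4)


def fake_generate(ir: BuildingIR) -> str:
    return f"{ir.num_floors}-floor box, {ir.footprint_m[0]}x{ir.footprint_m[1]} m, h={ir.floor_height_m}"


result = run_pipeline("sketch.png", fake_interpret, fake_abstract, fake_generate)
```

Replacing the stubs with real model clients would leave `run_pipeline` unchanged; capping `max_reflections` keeps a model that never produces a valid IR from looping indefinitely.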

## Technical Highlights: Multi-Modal Support and Consistency Verification

The core innovations of SCAle-BEM include:
1. Multi-modal input support: Handles visual inputs such as hand-drawn sketches, floor plans, real architectural drawings, and 3D images;
2. Self-consistency and cross-consistency mechanisms: Reduce uncertainty by aggregating multiple inferences and cross-checking results across models;
3. Reflection-based LLM verification: The Physical Reviewer lets the LLM check its own output for compliance with physical laws;
4. Agent collaboration workflow: Each component has a clear division of labor, forming a complete automated pipeline.
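One plausible way to implement the two consistency modes is shown below. The source describes them only as multi-inference aggregation and cross-model verification; the specific choices here (majority vote on a shape label, median of each dimension, a 5 % relative tolerance between models) and all field names are assumptions for illustration, not the repository's method.

```python
from collections import Counter
from statistics import median


def self_consistency(samples: list[dict]) -> dict:
    """Aggregate repeated answers from ONE model: majority vote on the shape label,
    median of each numeric dimension (robust to an occasional outlier sample)."""
    shape = Counter(s["shape"] for s in samples).most_common(1)[0][0]
    dims = {k: median(s["dims"][k] for s in samples) for k in samples[0]["dims"]}
    return {"shape": shape, "dims": dims}


def cross_consistency(per_model: dict[str, dict], rel_tol: float = 0.05) -> tuple[dict, list[str]]:
    """Compare aggregated answers from DIFFERENT models and flag disagreements."""
    results = list(per_model.values())
    conflicts = []
    shapes = sorted({r["shape"] for r in results})
    if len(shapes) > 1:
        conflicts.append(f"shape disagreement: {shapes}")
    consensus_dims = {}
    for k in results[0]["dims"]:
        values = [r["dims"][k] for r in results]
        mid = median(values)
        if any(abs(v - mid) > rel_tol * abs(mid) for v in values):
            conflicts.append(f"{k} disagreement: {values}")
        consensus_dims[k] = mid
    shape = Counter(r["shape"] for r in results).most_common(1)[0][0]
    return {"shape": shape, "dims": consensus_dims}, conflicts


# Usage with made-up sample answers from two hypothetical VLMs.
model_a = self_consistency([
    {"shape": "L", "dims": {"length": 30.0, "width": 18.0}},
    {"shape": "L", "dims": {"length": 30.5, "width": 18.0}},
    {"shape": "rect", "dims": {"length": 90.0, "width": 17.5}},  # outlier sample
])
model_b = self_consistency([
    {"shape": "L", "dims": {"length": 29.8, "width": 18.2}},
    {"shape": "L", "dims": {"length": 30.2, "width": 18.1}},
    {"shape": "L", "dims": {"length": 30.0, "width": 18.0}},
])
consensus, conflicts = cross_consistency({"vlm_a": model_a, "vlm_b": model_b})
```

Using the median rather than the mean keeps a single wild sample (the 90 m length above) from skewing the result, and disagreements are returned instead of silently resolved so that a reviewer step or a re-query can handle them.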

## Application Scenarios and Usage Guide

Application scenarios include:
- Early-stage architectural design evaluation: Quickly generate energy models from sketches to evaluate schemes;
- Existing building analysis: Extract information from real drawings/photos for renovation analysis;
- Energy audits: Automate the processing of large datasets to improve efficiency;
- Research and education: Lower the barrier to entry for BEM.

Usage: Place input images (PNG, JPG, etc.) in the designated input folder; PDFs must first be converted to high-resolution PNG files.
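Before a run, it can help to check which files in the input folder are usable. The helper below is a hypothetical sketch: the function name and the accepted extensions are assumptions, and it only sorts files into ready images and PDFs awaiting conversion rather than converting anything.

```python
import tempfile
from pathlib import Path

# Assumed set of directly usable formats; adjust to what the repository accepts.
SUPPORTED_IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_inputs(folder) -> tuple[list[Path], list[Path]]:
    """Split files in `folder` into (usable images, PDFs that need PNG conversion)."""
    images, needs_conversion = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # treat .JPG and .jpg alike
        if suffix in SUPPORTED_IMAGE_SUFFIXES:
            images.append(path)
        elif suffix == ".pdf":
            needs_conversion.append(path)
    return images, needs_conversion


# Demo on a throwaway folder standing in for the project's input directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["plan.png", "sketch.JPG", "drawings.pdf", "notes.txt"]:
        (Path(tmp) / name).touch()
    ready, to_convert = collect_inputs(tmp)
    ready_names = [p.name for p in ready]
    pdf_names = [p.name for p in to_convert]
```

For the conversion itself, one option is the third-party `pdf2image` package, whose `convert_from_path(pdf_path, dpi=300)` returns one PIL image per page (it requires Poppler); the 300 DPI value is an assumption, not a setting specified by the project.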

## Limitations and Notes

As a research prototype, SCAle-BEM has the following limitations:
1. Model dependency: Results depend on the choice of LLM/VLM, image quality, and similar factors;
2. Cost: The cross-consistency mode makes many API calls, which increases costs;
3. Experimental status: The code is intended for research; thorough validation is required before production use.
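To see why cross-consistency costs more, it helps to count model calls per image. The formula and defaults below (5 samples per model, 3 models, 2 additional LLM calls per image for abstraction and review) are hypothetical, not values taken from the repository; extra reflection rounds would add further LLM calls.

```python
def estimate_api_calls(num_images: int, mode: str, samples: int = 5,
                       num_models: int = 3, llm_calls_per_image: int = 2) -> int:
    """Rough API-call count per run; every parameter is an illustrative assumption."""
    vlm_calls_per_image = {
        "baseline": 1,                              # one VLM answer per image
        "self_consistency": samples,                # repeated sampling from one VLM
        "cross_consistency": samples * num_models,  # repeated sampling from several VLMs
    }
    if mode not in vlm_calls_per_image:
        raise ValueError(f"unknown mode: {mode}")
    # LLM calls cover intent abstraction plus one review pass (no extra reflection rounds).
    return num_images * (vlm_calls_per_image[mode] + llm_calls_per_image)


for mode in ("baseline", "self_consistency", "cross_consistency"):
    print(mode, estimate_api_calls(10, mode))
# baseline 30
# self_consistency 70
# cross_consistency 170
```

Under these assumptions, the VLM call count grows with the product of samples and models, which is why cross-consistency dominates the bill.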

## Summary and Outlook

SCAle-BEM shows the potential of multi-modal AI for building energy modeling and offers a feasible path from unstructured visual data to structured energy models. As AI models continue to improve, tools of this kind are likely to play a larger role in architectural design and energy analysis and to help optimize building energy performance.
