Section 01
[Introduction] Can Large Language Models Predict Electricity Demand? A Comparison of 14 Models Reveals True Capability Boundaries
This study systematically compares statistical models, machine learning, deep learning, and large language models (14 configurations in total) on electricity load forecasting for the Belgian grid, with the aim of revealing the capability boundaries of LLMs in time-series prediction. Using nearly 10 years of Belgian grid data, its key findings include: Time-LLM (an architecture that adapts GPT-2 through a reprogramming layer) outperforms the traditional XGBoost and LSTM models; directly prompting GPT-4o for predictions yields poor results; and the ensemble model (XGB+LSTM+Time-LLM) achieves the best performance.