Artificial Intelligence-Based Solar Power Generation Prediction: A Comparative Study of Photovoltaic Forecasting Models in Loja, Ecuador

This article introduces a research project on photovoltaic power generation prediction in Loja, Ecuador. It compares the prediction performance of four artificial intelligence models—Random Forest, XGBoost, LSTM, and GRU—at different time resolutions, providing technical references for addressing the challenges of climate variability and atmospheric noise in high-altitude areas.

Tags: solar energy prediction, photovoltaic forecasting, LSTM, GRU, XGBoost, Random Forest, time series forecasting, machine learning, deep learning, Ecuador

Published 2026-06-10 02:16

Section 01

Introduction

This study addresses the problem of photovoltaic power generation prediction in the high-altitude area of Loja, Ecuador. It compares the prediction performance of four artificial intelligence models—Random Forest, XGBoost, LSTM, and GRU—at two time resolutions: 5-minute (high-frequency) and 1-hour (hourly). The research aims to tackle the challenges of climate variability and atmospheric noise in high-altitude regions, providing technical references for solar energy development in this area and similar geographical conditions. The project is accompanied by open-source code covering the complete experimental process, facilitating reproduction and expansion.

Section 02

Research Background and Significance

As the global energy transition accelerates, solar energy has become an important component of clean, renewable generation, and accurate prediction of its power output is crucial for grid dispatching, energy management, and electricity market transactions. However, photovoltaic generation depends on weather conditions, cloud cover, and temperature fluctuations, which makes it markedly intermittent and uncertain.

In high-altitude areas, climate variability is stronger and atmospheric noise is more complex, which poses additional challenges for photovoltaic prediction. The Loja region of Ecuador lies in the Andes at an altitude of approximately 2100 meters and has a distinctive highland climate, making it a suitable test case for photovoltaic prediction algorithms. Accurate prediction of photovoltaic generation in this region not only helps stabilize local grid operation but also provides a technical reference for solar energy development under similar geographical conditions.

Section 03

Research Methods and Technical Implementation

Core Model Selection

The project selected two types of prediction models with different characteristics:

Traditional Machine Learning Models:

Random Forest: An ensemble of decision trees trained on bootstrap samples, good at handling non-linear relationships and feature interactions
XGBoost (Extreme Gradient Boosting): An efficient gradient boosting framework that performs well on structured (tabular) prediction tasks

Deep Learning Models:

LSTM (Long Short-Term Memory Network): A recurrent neural network variant specifically designed to handle sequence data, capable of capturing temporal dependencies
GRU (Gated Recurrent Unit): A simplified variant of LSTM with fewer gates, which reduces the number of parameters and the computational overhead while often achieving comparable accuracy
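
The parameter saving of a GRU relative to an LSTM can be made concrete with a short calculation. The sketch below assumes the textbook formulation with one bias vector per gate; real frameworks may count biases slightly differently, and the layer sizes used in the project are not stated in the source, so the numbers in the usage note are purely illustrative.

```python
def lstm_param_count(input_dim: int, hidden_dim: int) -> int:
    """Trainable parameters of one LSTM layer (textbook form, one bias per gate)."""
    # Each gate has an input weight matrix, a recurrent weight matrix, and a bias.
    per_gate = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    # Input, forget, and output gates plus the candidate cell state.
    return 4 * per_gate


def gru_param_count(input_dim: int, hidden_dim: int) -> int:
    """Trainable parameters of one GRU layer (textbook form, one bias per gate)."""
    per_gate = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    # Update and reset gates plus the candidate hidden state.
    return 3 * per_gate
```

With a hypothetical 6 input features and 64 hidden units, `lstm_param_count(6, 64)` gives 18176 and `gru_param_count(6, 64)` gives 13632, so under these assumptions the GRU layer carries three quarters of the LSTM layer's parameters.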

Data Collection and Processing

The study used meteorological data provided by the Climate Observatory of the Universidad Técnica Particular de Loja (UTPL). Because of third-party data sharing agreements, the original data is not publicly available, but researchers can request access from the UTPL Climate Observatory through official channels.

Data preprocessing includes:

Time series alignment and missing value handling
Meteorological feature engineering (temperature, humidity, radiation intensity, etc.)
Data standardization and normalization
Training/validation/test set division
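
Two of the steps listed above, set division and standardization, interact in a way that is easy to get wrong for time series: the split should preserve time order, and scaling statistics should come from the training portion only. The following is a minimal sketch under those assumptions; the 70/15/15 split and the function names are hypothetical and not taken from the project.

```python
from statistics import mean, pstdev


def chronological_split(rows, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def fit_standardizer(train_values):
    """Compute mean and standard deviation from the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant feature
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously fitted statistics to any split."""
    return [(v - mu) / sigma for v in values]
```

Fitting the statistics on the training split and reusing them for validation and test data avoids leaking information from the future into the model.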

Dual Time Resolution Experimental Design

To comprehensively evaluate model performance, the study designed two time resolution schemes:

High-Frequency Data (5-minute resolution):

Captures rapidly changing meteorological conditions
Suitable for real-time prediction and grid frequency regulation
Larger data volume, higher requirements for model training efficiency

Hourly Data (1-hour resolution):

Smooths short-term fluctuations, focuses on trend changes
Suitable for day-ahead dispatching and energy planning
Lower computational overhead, suitable for resource-constrained scenarios

Model Architecture and Training Strategy

Random Forest and XGBoost

These two tree-based models adopted similar feature engineering strategies, recasting the time series prediction problem as a supervised learning problem. Feature vectors are built with sliding windows over past observations, and the models learn the mapping from historical meteorological data to future power generation.
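
The sliding-window conversion can be sketched as follows. The window length, the forecast horizon, and the function name are assumptions chosen for illustration; the project's actual settings are not given in the source.

```python
def make_supervised(features, target, window, horizon=1):
    """Turn a multivariate time series into (X, y) pairs for a tabular model.

    features: list of per-time-step rows, each a list of floats
    target:   list of floats aligned with `features`
    window:   number of past steps flattened into one feature vector
    horizon:  how many steps after the window the target is taken from
    """
    X, y = [], []
    last_start = len(features) - window - horizon + 1
    for start in range(last_start):
        end = start + window
        # Flatten the window so each past step contributes its own columns.
        X.append([v for row in features[start:end] for v in row])
        y.append(target[end + horizon - 1])
    return X, y
```

Each row of `X` can then be passed to a Random Forest or XGBoost regressor as an ordinary feature vector, with the corresponding entry of `y` as the label.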

Hyperparameter tuning includes:

Number and depth of trees
Learning rate and regularization parameters
Feature sampling ratio

LSTM and GRU Networks

Recurrent neural networks process windows of the time series directly, so lag features do not have to be constructed explicitly. The network structure includes:

Input layer receiving multivariate time series
Hidden layer capturing temporal dependency patterns
Fully connected output layer generating predicted values

Training configuration:

Optimizer: Adam
Loss function: Mean Squared Error (MSE)
Early stopping mechanism to prevent overfitting
Learning rate decay strategy
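
How early stopping and learning rate decay interact can be shown without any deep learning framework. The sketch below replays a list of validation losses and applies both rules; the patience values and the decay factor are hypothetical, since the source does not report the settings used.

```python
def run_schedule(val_losses, lr=1e-3, patience=4, decay_factor=0.5, decay_patience=2):
    """Replay validation losses and report when training would stop.

    The learning rate is multiplied by `decay_factor` after every
    `decay_patience` epochs without improvement, and training stops after
    `patience` such epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
            continue
        stale += 1
        if stale % decay_patience == 0:
            lr *= decay_factor
        if stale >= patience:
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}
    return {"stopped_epoch": len(val_losses) - 1, "best_epoch": best_epoch, "lr": lr}
```

In a real training loop the weights saved at `best_epoch` would be restored after stopping, so the final model corresponds to the lowest validation loss rather than the last epoch.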

Section 04

Experimental Results and Model Comparative Analysis

The study evaluated the prediction performance of each model through multi-dimensional metrics, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²).
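
For reference, the three metrics can be computed as in the following sketch, which uses their standard definitions and only the Python standard library. The function names are illustrative and are not taken from the project's code.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when the target is constant")
    return 1 - ss_res / ss_tot
```

RMSE and MAE are expressed in the units of the target, whereas R² is unitless, which makes it easier to compare across the two time resolutions.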

Key Findings

Impact of Time Resolution: High-frequency data (5 minutes) contains richer information but also introduces more noise. Models need to balance between capturing rapid changes and filtering noise. Hourly data shows more stable trends, suitable for medium and long-term prediction tasks.

Differences Between Model Types:

Deep learning models (LSTM, GRU) have advantages in capturing complex temporal patterns, especially in high-frequency data scenarios
Traditional machine learning models train faster, have lower computational resource requirements, and perform robustly when data volume is limited
GRU, as a lightweight alternative to LSTM, achieves similar prediction accuracy in most scenarios

Special Challenges in High-Altitude Areas: The Loja region has high atmospheric transparency but rapidly changing weather, and cloud movement significantly affects irradiance. Models need to integrate multiple sources of meteorological information effectively to perform well.

Section 05

Research Conclusions and Practical Application Value

Summary

This study of photovoltaic prediction in the Loja region of Ecuador systematically compares four artificial intelligence models at two time resolutions and offers practical experience for solar energy prediction in high-altitude areas. It indicates that no single model is best in all cases; choosing an appropriate algorithm requires weighing data characteristics, prediction horizon, and computational resources.

The project's open-source code repository has a clear structure and covers the complete process from data preprocessing to model evaluation, giving researchers and engineers in related fields implementation examples they can reuse directly. As global installed solar capacity continues to grow, progress in such prediction techniques can support the large-scale application of clean energy.

Practical Application Value

The results of this study have practical value in multiple aspects:

For Grid Operators: Accurate photovoltaic prediction helps optimize dispatching plans, reduce reserve capacity requirements, and lower operating costs.

For Solar Power Plants: Prediction results can guide operation and maintenance decisions, such as equipment maintenance scheduling and energy storage system charging/discharging strategies.

For Academic Research: The open-source code implementation provides a benchmark for subsequent research, facilitating reproduction and expansion by other researchers.

For Similar Regions: The research methods can be transferred to other high-altitude regions with abundant solar resources, such as the Tibetan Plateau and the Bolivian Altiplano.

Section 06

Technical Insights and Future Research Directions

Hybrid Model Architecture

Future research can explore combining traditional machine learning with deep learning, for example by using XGBoost to extract features before feeding them to an LSTM, or by adopting ensemble learning strategies that fuse the predictions of several models.
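
As one simple instance of multi-model fusion, predictions can be blended with weights inversely proportional to each model's validation error. This is only a sketch of one possible scheme, not something the study reports having done; the model names and error values in the usage note are hypothetical.

```python
def inverse_error_weights(val_errors):
    """Map {model_name: validation_error} to weights that sum to 1."""
    inverse = {name: 1.0 / err for name, err in val_errors.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def blend(predictions, weights):
    """Weighted average of per-model prediction lists of equal length."""
    names = list(predictions)
    n_steps = len(predictions[names[0]])
    return [
        sum(weights[name] * predictions[name][i] for name in names)
        for i in range(n_steps)
    ]
```

For example, with validation errors `{"xgboost": 2.0, "gru": 1.0}` the GRU would receive a weight of two thirds, so the blended forecast leans toward the model that validated better.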

External Data Fusion

Introducing external data sources such as satellite cloud images and numerical weather forecasts is expected to further improve prediction accuracy, especially for sudden weather events.

Uncertainty Quantification

In addition to point prediction, providing prediction intervals or probability distributions is more valuable for practical decision-making. Methods like Bayesian neural networks or quantile regression are worth trying.
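
Quantile regression rests on the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. A minimal sketch of that loss, using its standard definition, is shown below; how it would be wired into any of the four models is left open.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q, with 0 < q < 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-prediction (diff >= 0) costs q per unit, over-prediction costs 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)
```

Training one model with `q = 0.1` and another with `q = 0.9` would yield lower and upper bounds that together form an approximate 80% prediction interval.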

Edge Deployment Optimization

For resource-constrained edge devices, model compression and quantization techniques can enable prediction systems to be directly deployed on power plant sites, reducing network dependency.
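
To give a sense of what quantization involves, the sketch below applies symmetric 8-bit quantization to a list of weights and maps them back to floats. It illustrates the general technique only; production toolchains add calibration, per-channel scales, and integer kernels, none of which are shown here.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in quantized]
```

Each weight is stored in one byte instead of four, and the reconstruction error per weight is bounded by half the scale.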