Zing Forum


Climate-Agriculture-Commodity Price Analysis Platform: Multi-source Data Integration and Business Intelligence Practice

This is a business analytics course project that integrates climate, agricultural production, and commodity price data. Using ETL processes, a PostgreSQL data warehouse, Power BI visualization, and machine learning models, it explores how climate change correlates with agricultural production and with global commodity market behavior.

Tags: business-analytics, climate-change, agriculture, commodity-prices, ETL, data-warehouse, Power-BI, machine-learning, PostgreSQL
Published 2026-05-26 11:45 | Recent activity 2026-05-26 11:58 | Estimated read 7 min

Section 01

Introduction

This is a business analytics course project from the National University of Colombia, developed by Sergio Alejandro Villada Arias and Julián David Aranzazu Velásquez (Source: GitHub, May 26, 2026). The project integrates climate, agricultural production, and commodity price data. Using ETL processes, a PostgreSQL data warehouse, Power BI visualization, and machine learning models, it explores how climate change correlates with agricultural production and with global commodity market behavior. Key tools include Python, KNIME, Power BI, and Scikit-learn, among others.


Section 02

Project Background and Motivation

Climate change poses severe challenges to agriculture and global food security: extreme weather affects crop yields and commodity prices. Analyzing these relationships is difficult, however, because the relevant data come from diverse sources, in varying formats, and at different time granularities. As a final course project, this initiative builds a business intelligence system to address that integration problem and to provide decision support for policymakers, agricultural enterprises, investors, and researchers.


Section 03

Data Sources and Foundation

The project uses three public data sources:

  1. Berkeley Earth: Global historical temperature data (high quality, long time span).
  2. FAOSTAT (Food and Agriculture Organization of the United Nations): Global agricultural production data (yield, area, etc., categorized by country/crop/year).
  3. IMF Primary Commodity Price Database: International commodity price data (covering agricultural products, energy, metals).

The analysis covers 1990-2015, the time frame common to all three data sources, which also spans key climate and policy changes.
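
Restricting three sources to a shared window can be sketched roughly as follows. This is a minimal illustration, not the project's actual ETL code: the function names (`annual_mean`, `common_years`, `align`) are hypothetical, the numbers are placeholders rather than real observations, and it assumes prices arrive at monthly granularity while temperature and yield are already annual.

```python
from statistics import mean
from typing import Dict, List, Tuple

START_YEAR, END_YEAR = 1990, 2015  # analysis window stated in the article


def annual_mean(monthly: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Collapse {(year, month): value} into {year: mean of that year's months}."""
    by_year: Dict[int, List[float]] = {}
    for (year, _month), value in monthly.items():
        by_year.setdefault(year, []).append(value)
    return {year: mean(values) for year, values in by_year.items()}


def common_years(*series: Dict[int, float]) -> List[int]:
    """Years inside the analysis window that every series covers."""
    shared = set(range(START_YEAR, END_YEAR + 1))
    for s in series:
        shared &= set(s)
    return sorted(shared)


def align(temperature: Dict[int, float],
          crop_yield: Dict[int, float],
          price: Dict[int, float]) -> List[Tuple[int, float, float, float]]:
    """One row per shared year: (year, temperature, yield, price)."""
    years = common_years(temperature, crop_yield, price)
    return [(y, temperature[y], crop_yield[y], price[y]) for y in years]


# Placeholder values for illustration only, not real data.
temperature = {1989: 14.1, 1990: 14.3, 1991: 14.2, 2016: 15.0}
crop_yield = {1990: 2.1, 1991: 2.2, 1992: 2.3}
monthly_price = {(1990, 1): 100.0, (1990, 2): 110.0,
                 (1991, 1): 98.0, (2016, 1): 150.0}

rows = align(temperature, crop_yield, annual_mean(monthly_price))
```

Years outside 1990-2015, or missing from any one source, drop out of `rows`. A real pipeline would also need to reconcile country and commodity naming across the sources, which this sketch leaves out.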

Section 04

Technical Architecture and Implementation

The tech stack covers a complete data pipeline:

  • ETL: Python (complex transformation logic) + KNIME (visual workflow), responsible for data cleaning, transformation, and loading into the warehouse.
  • Data Warehouse: PostgreSQL with a star schema (fact tables + dimension tables) to support efficient multi-dimensional analysis.
  • Visualization: Power BI interactive dashboard for multi-dimensional analysis of climate, agriculture, and commodity prices.
  • Machine Learning Models: OLS regression (linear relationships) + Random Forest (non-linear prediction).
  • AI Assistant: LangChain-based natural language query tool that lowers the barrier to analysis for non-technical users.
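
To make the star-schema idea concrete, here is a small sketch of one fact table joined to two dimension tables. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a database server. The table and column names (`dim_country`, `dim_crop`, `fact_production`, `yield_t_per_ha`, and so on) are assumptions made for illustration, not the project's actual schema, and the rows are placeholders.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dim_crop    (crop_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Fact table: one row per country, crop, and year, holding the measures.
CREATE TABLE fact_production (
    country_id     INTEGER NOT NULL REFERENCES dim_country(country_id),
    crop_id        INTEGER NOT NULL REFERENCES dim_crop(crop_id),
    year           INTEGER NOT NULL,
    avg_temp_c     REAL,
    yield_t_per_ha REAL,
    PRIMARY KEY (country_id, crop_id, year)
);
""")

# Placeholder rows for illustration only, not real data.
conn.executemany("INSERT INTO dim_country VALUES (?, ?)",
                 [(1, "Country A"), (2, "Country B")])
conn.executemany("INSERT INTO dim_crop VALUES (?, ?)", [(1, "corn")])
conn.executemany("INSERT INTO fact_production VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1990, 24.0, 2.0),
    (1, 1, 1991, 25.0, 3.0),
    (2, 1, 1990, 12.0, 5.0),
])

# A typical multi-dimensional query: average yield by country and crop.
query = """
SELECT c.name, k.name, AVG(f.yield_t_per_ha)
FROM fact_production AS f
JOIN dim_country AS c ON c.country_id = f.country_id
JOIN dim_crop    AS k ON k.crop_id    = f.crop_id
GROUP BY c.name, k.name
ORDER BY c.name
"""
result = conn.execute(query).fetchall()
```

The design point is that measures live in the fact table while descriptive attributes live in the dimensions, so a dashboard can slice by country, crop, or year with simple joins and a `GROUP BY`.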

Section 05

Key Research Findings

Key insights from the analysis:

  1. Of the variables analyzed, agricultural yield correlates most closely with climate conditions (crop growth depends on temperature and precipitation).
  2. Some commodities (corn, cocoa) show higher price volatility during periods of heat stress (climate affects supply chains).
  3. The relationship between climate and agricultural production varies by country (due to crop types, technology, irrigation facilities, etc.).
  4. In some models, absolute temperature has stronger predictive value than short-term temperature anomalies.
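
The last finding contrasts absolute temperature with temperature anomalies. The sketch below shows one possible reason such a gap can appear when data from several countries are pooled. It is an illustration on placeholder numbers, not a reconstruction of the project's models. The helper names are hypothetical, and each country's own mean is used as the anomaly baseline, which is an assumption.

```python
from statistics import mean
from typing import List, Sequence


def anomalies(temps: Sequence[float]) -> List[float]:
    """Deviation of each value from the series' own mean (assumed baseline)."""
    baseline = mean(temps)
    return [t - baseline for t in temps]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a one-predictor OLS fit y = a + b*x (the squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Placeholder series for two hypothetical countries, not real data.
temps_a, yield_a = [24.0, 25.0, 26.0], [2.0, 1.9, 1.8]   # warm country
temps_b, yield_b = [12.0, 13.0, 14.0], [5.0, 4.9, 4.8]   # cool country

# Pool both countries into one sample.
pooled_yield = yield_a + yield_b
pooled_absolute = temps_a + temps_b
pooled_anomaly = anomalies(temps_a) + anomalies(temps_b)

r2_absolute = r_squared(pooled_absolute, pooled_yield)
r2_anomaly = r_squared(pooled_anomaly, pooled_yield)
```

In this constructed sample, absolute temperature also encodes the level difference between the two countries, so `r2_absolute` comes out far higher than `r2_anomaly`. Within a single country the two predictors differ only by a constant shift, so a one-variable linear fit scores them identically. Whether this mechanism explains the project's result is not something the source states.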

Section 06

Educational Value and Limitations

Educational Value:

  • Students gain complete project experience from data collection to result presentation.
  • Integration of multiple technologies (Python, KNIME, Power BI, etc.) cultivates comprehensive skills.
  • Application of real data (handling quality issues, format conversion, missing values).
  • Open-source nature supports reproducibility and learning.

Limitations:

  • Time range limited to 1990-2015 (lack of recent data).
  • Country-level data (ignores internal regional differences).
  • Machine learning models are relatively simple (deep learning or time series models could be introduced to improve prediction accuracy).
  • Focuses on correlation, not rigorous causal inference.

Section 07

Summary and Insights

This project demonstrates the potential of business analytics in environmental economics. By combining multi-source data integration, a data warehouse, machine learning, and BI visualization, it offers useful insights into the relationship between climate, agriculture, and commodity prices.

Key Success Factors:

  • Effective integration of multi-source heterogeneous data.
  • Rational selection of tech stack (ETL tools, database, visualization, machine learning).
  • Comprehensive analysis methods (descriptive, predictive, exploratory).
  • Good project organization that supports reproducibility.

For learners, this is an excellent reference case for applying classroom knowledge to real-world problems, sharing knowledge through open-source, and completing complex data analysis tasks via team collaboration.