Zing Forum

Business Intelligence Analysis of the Impact of Generative AI on Students' Academic Performance and Mental Health

A business intelligence project that analyzes data from approximately 50,000 college students to examine how generative AI usage relates to academic performance, knowledge retention, and emotional health. It uses a star schema data warehouse queried through Amazon Athena and DBeaver.

generative AI, education analytics, business intelligence, Amazon Athena, student performance, mental health, data warehouse, ETL, higher education
Published 2026-06-08 11:13 · Recent activity 2026-06-08 11:24 · Estimated read 6 min

Section 01

Business Intelligence Analysis of Generative AI's Impact on College Students' Academic Performance and Mental Health (Main Floor)

This project applies business intelligence analysis to data from approximately 50,000 college students to explore how generative AI usage relates to academic performance, knowledge retention, and emotional health. It adopts a star schema architecture and uses Amazon Athena and DBeaver for analysis, aiming to identify AI usage patterns that improve academic performance without harming students' health. The original project is maintained by LizzyRuiz and hosted on GitHub (link: https://github.com/LizzyRuiz/ai-student-impact-bi), published on 2026-06-08.

Section 02

Research Background and Core Questions

Generative AI tools are now widely used and have changed how college students learn, which raises several concerns: over-reliance on the tools, lower knowledge retention, weaker traditional study habits, effects on emotional health, and a higher risk of academic burnout. The core question is: how does generative AI usage affect college students' academic performance, knowledge retention, and emotional health? The project uses an open dataset of 50,000 college student records and applies business intelligence methods to look for the AI usage patterns associated with the best outcomes.

Section 03

Data Architecture and Tech Stack

The project adopts a star schema data warehouse architecture (1 fact table + 4 dimension tables) to optimize query performance and keep the data model logically consistent. The tech stack is cloud-native: Amazon S3 for storage, Amazon Athena for serverless querying, and DBeaver as the SQL client. ETL is implemented in Python with pandas and SQLAlchemy and covers data cleaning, transformation, KPI calculation, and validation. Advantages of this setup: no server management, pay-per-query pricing, and the ability to absorb fluctuating workloads.
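
To make the "1 fact table + 4 dimension tables" layout concrete, here is a minimal sketch of how flat student records could be split into a star schema. Everything specific in it is an assumption for illustration: the table names (`dim_major`, `dim_academic_year`, `dim_scenario`, `dim_subscription`, `fact_student_ai`), the column choices, and the sample rows are hypothetical, and the standard-library `sqlite3` module stands in for Athena so the example runs locally. The project's actual schema and ETL code may look quite different.

```python
import sqlite3

# Hypothetical flat records:
# (student_id, major, academic_year, scenario, paid, weekly_ai_hours, gpa)
FLAT_ROWS = [
    (1, "Biology", "Year 1", "Writing", 0, 4.0, 3.2),
    (2, "Biology", "Year 2", "Coding", 1, 12.5, 3.0),
    (3, "History", "Year 1", "Writing", 0, 7.0, 3.6),
]

# Assumed dimension tables, mapped to the source column index in FLAT_ROWS.
DIMENSIONS = {
    "dim_major": 1,
    "dim_academic_year": 2,
    "dim_scenario": 3,
    "dim_subscription": 4,
}


def build_star_schema(conn, rows):
    """Split flat rows into four dimension tables and one fact table."""
    key_maps = {}
    for table, col in DIMENSIONS.items():
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, value)")
        distinct = sorted({row[col] for row in rows})
        # Assign surrogate keys 1..n to the distinct dimension values.
        key_maps[table] = {value: key for key, value in enumerate(distinct, start=1)}
        conn.executemany(
            f"INSERT INTO {table} (value, id) VALUES (?, ?)",
            list(key_maps[table].items()),
        )
    conn.execute(
        "CREATE TABLE fact_student_ai ("
        "student_id INTEGER PRIMARY KEY, major_id INTEGER, year_id INTEGER, "
        "scenario_id INTEGER, subscription_id INTEGER, "
        "weekly_ai_hours REAL, gpa REAL)"
    )
    fact_rows = [
        (row[0], *(key_maps[t][row[c]] for t, c in DIMENSIONS.items()), row[5], row[6])
        for row in rows
    ]
    conn.executemany("INSERT INTO fact_student_ai VALUES (?, ?, ?, ?, ?, ?, ?)", fact_rows)


def avg_gpa_by_major(conn):
    """Typical star-schema query: join the fact table to one dimension."""
    return conn.execute(
        "SELECT m.value, ROUND(AVG(f.gpa), 2) "
        "FROM fact_student_ai f JOIN dim_major m ON m.id = f.major_id "
        "GROUP BY m.value ORDER BY m.value"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn, FLAT_ROWS)
print(avg_gpa_by_major(conn))
```

The point of the layout is that measures (hours, GPA) live only in the fact table, while descriptive attributes are stored once per dimension and joined in by key when a query needs them.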

Section 04

Dataset Composition and Key Metrics

The dataset contains 16 fields covering student background (ID, major, academic year), academics (GPA, skill retention score), AI usage (weekly duration, usage scenarios, prompt engineering level, tool diversity, paid subscription), and learning habits and mental health (traditional learning duration, perceived AI dependence, exam anxiety, burnout risk). The core KPIs are GPA improvement rate, AI usage duration, skill retention score, AI dependence level, and burnout risk level, so the analysis covers academic, behavioral, and psychological outcomes together.
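
As a rough illustration of the KPI calculation step, the sketch below aggregates three of the listed KPIs over a handful of records. The field names (`weekly_ai_hours`, `skill_retention`, `burnout_risk`), the `"high"` risk label, and the sample values are assumptions made up for this example. The GPA improvement rate is left out because the summary does not say how the project defines it.

```python
from statistics import mean

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"weekly_ai_hours": 4.0, "skill_retention": 80, "burnout_risk": "low"},
    {"weekly_ai_hours": 12.0, "skill_retention": 60, "burnout_risk": "high"},
    {"weekly_ai_hours": 8.0, "skill_retention": 70, "burnout_risk": "medium"},
]


def kpi_summary(rows):
    """Aggregate a few KPIs over a list of student records."""
    if not rows:
        raise ValueError("need at least one record")
    high_burnout = sum(1 for r in rows if r["burnout_risk"] == "high")
    return {
        "avg_weekly_ai_hours": round(mean([r["weekly_ai_hours"] for r in rows]), 2),
        "avg_skill_retention": round(mean([r["skill_retention"] for r in rows]), 2),
        "high_burnout_share": round(high_burnout / len(rows), 2),
    }


print(kpi_summary(records))
```

In a warehouse setup the same aggregates would more likely be expressed as SQL over the fact table; the Python version only shows what is being computed.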

Section 05

Analysis Dimensions and Research Hypotheses

The analysis follows four directions: 1. the relationship between AI usage duration and academic performance (to find an optimal usage interval); 2. the relationship between AI dependence and knowledge retention (to test the impact of over-reliance); 3. the impact of AI usage on burnout and anxiety (to capture psychological costs); 4. a comparison of AI usage patterns across majors (to identify disciplinary differences). The working hypothesis is that the effect of generative AI is non-linear: moderate usage improves efficiency, while over-reliance weakens critical thinking. Testing it is meant to give colleges a data basis for policy formulation.
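
One simple way to look for the hypothesized non-linear effect is to bin students by weekly AI usage and compare the mean GPA per bin. The sketch below does that with made-up numbers; the bin edges, labels, and sample pairs are assumptions chosen for illustration, not values from the project or its dataset.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Hypothetical bin edges in hours per week: [0, 5), [5, 10), [10, 20), [20, inf)
BIN_EDGES = [5, 10, 20]
BIN_LABELS = ["0-5h", "5-10h", "10-20h", "20h+"]


def mean_gpa_by_usage_bin(pairs):
    """Group (weekly_ai_hours, gpa) pairs into usage bins and average the GPA."""
    buckets = defaultdict(list)
    for hours, gpa in pairs:
        label = BIN_LABELS[bisect_right(BIN_EDGES, hours)]
        buckets[label].append(gpa)
    # Keep the bins in ascending usage order and skip empty ones.
    return {
        label: round(mean(buckets[label]), 2)
        for label in BIN_LABELS
        if label in buckets
    }


# Made-up sample data shaped like an inverted U, purely for demonstration.
sample = [(2, 3.0), (4, 3.2), (6, 3.5), (8, 3.7), (15, 3.3), (25, 2.9)]
print(mean_gpa_by_usage_bin(sample))
```

If the mean rises and then falls across the bins, that pattern is consistent with an optimal usage interval, although a binned average alone does not show why the pattern appears.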

Section 06

Practical Significance and Application Scenarios

The project gives colleges a framework for evaluating the impact of AI tools and for formulating reasonable usage policies. The same approach can be extended to scenarios such as online learning platform analysis and the evaluation of educational games. Technically, it demonstrates a lightweight cloud-native BI solution (Athena + S3) that is cost-effective because there are no servers to run and queries are billed per use, which suits institutions with limited resources. The code structure is clear and can serve as a starting point for similar analyses.

Section 07

Methodological Insights and Future Directions

The project uses correlation analysis to identify association patterns, which cannot establish causality. Future work could add randomized controlled trials to verify whether specific usage strategies are effective, or longitudinal tracking studies to explore long-term impacts. The open dataset (50,000 records) is large enough to give the analysis statistical power, it supports reproduction and extension, and it reflects the value of open science.
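
For readers unfamiliar with what the correlation step measures, here is a minimal sketch of the Pearson correlation coefficient written with the standard library only. The variable pairing (perceived AI dependence against skill retention) and the sample values are assumptions for illustration; the summary does not say which correlation measure the project uses.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values: perceived AI dependence (1-5) and skill retention score.
dependence = [1, 2, 3, 4, 5]
retention = [90, 85, 70, 65, 50]
print(round(pearson_r(dependence, retention), 3))
```

A coefficient near -1 or +1 only says the two variables move together in the sample. It says nothing about which one drives the other, or whether a third factor drives both.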