Zing Forum


Silicon Sampling Technology in Practice: Feasibility Verification of Using AI to Simulate Voter Opinion Surveys

This article describes an experimental study from Mackenzie Presbyterian University in Brazil that tested how well Silicon Sampling can simulate a survey on perceptions of democracy, comparing a traditional random forest model with the Gemini 2.0 Flash large language model.

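Silicon Sampling · Large Language Models · Gemini 2.0 Flash · Random Forest · Opinion Polling · Perceptions of Democracy · Machine Learning · Social Science Research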
Published 2026-04-10 09:11

Section 01

[Introduction] Silicon Sampling Technology: A Feasibility Study on AI Simulation of Voter Surveys

Researchers at Mackenzie Presbyterian University in Brazil compared a traditional random forest model with the Gemini 2.0 Flash large language model to test how well Silicon Sampling (using AI to simulate the answers of real survey respondents) can reproduce a survey on perceptions of democracy. The random forest was more accurate (0.98 vs. 0.90), while the large language model offered greater flexibility and interpretability.

Section 02

Research Background and Introduction to Silicon Sampling Technology

Silicon Sampling is an emerging method in which an AI model is given a respondent's demographic profile and asked to answer as that person would, which can reduce the time and cost of traditional surveys. This study focuses on Brazilians' attitudes toward democracy, using the real survey dataset 04832.SAV, and asks whether Gemini 2.0 Flash can accurately reproduce respondents' answers from their socioeconomic characteristics.

Section 03

Experimental Design and Technical Implementation Details

The experiment compares three sources of answers: the real survey data (the gold standard), a random forest model (the baseline), and Gemini 2.0 Flash (the model under evaluation). The implementation uses Python 3.12 on Google Colab, with Pandas for data processing, Scikit-Learn for the random forest, the Google Generative AI API for calling Gemini, and the Pyreadstat library for reading the SPSS-format .SAV file.
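
The baseline half of this pipeline can be sketched with the libraries named above. This is a minimal sketch under assumptions: the article does not list the features, their encoding, the train/test split, or the hyperparameters, so the synthetic data, the 75/25 split, n_estimators=200, and the helper names load_survey and train_baseline are illustrative placeholders. pyreadstat.read_sav returns the data as a pandas DataFrame together with a metadata object holding the SPSS variable and value labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def load_survey(path: str):
    """Read an SPSS .SAV file into a pandas DataFrame plus label metadata.

    Pyreadstat is imported here so the rest of the sketch runs without it.
    """
    import pyreadstat

    df, meta = pyreadstat.read_sav(path)
    return df, meta


def train_baseline(X, y, seed: int = 42):
    """Fit a random forest on encoded demographics X to predict answers y.

    Returns the fitted model and its accuracy on a held-out 25% split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    # Synthetic stand-in for the survey: four hypothetical encoded
    # demographics (e.g. age band, education, income band, region), 0-4 each.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(500, 4))
    # Toy rule tying two of the features to a binary answer.
    y = (X[:, 1] + X[:, 2] > 4).astype(int)
    model, acc = train_baseline(X, y)
    print(f"held-out accuracy: {acc:.2f}")
```

With the real file, one would call load_survey("04832.SAV"), select the demographic columns and the target question, encode any categorical columns, and pass the result (for example via DataFrame.to_numpy()) to train_baseline.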

Section 04

Experimental Results and Model Performance Comparison

The random forest model reached an accuracy of 0.98, compared with 0.90 for Gemini 2.0 Flash. Random forests handle structured tabular data well and capture interactions between features automatically. Gemini, by contrast, captured response patterns without any fine-tuning, produces its answers in natural language, and offers more flexibility and interpretability.
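
The article reports accuracy without spelling out how it was computed. Assuming the usual definition (the share of respondents whose simulated answer matches their real answer), the comparison reduces to the function below. The ten respondents are invented purely to show the arithmetic and are not the study's data.

```python
def accuracy(real: list[str], predicted: list[str]) -> float:
    """Share of respondents whose predicted answer equals their real answer."""
    if len(real) != len(predicted):
        raise ValueError("real and predicted must have the same length")
    if not real:
        raise ValueError("at least one respondent is required")
    matches = sum(r == p for r, p in zip(real, predicted))
    return matches / len(real)


# Ten made-up respondents; these are NOT the study's data.
real = ["Agree"] * 6 + ["Disagree"] * 3 + ["Don't know"]
forest = ["Agree"] * 6 + ["Disagree"] * 3 + ["Disagree"]
gemini = ["Agree"] * 4 + ["Disagree"] * 5 + ["Don't know"]

for name, predicted in [("random forest", forest), ("Gemini", gemini)]:
    print(f"{name}: accuracy = {accuracy(real, predicted):.2f}")
```

On this toy data the loop prints 0.90 for the random forest and 0.80 for Gemini. Accuracy alone does not show which answer categories are being missed, so a per-class breakdown is a useful complement.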

Section 05

Technical Details and Reproducibility Notes

All code and results of the study are publicly available in a GitHub repository, including three core files: projeto_1.ipynb (complete experimental code), resultados_finais_projeto.csv (model prediction results), and grafico_final_projeto1.png (response distribution comparison chart), making it easy for other researchers to reproduce and extend the study.
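
The repository's chart compares response distributions visually. A simple numeric counterpart is to compute the share of each answer category for the real and simulated answers and summarize the gap with the total variation distance. This is a sketch under assumptions: the metric is chosen here for illustration rather than taken from the study, the column layout of resultados_finais_projeto.csv is not described in the article, and the answers below are made up.

```python
from collections import Counter


def distribution(answers: list[str]) -> dict[str, float]:
    """Relative frequency of each answer category."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Made-up answers for illustration only.
real = ["Agree", "Agree", "Disagree", "Disagree"]
simulated = ["Agree", "Agree", "Agree", "Disagree"]

p, q = distribution(real), distribution(simulated)
print(p)
print(q)
print(total_variation(p, q))
```

Here the simulated answers over-represent "Agree" by 25 percentage points, giving a distance of 0.25. To apply this to the published results, the relevant columns of the CSV (whose names would need to be checked in the file) could be loaded with pandas.read_csv and passed in as lists.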

Section 06

Application Prospects and Challenges of Silicon Sampling

The main challenges are model bias (the model may amplify biases present in its training data) and cultural context (whether AI can genuinely grasp how people from different cultural backgrounds reason). On the positive side, the approach could substantially reduce research costs and time, and it suits exploratory work such as preliminary hypothesis screening and questionnaire design.

Section 07

Conclusions and Future Outlook

This study provides empirical evidence on the viability of Silicon Sampling. Although the traditional machine learning model was more accurate, the flexibility and scalability of large language models suggest considerable potential. Further interdisciplinary work is likely to probe the limits of AI in the social sciences, and researchers will need to understand each tool's strengths and weaknesses to apply it appropriately.