Section 01
Introduction to Research on Synthetic Tabular Data Generation Based on LLM Fine-Tuning
This master's thesis project at ITMO University explores methods and strategies for generating high-quality synthetic tabular data by fine-tuning large language models (LLMs). It targets the data-scarcity bottleneck in machine learning, along with obstacles to acquiring real data such as privacy regulations and high annotation costs. The core idea is to serialize tabular data into a text format (e.g., JSON or CSV) so that the sequence modeling capabilities of LLMs can be transferred to structured data generation tasks. The project also explores effective fine-tuning strategies and a multi-dimensional evaluation framework.
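The serialization step can be illustrated with a minimal sketch. It assumes each table row becomes one line of text in either JSON or CSV form, and that a generated JSON line is accepted only if it parses and has exactly the expected columns. The function names, the example columns (`age`, `occupation`, `income`), and the validation rule are hypothetical and are not taken from the thesis project.

```python
import csv
import io
import json
from typing import Optional


def row_to_json(row: dict) -> str:
    """Serialize one table row as a single JSON line (column order preserved)."""
    return json.dumps(row, ensure_ascii=False)


def row_to_csv(row: dict, columns: list) -> str:
    """Serialize one table row as a single CSV line in the given column order."""
    buf = io.StringIO()
    csv.writer(buf).writerow([row[c] for c in columns])
    # csv.writer appends a line terminator; strip it to get a bare line.
    return buf.getvalue().rstrip("\r\n")


def json_to_row(text: str, columns: list) -> Optional[dict]:
    """Parse a generated JSON line back into a row.

    Returns None when the text is not valid JSON or when its keys do not
    match the expected columns (an assumed, simple validity check).
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != set(columns):
        return None
    return obj


if __name__ == "__main__":
    columns = ["age", "occupation", "income"]  # hypothetical schema
    row = {"age": 34, "occupation": "engineer", "income": 52000.0}
    print(row_to_json(row))
    print(row_to_csv(row, columns))
    print(json_to_row(row_to_json(row), columns))
```

Lines serialized this way could serve as fine-tuning text, and the parser gives a simple way to discard malformed generations before evaluation. JSON repeats the column names in every row, which costs more tokens than CSV but keeps each line self-describing.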