Zing Forum

Reading

Zero-Shot Decision Tree Generation: Enabling Large Language Models to Directly Output Interpretable Classifiers

This article introduces an innovative study combining large language models (LLMs) with decision trees. Through zero-shot prompting, LLMs can directly generate classification decision logic, enabling the construction of interpretable machine learning models without training data.

Tags: Large Language Models · Decision Trees · Zero-Shot Learning · Interpretable AI · KDD Paper · Classifier Generation · Open-Source Project
Published 2026-04-20 08:13 · Recent activity 2026-04-20 08:19 · Estimated read 7 min

Section 01

Introduction: Zero-Shot Decision Tree Generation—Enabling LLMs to Directly Output Interpretable Classifiers

This article presents an innovative study that combines large language models (LLMs) with decision trees. Using zero-shot prompting, an LLM can directly generate classification decision logic, allowing interpretable machine learning models to be constructed without any training data. The study reproduces a KDD paper, explores the zero-shot decision tree induction paradigm, and offers new ideas for rapid modeling in data-scarce scenarios. It is a useful reference for researchers and practitioners interested in interpretable AI.


Section 02

Background: The Conflict Between Interpretability and Data Dependence

In machine learning, deep neural networks achieve excellent performance but suffer from the 'black box' problem, which restricts their use in domains that demand interpretability, such as finance and healthcare. Traditional decision trees are transparent and interpretable, but building them requires large amounts of labeled data and substantial computation; in new domains or data-scarce scenarios, significant human effort goes into feature engineering and rule design. This raises the core question: can the knowledge and reasoning ability of LLMs be used to generate classification decision logic directly from natural language descriptions?


Section 03

Methodology: Technical Path for Zero-Shot Decision Tree Generation

A GitHub open-source project reproduces the KDD paper titled 'Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree' and explores the zero-shot decision tree induction paradigm. The core idea: provide dataset feature descriptions to open-source LLMs (such as GPT-OSS 20B or Qwen3 14B), prompt the model to generate decision-tree judgment logic, and convert that logic into a runnable Python classification function. The implementation process is: prepare feature descriptions → design prompts → generate decision logic → package into Python functions. The project also supports extracting decision tree embeddings for downstream tasks.
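To make the pipeline concrete, here is a minimal sketch of its output stage: the model receives a prompt built from feature descriptions and replies with nested if/else rules, which are packaged as a plain Python function. The prompt wording, feature names, and thresholds below are hypothetical illustrations, not taken from the actual repository or any model's output.

```python
# Sketch of the zero-shot pipeline's final artifact. The prompt template
# and the "generated" function below are illustrative assumptions.

PROMPT_TEMPLATE = """You are given a classification task.
Features: {feature_descriptions}
Output a decision tree as nested if/else rules that predicts {target}."""

def generated_bankruptcy_tree(features: dict) -> str:
    """The kind of decision tree an LLM might emit for bankruptcy
    prediction (feature names and thresholds are made up)."""
    if features["debt_ratio"] > 0.8:
        if features["net_income"] < 0:
            return "bankrupt"
        return "at_risk"
    if features["current_ratio"] < 1.0:
        return "at_risk"
    return "solvent"

# Usage: the generated function is called like any ordinary classifier.
sample = {"debt_ratio": 0.9, "net_income": -5000, "current_ratio": 0.7}
print(generated_bankruptcy_tree(sample))  # bankrupt
```

Because the classifier is just a Python function, every prediction can be traced back to an explicit rule, which is the interpretability payoff the paradigm aims for.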


Section 04

Evaluation: Effect Verification Across Multiple Datasets

The project was tested on classic classification datasets such as bankruptcy prediction, horse colic diagnosis, and credit scoring. Evaluation covers both decision tree induction and embedding extraction, with metrics including classification accuracy, F1 score, decision tree complexity (number of nodes, depth), and interpretability. Model behavior was found to vary: some models tend to produce complex tree structures, while others prefer concise rules.


Section 05

Significance and Applications: Potential of Meta-Learners and Solutions for Data-Scarce Scenarios

The study demonstrates the potential of LLMs as 'meta-learners' that can directly generate structured machine learning models. It provides new ideas for rapid modeling in data-scarce scenarios—users only need to describe the problem features to obtain a classifier. Practical applications are suitable for the prototype verification phase: domain experts can build interpretable rules without the support of a data science team, which can be used for proof-of-concept or preliminary decision support. The generated decision trees can also serve as a starting point for complex models or a basis for training data generation.


Section 06

Limitations and Outlook: Possible Paths to Improve Generation Quality

Limitations of the current method: generation quality is bounded by the LLM's knowledge cutoff date and domain coverage, so the model may lack sufficient background knowledge in highly specialized or emerging fields, and zero-shot methods struggle to match models trained directly on the task. Future directions include combining few-shot examples to improve generation quality, developing human-machine collaborative fine-tuning mechanisms, and exploring hybrid architectures of decision trees and neural networks.
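The few-shot extension mentioned above amounts to prepending a handful of labeled examples to the zero-shot prompt so the model can calibrate its thresholds against real data points. The prompt format, function name, and example records below are hypothetical, sketched only to show the shape of such a prompt.

```python
# Hypothetical sketch: building a few-shot prompt for decision tree
# generation by appending labeled examples to the feature description.

def build_few_shot_prompt(feature_desc: str, target: str, examples: list) -> str:
    """Assemble a prompt that includes labeled examples (few-shot)."""
    lines = [
        f"Features: {feature_desc}",
        f"Target: {target}",
        "Labeled examples:",
    ]
    for ex in examples:
        feats = ", ".join(f"{k}={v}" for k, v in ex.items() if k != "label")
        lines.append(f"- {feats} -> {ex['label']}")
    lines.append("Output a decision tree as nested if/else rules.")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "debt_ratio (0-1), net_income (USD)",
    "bankruptcy",
    [
        {"debt_ratio": 0.9, "net_income": -5000, "label": "bankrupt"},
        {"debt_ratio": 0.2, "net_income": 80000, "label": "solvent"},
    ],
)
print(prompt)
```

The same scaffold degrades gracefully to the zero-shot case by passing an empty example list, which keeps the two settings directly comparable.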


Section 07

Conclusion: An Important Step Toward Interpretable Intelligence

The combination of LLMs and decision trees is an important step toward 'interpretable intelligence': the model not only provides predictions but also shows the basis for each judgment, improving the efficiency and credibility of human-machine collaboration. This open-source project offers a concrete implementation path toward that vision and merits the attention of researchers and practitioners interested in interpretable AI.