Zing Forum

AdoptAI: Combining Causal Inference with Large Models to Predict and Explain Cat Adoption Outcomes

A project that combines the Propensity Score Matching (PSM) causal inference method with HuggingFace large language models to predict and explain cat adoption outcomes in animal shelters.

Tags: Causal Inference, Propensity Score Matching, Large Language Models, HuggingFace, Animal Rescue, Explainable AI
Published 2026-05-03 22:14 | Recent activity 2026-05-03 22:24 | Estimated read 9 min

Section 01

AdoptAI Project Guide: Causal Inference + Large Models Empower Cat Adoption Prediction and Explanation

AdoptAI combines the Propensity Score Matching (PSM) causal inference method with HuggingFace large language models to predict cat adoption outcomes in animal shelters and to explain the reasons behind them. Traditional machine learning models can predict adoption probabilities but cannot explain why a cat is or is not adopted; AdoptAI aims to close that gap and give shelters actionable insights for allocating resources and planning rescue strategies.

Section 02

Project Background: Combining Data Science with Stray Animal Rescue

Millions of stray animals enter shelters worldwide every year, and a significant proportion of them are cats. To allocate limited resources, shelter staff need to know how likely each cat is to be adopted and which factors drive that likelihood. Traditional machine learning can estimate adoption probabilities but cannot explain them. AdoptAI attempts to fill this gap with causal inference and uses the explanatory power of large language models to turn the results into actionable insights.

Section 03

Core Method: Principles of Propensity Score Matching (PSM)

The core challenge of causal inference is that the same individual can never be observed in both the treated and the untreated state at once: a given cat is either sterilized or not. PSM addresses this by constructing comparable groups in three steps:

  1. Calculate propensity scores: Estimate each cat's probability of receiving the treatment (e.g., sterilization) from its covariates (age, breed, etc.)
  2. Match similar individuals: For each treated cat, find an untreated cat with a similar propensity score
  3. Compare outcomes: Attribute the difference in adoption outcomes between the matched groups to the treatment, assuming all relevant confounders are included in the covariates
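
Steps 1 and 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not AdoptAI's implementation: the propensity model is a hand-rolled logistic regression (a common choice; the project's actual model is not specified), matching is 1:1 nearest neighbour with replacement using a hypothetical caliper of 0.05, and the toy data (age in years and a black-coat flag as covariates, sterilization as the treatment) are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity_model(X, t, lr=0.1, epochs=3000):
    """Step 1: estimate e(X) = P(T=1 | X) with logistic regression,
    fitted by plain batch gradient descent (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, ti in zip(X, t):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - ti
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity_scores(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def nearest_neighbor_match(scores, t, caliper=0.05):
    """Step 2: 1:1 nearest-neighbour matching on the score, with replacement.
    Treated cats whose closest control is farther than `caliper` are dropped."""
    controls = [j for j, tj in enumerate(t) if tj == 0]
    pairs = []
    for i, ti in enumerate(t):
        if ti == 1:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            if abs(scores[j] - scores[i]) <= caliper:
                pairs.append((i, j))
    return pairs

# Hypothetical toy data. Covariates: [age in years, has black coat]; T = 1 if sterilized.
X = [[0.5, 0], [1, 1], [2, 0], [3, 0], [5, 1], [8, 0],
     [0.5, 1], [1, 1], [2, 0], [3, 0], [5, 1], [8, 0]]
t = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

w, b = fit_propensity_model(X, t)
scores = propensity_scores(X, w, b)
pairs = nearest_neighbor_match(scores, t)  # list of (treated_index, control_index)
```

In practice a library model such as scikit-learn's `LogisticRegression` would replace the hand-written fit. The caliper trades match quality against the number of treated cats that have to be dropped.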

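Mathematical basis: the propensity score is defined as $e(X) = P(T = 1 \mid X)$, where $T$ is the treatment status and $X$ the covariates. The Average Treatment Effect on the Treated (ATT) is estimated as

$$\widehat{\mathrm{ATT}} = \frac{1}{N_t} \sum_{i:\,T_i = 1} \left( Y_{t,i} - Y_{c,m(i)} \right)$$

where $N_t$ is the number of matched treated cats, $Y_{t,i}$ is the outcome of treated cat $i$, and $Y_{c,m(i)}$ is the outcome of its matched control $m(i)$.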
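
The ATT estimator reduces to averaging outcome differences over matched pairs. A minimal sketch, assuming a binary outcome (1 = adopted within some fixed window; the window and the data below are hypothetical) and 1:1 pairs given as (treated index, control index):

```python
def att_from_pairs(y, pairs):
    """ATT estimate: mean of Y_t - Y_c(matched) over the matched treated cats.
    y[i] is the outcome of cat i; pairs holds (treated_index, control_index)."""
    if not pairs:
        raise ValueError("no matched pairs: check propensity score overlap")
    return sum(y[i] - y[j] for i, j in pairs) / len(pairs)

# Hypothetical outcomes for four matched (treated, control) pairs.
y = [1, 1, 0, 1, 0, 1, 0, 0]
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
print(att_from_pairs(y, pairs))  # (1 + 0 + 0 + 1) / 4 = 0.5
```

Here $N_t$ is `len(pairs)`, so treated cats dropped during matching do not enter the average.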

Section 04

Dual Roles of Large Language Models: Feature Engineering and Explanation Generation

AdoptAI uses HuggingFace large language models in two key roles:

Feature Understanding and Engineering

Process unstructured text from shelters (personality descriptions, health notes, etc.):

  • Text embedding: Convert to dense vectors to capture semantics
  • Sentiment analysis: Identify positive and negative tendencies in descriptions
  • Entity extraction: Automatically recognize attributes such as breed and color

These text-derived features are combined with the structured features to improve the propensity score model used for matching.
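
A minimal sketch of how text-derived features might be appended to structured covariates. The `toy_sentiment` keyword scorer is a hypothetical stand-in so the example runs without model downloads; in a real pipeline it could be replaced by a HuggingFace model, for example `transformers.pipeline("sentiment-analysis")`, with its label and score mapped to a number. The record field names (`age_years`, `vaccinated`, `notes`) are assumptions.

```python
from typing import Callable, Dict, List

def toy_sentiment(text: str) -> float:
    """Hypothetical stand-in for a HuggingFace sentiment model.
    Returns a score in [-1, 1] from a tiny hand-written keyword list."""
    positive = {"friendly", "affectionate", "calm", "playful"}
    negative = {"shy", "aggressive", "fearful", "sick"}
    words = [w.strip(".,!").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def build_covariates(record: Dict, sentiment: Callable[[str], float] = toy_sentiment) -> List[float]:
    """Concatenate structured fields with text-derived features into one covariate vector."""
    structured = [float(record["age_years"]), float(record["vaccinated"])]
    text_features = [sentiment(record["notes"])]
    return structured + text_features

cat = {"age_years": 2, "vaccinated": 1, "notes": "Friendly and playful, a bit shy at first."}
print(build_covariates(cat))  # [2.0, 1.0, 0.3333333333333333]
```

Embedding vectors would be appended the same way. Because these features feed the propensity model, they should describe the cat as recorded before treatment; otherwise they can absorb part of the effect being measured.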

Natural Language Explanation Generation

Convert the numerical results of causal inference into human-readable explanations. For example, given the treatment (sterilization), the estimated effect (+15% adoption probability), and the covariate distribution, the LLM generates an explanation such as: "Data shows that sterilized cats have an average adoption time reduced by 3 days..." The explanation draws on both the patterns in the data and the model's domain knowledge.
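
Before the LLM can explain a result, the numbers have to be put into a prompt. A hedged sketch of that step, assuming the effect is an ATT on a binary adoption outcome expressed as a probability difference; the function name and prompt wording are hypothetical, and the resulting string would then be passed to a HuggingFace text-generation model (which model AdoptAI uses is not stated).

```python
def build_explanation_prompt(treatment, effect, covariates):
    """Turn an ATT estimate (a probability difference, e.g. 0.15) into an LLM prompt."""
    direction = "higher" if effect >= 0 else "lower"
    return (
        "You are helping animal shelter staff interpret a causal analysis.\n"
        f"Treatment: {treatment}\n"
        f"Estimated effect on adoption probability (ATT): {effect:+.0%} "
        f"({direction} for treated cats than for matched untreated cats)\n"
        f"Covariates used for matching: {', '.join(covariates)}\n"
        "Explain this result in two or three plain sentences for shelter staff "
        "and note that unmeasured factors could bias it."
    )

prompt = build_explanation_prompt("sterilization", 0.15, ["age", "breed", "coat color"])
print(prompt)
```

Keeping the numbers in a fixed template, rather than asking the model to compute them, means the LLM only rephrases results that the causal analysis already produced.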

Section 05

Research Findings: Key Factors Affecting Cat Adoption

Key factors affecting cat adoption:

Modifiable Features

  • Sterilization status: Sterilized cats are adopted faster (it removes adopters' concerns about breeding and related costs)
  • Vaccination: Complete vaccination records increase adoption probability
  • Socialization and training: Litter-trained cats are more popular

Non-modifiable Features

  • Age: Kittens (2-6 months) are adopted fastest; senior cats (10+ years) face greater challenges
  • Breed: Ragdolls, British Shorthairs, etc., are in high demand
  • Color: Black cats wait longer on average (the so-called black cat effect)

Heterogeneity of Causal Effects

Subgroup analysis with PSM shows that effects differ across groups: for example, the positive effect of sterilization is stronger for adult cats than for kittens, and stronger for stray cats than for abandoned (previously owned) cats.
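
Heterogeneity can be examined by averaging matched-pair differences within subgroups. A minimal sketch with hypothetical outcomes and an assumed `group_of` mapping from each treated cat to its age group:

```python
from collections import defaultdict

def att_by_subgroup(y, pairs, group_of):
    """Average matched-pair differences (Y_t - Y_c) separately per subgroup,
    using the subgroup label of the treated cat in each pair."""
    diffs = defaultdict(list)
    for i, j in pairs:
        diffs[group_of[i]].append(y[i] - y[j])
    return {g: sum(d) / len(d) for g, d in diffs.items()}

# Hypothetical outcomes and pairs: treated cats 0-1 are adults, 2-3 are kittens.
y = [1, 1, 1, 0, 0, 0, 1, 0]
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
group_of = {0: "adult", 1: "adult", 2: "kitten", 3: "kitten"}
print(att_by_subgroup(y, pairs, group_of))  # {'adult': 1.0, 'kitten': 0.0}
```

Ideally matching is also exact on the subgroup variable, so each treated adult is compared with an untreated adult. Small subgroups give noisy estimates.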

Section 06

Project Limitations and Ethical Considerations

Methodological Limitations

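  • Unobserved confounding: PSM balances only the measured covariates, so unmeasured variables can bias the estimates
  • SUTVA (Stable Unit Treatment Value Assumption): PSM assumes one cat's outcome is unaffected by other cats' treatment, but when shelter resources are limited, the adoption of one cat may affect another's chances
  • Matching quality: If propensity scores overlap poorly between groups, many treated cats find no good match and must be dropped, shrinking the sample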
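
Overlap can be checked before matching. One common heuristic, sketched here with hypothetical scores (whether AdoptAI uses this exact rule is an assumption), keeps only cats whose propensity score lies within the range shared by the treated and control groups:

```python
def common_support_trim(scores, t):
    """Min-max common support rule: keep units whose score lies in
    [max(min_treated, min_control), min(max_treated, max_control)]."""
    treated = [s for s, ti in zip(scores, t) if ti == 1]
    control = [s for s, ti in zip(scores, t) if ti == 0]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    keep = [i for i, s in enumerate(scores) if lo <= s <= hi]
    return keep, (lo, hi)

# Hypothetical propensity scores: treated cats 0-3, controls 4-7.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]
t = [1, 1, 1, 1, 0, 0, 0, 0]
keep, region = common_support_trim(scores, t)
print(keep, region)  # [2, 3, 4, 5] (0.4, 0.7)
```

Here half of the treated cats fall outside the shared range, which is the kind of sample loss described above; reporting how many cats are trimmed makes that loss visible.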

Ethical Considerations

  • Risk of prediction misuse: A low predicted adoption probability must never be used as a reason to euthanize a cat
  • Fairness: The model must be checked for bias against particular breeds or colors
  • Transparency: Staff and adopters have the right to understand the basis for decisions
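
A first-pass fairness check can compare the model's average predicted adoption probability across groups such as coat color. A minimal sketch with hypothetical predictions; the ratio of the lowest to the highest group mean is one simple disparity measure, not a method stated by the project.

```python
from collections import defaultdict

def mean_prediction_by_group(preds, groups):
    """Average predicted adoption probability for each group label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for p, g in zip(preds, groups):
        totals[g] += p
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

def disparity_ratio(group_means):
    """Lowest group mean divided by the highest; 1.0 means identical averages."""
    return min(group_means.values()) / max(group_means.values())

# Hypothetical model outputs for six cats.
preds = [0.75, 0.5, 0.75, 0.25, 0.5, 0.25]
colors = ["tabby", "tabby", "tabby", "black", "black", "black"]
means = mean_prediction_by_group(preds, colors)
print(means)                   # tabby ≈ 0.667, black ≈ 0.333
print(disparity_ratio(means))  # ≈ 0.5
```

A low ratio does not by itself prove the model is biased: it may mirror real adopter behaviour such as the black cat effect, which staff need to see before acting on the predictions.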

Section 07

Implications for AI Applications and Project Conclusion

Implications for AI Applications

  1. Causation goes beyond correlation: Predictive models describe "what is", while causal inference addresses "why" and "what if"
  2. Value of interpretability: Black-box models are unacceptable for decisions that affect animals' lives and welfare
  3. Interdisciplinary collaboration: Data scientists need to work with veterinarians and animal behaviorists

Conclusion

AdoptAI applies causal inference and large language models to stray animal rescue, showing how AI can serve socially responsible work. It is a reminder that the value of AI lies in helping us understand a complex world and make better decisions, and it offers a reference point for both data scientists and animal welfare workers.