# Forensic Ancestry Inference: A Benchmark Study Based on SNP Panels and Machine Learning

> This study explores how to use five ancestry-informative SNP markers and machine learning algorithms to accurately infer continental-level ancestry from degraded DNA samples, providing proof of concept for forensic applications.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-10T22:16:00.000Z
- Last activity: 2026-06-10T22:20:15.739Z
- Popularity: 150.9
- Keywords: forensic genetics, ancestry inference, SNP, machine learning, population genetics, 1000 Genomes, DNA degradation, classification algorithms
- Page link: https://www.zingnex.cn/en/forum/thread/snp
- Canonical: https://www.zingnex.cn/forum/thread/snp
- Markdown source: floors_fallback

---

## Introduction: Proof of Concept Study on Forensic Ancestry Inference

This proof-of-concept study tests whether a panel of only five ancestry-informative SNP markers, combined with machine learning classifiers, can infer continental-level ancestry when DNA is degraded. Using 1000 Genomes Project data, it validates the feasibility of such a minimal SNP panel for forensic applications.

## Background: DNA Challenges in Forensic Science and Solutions with AISNPs

In forensic practice, DNA recovered from crime scenes is often scarce and degraded. Traditional STR analysis is powerful for individual identification but carries limited ancestry information. Ancestry-informative SNPs (AISNPs), by contrast, differ markedly in allele frequency between continental populations, so a small number of markers can support ancestry inference. This study explores how far the SNP panel can be minimized.

## Study Design: Data Sources and Five-Marker AISNP Panel

Data came from the 2,504 individuals in Phase 3 of the 1000 Genomes Project, grouped into five continental populations: AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), and SAS (South Asian). The panel comprised five AISNP markers previously validated in population genetics, including rs2814778 (informative for African ancestry) and rs3827760 (informative for East Asian ancestry).
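The between-population frequency differences that make such markers informative can be computed directly from genotype calls. The sketch below is a minimal illustration, not the study's pipeline: the sample table, the allele letters, and the function names are all invented, and the SNP is a hypothetical biallelic marker rather than one of the panel's rsIDs.

```python
from collections import defaultdict

# Toy genotypes (made up for illustration): sample -> (population, genotype).
# The SNP is a hypothetical biallelic marker with alleles "T" and "C".
SAMPLES = {
    "s1": ("AFR", "CC"),
    "s2": ("AFR", "CC"),
    "s3": ("AFR", "TC"),
    "s4": ("EUR", "TT"),
    "s5": ("EUR", "TT"),
    "s6": ("EAS", "TT"),
}


def dosage(genotype: str, alt: str) -> int:
    """Count copies of the alternate allele (0, 1, or 2) in a diploid genotype."""
    return sum(1 for allele in genotype if allele == alt)


def alt_allele_frequency(samples, alt):
    """Return {population: alternate-allele frequency} for one SNP."""
    alt_counts = defaultdict(int)
    chrom_counts = defaultdict(int)
    for population, genotype in samples.values():
        alt_counts[population] += dosage(genotype, alt)
        chrom_counts[population] += 2  # two chromosomes per diploid individual
    return {pop: alt_counts[pop] / chrom_counts[pop] for pop in chrom_counts}


freqs = alt_allele_frequency(SAMPLES, alt="C")
for pop, freq in sorted(freqs.items()):
    print(f"{pop}: {freq:.3f}")
```

The 0/1/2 dosage encoding used here is also a common numeric input format for PCA and classifiers, which is why it is a convenient first step.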

## Analysis Methods: Genotype Analysis and Machine Learning Classification

The allele frequency of each SNP was calculated per population (the rs2814778 variant, for example, is Africa-specific). PCA on the five markers captured 80.3% of the variance in the genotype data and showed clear population clustering. Of the four machine learning models evaluated, SVM achieved the highest accuracy (91.2%); accuracy was lower for the Admixed American population.
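For readers unfamiliar with how a figure like "percent of variance captured" is obtained, the following sketch computes explained-variance ratios from a dosage matrix with a plain SVD. It assumes NumPy is available; the matrix is made-up toy data, and the helper name `pca_explained_variance` is hypothetical. The source does not say how many principal components the 80.3% refers to, so the two-component sum printed here is only an example.

```python
import numpy as np

# Toy dosage matrix (made up): rows are individuals, columns are five SNPs,
# entries are alternate-allele counts (0, 1, or 2).
X = np.array([
    [2, 0, 0, 1, 0],
    [2, 0, 1, 0, 0],
    [2, 1, 0, 0, 0],
    [0, 2, 0, 1, 2],
    [0, 2, 1, 1, 2],
    [0, 2, 0, 2, 2],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 1, 1],
    [1, 0, 2, 2, 1],
], dtype=float)


def pca_explained_variance(X):
    """Return (scores, explained_variance_ratio) via SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)                # center each SNP column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)    # variance along each principal axis
    ratio = variances / variances.sum()    # fraction of total variance per axis
    scores = U * S                         # sample coordinates on the axes
    return scores, ratio


scores, ratio = pca_explained_variance(X)
print("explained variance ratio:", np.round(ratio, 3))
print("cumulative, first two PCs:", round(float(ratio[:2].sum()), 3))
```

Plotting the first two columns of `scores`, colored by population label, would give the kind of clustering view the text describes.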

## Key Findings: Feature Importance and Degradation Robustness

Random forest feature importance ranked rs2814778 as the most informative marker. Progressive SNP deletion experiments showed that classification performance remained robust under moderate deletion, underscoring the forensic value of highly informative markers.
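A progressive deletion experiment of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the allele frequencies, population labels, and drop order are invented, genotypes are simulated rather than taken from 1000 Genomes, and a simple nearest-centroid classifier stands in for the study's models to keep the example dependency-free.

```python
import random

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Hypothetical alternate-allele frequencies per population for five markers.
# These numbers are invented for illustration; they are not the study's data.
FREQS = {
    "POP_A": [0.95, 0.05, 0.10, 0.30, 0.20],
    "POP_B": [0.05, 0.90, 0.15, 0.60, 0.25],
    "POP_C": [0.05, 0.10, 0.85, 0.40, 0.70],
}
N_MARKERS = 5


def simulate(pop, n):
    """Draw n diploid individuals as dosage vectors (two allele draws per marker)."""
    return [[(rng.random() < p) + (rng.random() < p) for p in FREQS[pop]]
            for _ in range(n)]


def centroids(train):
    """Mean dosage per marker for each population."""
    return {pop: [sum(col) / len(rows) for col in zip(*rows)]
            for pop, rows in train.items()}


def predict(sample, cents, kept):
    """Nearest centroid using only the marker indices listed in `kept`."""
    def dist(centroid):
        return sum((sample[j] - centroid[j]) ** 2 for j in kept)
    return min(cents, key=lambda pop: dist(cents[pop]))


def accuracy(test, cents, kept):
    """Fraction of test individuals assigned to their true population."""
    total = sum(len(rows) for rows in test.values())
    hits = sum(predict(s, cents, kept) == pop
               for pop, rows in test.items() for s in rows)
    return hits / total


train = {pop: simulate(pop, 200) for pop in FREQS}
test = {pop: simulate(pop, 100) for pop in FREQS}
cents = centroids(train)

# Drop markers one at a time, last column first, to mimic progressively
# more failed loci in a degraded sample.
results = []
for n_kept in range(N_MARKERS, 0, -1):
    kept = list(range(n_kept))
    results.append((n_kept, accuracy(test, cents, kept)))
    print(f"{n_kept} markers kept: accuracy = {results[-1][1]:.3f}")
```

Because the drop order is fixed, the result depends on which markers are removed first; a fuller experiment would average over random drop orders or remove markers in order of estimated importance.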

## Limitations and Future Research Directions

Limitations: only five markers were evaluated, and subcontinental population structure was not addressed. Future work includes expanding to the Kidd55 panel, evaluating ensemble models, simulating DNA degradation scenarios, and validating on independent datasets.

## Practical Significance: Implications for Forensic Applications

The study confirms that a minimal SNP panel can reproduce continental population structure. When samples are limited or degraded, a small number of carefully selected markers can still provide ancestry clues that support case investigations.
