Zing Forum


ReLocator: A Deep Neural Network-Based Tool for Genetic Geolocation

ReLocator is an open-source tool that uses deep neural networks to predict the geographic origin of samples from genotype data. In studies of parasites, mosquitoes, and human populations, it reached median test errors between 5.7 km and 85 km.

deep learning, genetic geolocation, neural networks, population genetics, bioinformatics, tensorflow, machine learning, genomics
Published 2026-05-12 04:25 · Recent activity 2026-05-12 04:30 · Estimated read 8 min


Section 02

Research Background: The Hidden Link Between Genes and Geography

In nature, most organisms mate and reproduce with nearby conspecifics. This local mating produces spatial autocorrelation in genetic data: individuals sampled close together tend to be more genetically similar than individuals sampled far apart. Each organism's genome is a mosaic of genetic material inherited from its recent ancestors, who usually lived nearby.

This link between genes and geography makes it possible to predict where a sample of unknown origin came from by comparing its genotypes with those of samples from known locations. The approach has important applications in several fields: in forensics, it can trace the origin of smuggled ivory; in epidemiology, it helps track the transmission routes of pathogens; in ecology, it can reveal the migration history and population structure of species.
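The simplest way to act on this idea is to place an unknown sample near the reference samples it most resembles genetically. The sketch below is a k-nearest-neighbor baseline, not ReLocator's method. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) per SNP and that locations are (longitude, latitude) pairs; the function name `knn_locate` and the toy data are hypothetical.

```python
from math import dist
from statistics import fmean


def knn_locate(query, ref_genotypes, ref_coords, k=3):
    """Hypothetical baseline: mean (lon, lat) of the k genetically closest references.

    Genotypes are lists of alternate-allele counts (0, 1 or 2), one entry per SNP.
    """
    # Rank reference samples by Euclidean distance in genotype space.
    order = sorted(range(len(ref_genotypes)),
                   key=lambda i: dist(query, ref_genotypes[i]))
    nearest = order[:k]
    # Plain averaging of coordinates is only sensible over a small region.
    lon = fmean(ref_coords[i][0] for i in nearest)
    lat = fmean(ref_coords[i][1] for i in nearest)
    return lon, lat


# Toy reference panel: two "western" and two "eastern" samples.
ref_genotypes = [[0, 0, 1, 2], [0, 1, 1, 2], [2, 2, 1, 0], [2, 1, 0, 0]]
ref_coords = [(10.0, 50.0), (10.5, 50.5), (30.0, 10.0), (31.0, 11.0)]
print(knn_locate([0, 0, 1, 2], ref_genotypes, ref_coords, k=2))  # (10.25, 50.25)
```

Averaging over several neighbors smooths out noise from any single reference sample, but the estimate can never leave the area spanned by the sampled references.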


Section 03

Limitations of Traditional Methods

Before ReLocator, methods for estimating a sample's geographic origin fell mainly into two categories, each with significant limitations.

The first category uses unsupervised genotype clustering or dimensionality reduction (such as principal component analysis). These methods analyze known- and unknown-origin samples jointly, then assign each unknown sample to the location of the known samples that fall in the same genotype cluster or region of principal-component space. This requires an additional step to map clusters onto geographic coordinates, and it can give unreasonable results when the unknown sample is a hybrid or comes from a reference population that was not sampled.
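A minimal sketch of the extra mapping step shows why hybrids are a problem. It assumes the clusters have already been inferred (here they are simply given as labels), represents each cluster by its mean genotype, and ties each cluster to one representative location; all function names, labels and coordinates are hypothetical.

```python
from math import dist


def cluster_centroids(genotypes, labels):
    """Mean genotype vector per cluster (clusters assumed to be already inferred)."""
    sums, counts = {}, {}
    for g, lab in zip(genotypes, labels):
        acc = sums.setdefault(lab, [0.0] * len(g))
        for j, x in enumerate(g):
            acc[j] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def assign_location(query, centroids, cluster_location):
    """The extra mapping step: nearest cluster first, then that cluster's location."""
    best = min(centroids, key=lambda lab: dist(query, centroids[lab]))
    return cluster_location[best]


genotypes = [[0, 0, 0, 0], [0, 0, 0, 1], [2, 2, 2, 2], [2, 2, 2, 1]]
labels = ["west", "west", "east", "east"]
cluster_location = {"west": (0.0, 45.0), "east": (20.0, 45.0)}

centroids = cluster_centroids(genotypes, labels)
hybrid = [1, 1, 1, 0]  # roughly half "west", half "east" ancestry
print(assign_location(hybrid, centroids, cluster_location))  # (0.0, 45.0)
```

The hybrid is snapped entirely to the western location; the output can only ever be one of the cluster locations, never a position between them or outside the sampled range.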

The second category is explicit model-based methods, such as SPASIBA and SCAT. These methods work in two steps: first, they estimate a smooth map of how each allele's frequency varies across space, using the genotypes of individuals from known locations; then they predict a new sample's position by finding the location that maximizes the likelihood of its observed combination of alleles. These methods usually assume that allele frequencies follow a specific functional form (e.g., a Gaussian), are computationally expensive, and impose strong assumptions on the model.
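The second step can be sketched as a grid search over candidate locations. This is not the actual model used by SPASIBA or SCAT: it assumes the allele-frequency maps from step one are already available as functions of location (here, hypothetical logistic clines along longitude), assumes Hardy-Weinberg proportions at each locus, and treats loci as independent. Function names are hypothetical.

```python
from math import comb, exp, log


def cline(center, width):
    """Hypothetical allele-frequency map: a logistic cline along longitude."""
    return lambda lon, lat: 1.0 / (1.0 + exp(-(lon - center) / width))


def log_likelihood(genotype, freq_maps, lon, lat, eps=1e-9):
    """Binomial (Hardy-Weinberg) log-likelihood of a genotype at one location."""
    total = 0.0
    for g, freq in zip(genotype, freq_maps):
        p = min(max(freq(lon, lat), eps), 1.0 - eps)  # keep log() finite
        total += log(comb(2, g)) + g * log(p) + (2 - g) * log(1.0 - p)
    return total


def ml_location(genotype, freq_maps, grid):
    """Step 2: the grid cell that maximizes the likelihood."""
    return max(grid, key=lambda cell: log_likelihood(genotype, freq_maps, *cell))


# Step 1 is assumed done: one frequency map per locus.
freq_maps = [cline(-2.0, 1.0), cline(0.0, 1.0), cline(2.0, 1.0)]
grid = [(float(lon), 0.0) for lon in range(-5, 6)]
print(ml_location([2, 1, 0], freq_maps, grid))  # (0.0, 0.0)
```

Every query evaluates every locus at every grid cell, so the cost grows with the product of the two; this, plus the fixed functional form of the frequency maps, is the trade-off described above.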


Section 04

Core Innovations of ReLocator

ReLocator adopts a new supervised deep learning strategy: it learns the mapping from genotypes to geographic coordinates directly from raw genotype data, without requiring an explicit population genetics model.


Section 05

Deep Neural Network Architecture

ReLocator uses a deep, fully connected neural network to approximate the complex function relating genotypes to geographic locations. Unlike traditional methods, it takes unphased genotype data directly as input and, through several layers of nonlinear transformations, outputs predicted geographic coordinates (longitude and latitude). Training uses Euclidean distance as the loss function, so the model learns to minimize the straight-line distance between predicted and true positions.
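To make the training objective concrete, here is a toy, dependency-free sketch of a fully connected network trained on the mean Euclidean distance loss. It is much smaller than a real model and is not ReLocator's implementation. Assumptions: a single ReLU hidden layer, full-batch gradient descent with hand-written backpropagation, coordinates treated as planar (x, y) values, and no input or coordinate scaling. The class name `TinyLocator` and the toy data are hypothetical.

```python
import random
from math import sqrt


def euclidean_loss(preds, targets):
    """Mean straight-line distance between predicted and true (x, y) pairs."""
    total = sum(sqrt((p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2)
                for p, t in zip(preds, targets))
    return total / len(targets)


class TinyLocator:
    """Hypothetical toy model: genotype vector -> ReLU hidden layer -> (x, y)."""

    def __init__(self, n_snps, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 0.5) for _ in range(n_snps)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def forward(self, g):
        h = [max(0.0, sum(w * x for w, x in zip(row, g)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = [sum(w * a for w, a in zip(row, h)) + b
               for row, b in zip(self.W2, self.b2)]
        return h, out

    def predict(self, g):
        return self.forward(g)[1]

    def train_step(self, genotypes, coords, lr=0.02, eps=1e-9):
        """One full-batch gradient-descent step on the mean Euclidean loss."""
        n = len(genotypes)
        gW1 = [[0.0] * len(row) for row in self.W1]
        gb1 = [0.0] * len(self.b1)
        gW2 = [[0.0] * len(row) for row in self.W2]
        gb2 = [0.0, 0.0]
        for g, t in zip(genotypes, coords):
            h, out = self.forward(g)
            d = [out[0] - t[0], out[1] - t[1]]
            norm = sqrt(d[0] ** 2 + d[1] ** 2) + eps
            dout = [dk / norm / n for dk in d]  # gradient of ||d|| is d / ||d||
            for k in range(2):
                gb2[k] += dout[k]
                for j, a in enumerate(h):
                    gW2[k][j] += dout[k] * a
            for j, a in enumerate(h):
                if a > 0.0:  # ReLU passes gradient only through active units
                    dh = dout[0] * self.W2[0][j] + dout[1] * self.W2[1][j]
                    gb1[j] += dh
                    for i, x in enumerate(g):
                        gW1[j][i] += dh * x
        # Apply all updates after the gradients are fully accumulated.
        for params, grads in ((self.W1, gW1), (self.W2, gW2)):
            for row, grow in zip(params, grads):
                for i, gr in enumerate(grow):
                    row[i] -= lr * gr
        for vec, grads in ((self.b1, gb1), (self.b2, gb2)):
            for i, gr in enumerate(grads):
                vec[i] -= lr * gr


genotypes = [[0, 0, 2, 2], [0, 1, 2, 1], [1, 1, 1, 1], [1, 2, 0, 1], [2, 2, 0, 0], [2, 1, 0, 0]]
coords = [(10.0 + g[0] - g[2], 5.0 + g[1] - g[3]) for g in genotypes]

net = TinyLocator(n_snps=4)
before = euclidean_loss([net.predict(g) for g in genotypes], coords)
for _ in range(500):
    net.train_step(genotypes, coords)
after = euclidean_loss([net.predict(g) for g in genotypes], coords)
print(f"mean training error: {before:.2f} -> {after:.2f}")
```

One property of this loss worth noting: unlike squared error, the gradient of a distance has unit length regardless of how far off a prediction is, so a few badly misplaced samples do not dominate the update.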


Section 06

Genomic Window Analysis

A key advantage of ReLocator is its computational efficiency. Because deep learning frameworks train such networks quickly, ReLocator can run a window-based genomic analysis: it divides the genome into overlapping or non-overlapping windows and predicts a location from each window separately. This approach offers two important advantages:

First, comparing the predictions from different windows quantifies prediction uncertainty. Because of recombination, different regions of the genome may descend from ancestors who lived in different places, and window analysis captures this genome-wide mosaic of ancestry.

Second, window analysis reveals admixed ancestry. For individuals with complex ancestral histories, different genomic windows may point to different geographic regions, which provides rich information for understanding population history and individual migration.
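A minimal sketch of the windowing logic follows. For brevity, the per-window predictor is a 1-nearest-neighbor stand-in rather than a trained network, windows are defined by SNP index rather than genomic position, and the summary (median location plus the largest deviation from it) is only one simple choice. All function names and data are hypothetical.

```python
from math import dist
from statistics import median


def windows(n_snps, size, step=None):
    """Yield (start, end) SNP index ranges; step < size gives overlapping windows."""
    step = step or size
    for start in range(0, n_snps - size + 1, step):
        yield start, start + size


def nearest_ref_location(query, refs, coords, cols):
    """Stand-in per-window predictor: location of the closest reference on these SNPs."""
    def sub(g):
        return [g[c] for c in cols]
    best = min(range(len(refs)), key=lambda i: dist(sub(query), sub(refs[i])))
    return coords[best]


def window_predictions(query, refs, coords, size, step=None):
    """One predicted location per genomic window."""
    return [nearest_ref_location(query, refs, coords, range(s, e))
            for s, e in windows(len(query), size, step)]


def summarize(preds):
    """Median location across windows, plus the largest deviation from it as a crude spread."""
    center = (median(p[0] for p in preds), median(p[1] for p in preds))
    spread = max(dist(p, center) for p in preds)
    return center, spread


refs = [[0] * 8, [2] * 8]
coords = [(0.0, 0.0), (10.0, 0.0)]
mosaic = [0, 0, 0, 0, 2, 2, 2, 2]  # first half "west-like", second half "east-like"

preds = window_predictions(mosaic, refs, coords, size=4)
print(preds)             # [(0.0, 0.0), (10.0, 0.0)]
print(summarize(preds))  # ((5.0, 0.0), 5.0)
```

For the mosaic sample the two windows point to opposite ends of the range, so the median lands between them, at a place neither ancestral source occupies; the large spread is what flags this. A sample with uniform ancestry would give identical window predictions and a spread of zero.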


Section 07

Performance

According to research published in the journal eLife, ReLocator performed well on several real-world datasets:

  • Plasmodium falciparum: median test error of 16.9 km
  • Anopheles mosquitoes: median test error of 5.7 km
  • Global human populations: median test error of 85 km
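Errors in kilometers imply a distance measured on the Earth's surface. The article does not state exactly how these distances were computed; the sketch below assumes great-circle (haversine) distance on a sphere with a mean radius of 6371 km and takes the median over test samples. Function names and the toy points are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def median_test_error_km(predicted, true):
    """Median great-circle error over a set of held-out samples."""
    return median(haversine_km(p, t) for p, t in zip(predicted, true))


predicted = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
true = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]  # 1, 2 and 3 degrees due north
print(f"{median_test_error_km(predicted, true):.2f} km")  # 222.39 km
```

The median is reported rather than the mean because a handful of badly misplaced samples would otherwise inflate the summary.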

On simulated data, ReLocator infers sample locations to within 4.1 generations of dispersal and runs at least an order of magnitude faster than existing model-based methods.


Section 08

Technical Features and Functions

Beyond its accuracy, ReLocator offers a range of technical features to meet the needs of different research scenarios: