Zing Forum


LucaPCycle: A Dual-Channel Architecture for Phosphorus-Solubilizing Function Prediction Based on Protein Language Models

A dual-channel prediction system that combines raw sequences with large protein language models to identify the phosphorus-solubilizing function of protein sequences and classify them into 31 specific functional types. It is applied to large-scale metagenomic data mining.

protein language model · phosphate solubilization · metagenomics · bioinformatics · deep learning · LucaProt · functional prediction · microbiome · cold seep
Published 2026-04-28 19:15 · Recent activity 2026-04-28 19:25 · Estimated read 6 min

Section 01

Introduction to the LucaPCycle Tool

LucaPCycle is a dual-channel architecture for phosphorus-solubilizing function prediction based on protein language models. It combines raw sequences with the representational power of large protein language models to identify phosphorus-solubilizing proteins and assign them to 31 specific types. It is designed for large-scale metagenomic data mining, addressing both the limitations of culture-based methods and the difficulty of identifying protein function in metagenomic data.


Section 02

Research Background and Significance

Phosphorus is an essential element for life, but most phosphorus in soil is present in insoluble forms that plants cannot take up. Phosphorus-solubilizing microorganisms release this phosphorus by secreting organic acids and enzymes such as phosphatases, and they play a key role in agriculture and in ecological nutrient cycles. Traditional screening relies on culture methods, which are time-consuming and reach only the microbes that can be cultured. Metagenomic sequencing generates massive amounts of data, but quickly and accurately identifying phosphorus-solubilizing proteins within it is a new challenge. LucaPCycle was developed to address this challenge.


Section 03

Technical Architecture and Methods

The core is a dual-channel design: the raw sequence channel captures local sequence patterns, while the protein language model channel extracts deep semantic representations (embeddings) of each protein. The model type is LucaProt, the input type is seq_matrix (the raw sequence together with its embedding matrix), and the default truncation length is 4096. The system comprises two models: an identification model (binary classification) and a fine-grained classification model (31 categories).
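
To make the dual-channel idea concrete, here is a toy sketch in Python. It is not LucaPCycle's implementation. Rather than any learned encoder, it uses hand-crafted dipeptide frequencies as a stand-in for the "local patterns" of the raw sequence channel, mean-pools a per-residue embedding matrix as the language-model channel, and fuses the two by concatenation before a logistic output. The function names, the weights, and fusion by concatenation are all hypothetical; only the 4096 truncation length comes from the article.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical local-pattern features: frequencies of all 400 dipeptides.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

MAX_LENGTH = 4096  # default truncation length stated in the article


def sequence_channel(seq):
    """Toy raw-sequence channel: normalized dipeptide counts."""
    counts = [0.0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            counts[idx] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for very short sequences
    return [c / total for c in counts]


def matrix_channel(embedding_matrix):
    """Toy language-model channel: mean-pool per-residue embedding vectors."""
    dim = len(embedding_matrix[0])
    n = len(embedding_matrix)
    return [sum(row[j] for row in embedding_matrix) / n for j in range(dim)]


def predict_positive_prob(seq, embedding_matrix, weights, bias):
    """Fuse both channels by concatenation, then apply a logistic output."""
    seq = seq.upper()[:MAX_LENGTH]
    embedding_matrix = embedding_matrix[:MAX_LENGTH]  # keep channels aligned
    features = sequence_channel(seq) + matrix_channel(embedding_matrix)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, `predict_positive_prob("MKVLA", [[0.1] * 4] * 5, [0.0] * 404, 0.0)` returns 0.5, because all-zero weights give a logit of 0. In practice the embedding matrix would come from a protein language model and the weights from training.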


Section 04

Functions and Application Workflow

Prediction runs in two stages. In the first stage, a binary classifier decides whether a sequence has phosphorus-solubilizing function (dataset: extra_p_2_class_v2, default threshold: 0.2). In the second stage, sequences predicted as positive are assigned to one of 31 categories (dataset: extra_p_31_class_v2), with topk output supported so that the k most probable categories can be reported. The 31 types cover different phosphorus-solubilizing mechanisms, such as organic acid secretion and phosphatase activity.
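
The two-stage flow can be sketched as follows. The two model functions are hypothetical stand-ins for the trained identification and classification models. Treating a score equal to the threshold as positive, and returning class indices 0 to 30 instead of the dataset's real category names, are assumptions of this sketch; only the 0.2 default threshold, the 31 classes, and the top-k idea come from the article.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

BINARY_THRESHOLD = 0.2  # default stage-1 threshold stated in the article
NUM_CLASSES = 31        # number of fine-grained categories


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_predict(
    sequences: Dict[str, str],
    binary_model: Callable[[str], float],            # hypothetical: returns P(positive)
    multiclass_model: Callable[[str], List[float]],  # hypothetical: returns 31 logits
    threshold: float = BINARY_THRESHOLD,
    topk: int = 3,
) -> Dict[str, Optional[List[Tuple[int, float]]]]:
    results: Dict[str, Optional[List[Tuple[int, float]]]] = {}
    for seq_id, seq in sequences.items():
        # Stage 1: binary identification.
        if binary_model(seq) < threshold:
            results[seq_id] = None  # not predicted as phosphorus-solubilizing
            continue
        # Stage 2: 31-way classification, only for stage-1 positives.
        probs = softmax(multiclass_model(seq))
        if len(probs) != NUM_CLASSES:
            raise ValueError(f"expected {NUM_CLASSES} class scores, got {len(probs)}")
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        results[seq_id] = [(i, probs[i]) for i in ranked[:topk]]
    return results
```

Running the 31-class model only on stage-1 positives means the fine-grained classifier never sees sequences already rejected in the first stage.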


Section 05

Large-Scale Data Application Case

LucaPCycle was applied to 164 metagenomes and 33 metatranscriptomes (over 150 million sequences in total) from 16 cold seep sites worldwide, covering sediment depths of 0 to 68.55 m and water depths of 860 to 3005 m. It predicted over 1.48 million positive sequences and flagged more than 130,000 potentially interesting candidates. These predictions were cross-checked with ECOD, DeepFRI, and CLEAN to assess their reliability.


Section 06

Validation Methods and Reliability

Three independent validation methods are used: ECOD domain analysis to check domain composition, DeepFRI v1.0.0 to identify functional residues, and CLEAN v1.0.1 for enzyme function annotation. Only predictions that pass all three checks are marked as 'verified', which raises confidence in the resulting annotations.
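
The "all three must pass" rule amounts to a logical AND over the three checks. In this sketch the boolean flags are hypothetical: the article does not describe how ECOD, DeepFRI, or CLEAN output is turned into a pass or fail, so that step is left to the caller, and the "unverified" label is invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    # Hypothetical per-sequence flags; converting each tool's raw output
    # into a pass/fail decision is assumed to happen upstream.
    ecod_domains_ok: bool  # ECOD domain composition check passed
    deepfri_ok: bool       # DeepFRI v1.0.0 functional-residue check passed
    clean_ok: bool         # CLEAN v1.0.1 enzyme annotation check passed

    @property
    def verified(self) -> bool:
        # 'verified' only when all three independent checks pass.
        return all((self.ecod_domains_ok, self.deepfri_ok, self.clean_ok))


def label(result: ValidationResult) -> str:
    """Map a validation result to a display label."""
    return "verified" if result.verified else "unverified"
```

Requiring agreement from all three tools favors precision: a single failed check is enough to withhold the 'verified' label.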


Section 07

Scientific Value and Impact

LucaPCycle facilitates research on the functions of uncultured microorganisms, can accelerate the development of biofertilizers, helps clarify phosphorus cycling mechanisms in extreme environments, offers a methodological template for other functional annotation tasks, and demonstrates the value of AI for Science in microbial function prediction.


Section 08

Usage Recommendations and Summary

Applicable scenarios: mining phosphorus-solubilizing genes from metagenomes and metatranscriptomes, annotating newly sequenced genomes, and pre-filtering candidates for microbial screening.

Notes: combine predictions with experimental validation; adjust the threshold to balance recall against precision; and manage GPU memory when processing long sequences or large batches.

Summary: LucaPCycle delivers efficient, accurate prediction of phosphorus-solubilizing function at metagenomic scale and shows how AI can be applied in the life sciences.
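
Choosing a threshold in practice means measuring precision and recall on sequences with known labels. The sketch below assumes a hypothetical labeled validation set of stage-1 scores (1 = phosphorus-solubilizing, 0 = not) and reports both metrics for each candidate threshold; the article's default of 0.2 is one natural point to include in the sweep.

```python
from typing import List, Sequence, Tuple


def precision_recall_at(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention used here: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(
    scores: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each candidate threshold."""
    return [(t, *precision_recall_at(scores, labels, t)) for t in thresholds]
```

For example, with scores [0.1, 0.3, 0.6, 0.9] and labels [0, 1, 0, 1], a threshold of 0.2 gives precision 2/3 and recall 1.0, while 0.7 gives precision 1.0 and recall 0.5: raising the threshold trades recall for precision.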