# LucaPCycle: A Dual-Channel Architecture for Phosphorus-Solubilizing Function Prediction Based on Protein Language Models

> A dual-channel prediction system combining raw sequences and large protein language models, designed to identify the phosphorus-solubilizing function of protein sequences and subdivide them into 31 specific functional types, applied to large-scale metagenomic data mining.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-28T11:15:13.000Z
- Last activity: 2026-04-28T11:25:17.425Z
- Popularity score: 161.8
- Keywords: protein language model, phosphate solubilization, metagenomics, bioinformatics, deep learning, LucaProt, functional prediction, microbiome, cold seep
- Page link: https://www.zingnex.cn/en/forum/thread/lucapcycle
- Canonical: https://www.zingnex.cn/forum/thread/lucapcycle

---

## Introduction to the LucaPCycle Tool

LucaPCycle is a dual-channel architecture for predicting phosphorus-solubilizing function, built on protein language models. It combines raw sequence input with the representations of a large protein language model to identify phosphorus-solubilizing proteins and assign them to one of 31 specific functional types. It is applied to large-scale metagenomic data mining, where it addresses both the limited coverage of traditional culture-based screening and the difficulty of identifying function in metagenomic data.

## Research Background and Significance

Phosphorus is an essential element for life, but most phosphorus in soil is insoluble and therefore unavailable to plants. Phosphorus-solubilizing microorganisms convert it into usable forms, for example by secreting organic acids or phosphatases, and so play a key role in agriculture and ecological cycles. Traditional screening relies on culture methods, which are time-consuming and cover only a limited range of organisms. Metagenomic sequencing generates massive amounts of data, but identifying phosphorus-solubilizing proteins in that data quickly and accurately is a new challenge. LucaPCycle was developed to address it.

## Technical Architecture and Methods

The core is a dual-channel design: the raw sequence channel captures local patterns, and the protein language model channel, based on LucaProt, extracts deep semantic representations. The model type is LucaProt, the input type is `seq_matrix`, and the default truncation length is 4096. The system comprises two models: an identification model (binary classification) and a fine-grained classification model (31 categories).
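The dual-channel idea can be illustrated with a minimal sketch. This is not the LucaPCycle implementation: the amino-acid frequency vector stands in for the raw sequence channel, mean pooling stands in for the language-model channel, and the function names, weights, and single logistic output unit are all assumptions made for illustration. Only the truncation length of 4096 comes from the text above.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 4096  # default truncation length stated in the text


def truncate(seq, max_len=MAX_LEN):
    """Keep at most max_len residues from the start of the sequence."""
    return seq[:max_len]


def sequence_channel(seq):
    """Hypothetical stand-in for the raw sequence channel: 20 amino-acid frequencies."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def embedding_channel(seq_matrix):
    """Hypothetical stand-in for the language-model channel: mean-pool a (length x dim) matrix."""
    n = len(seq_matrix)
    dim = len(seq_matrix[0])
    return [sum(row[j] for row in seq_matrix) / n for j in range(dim)]


def fuse_and_score(seq, seq_matrix, weights, bias=0.0):
    """Concatenate both channel vectors and apply one logistic output unit."""
    features = sequence_channel(truncate(seq)) + embedding_channel(seq_matrix[:MAX_LEN])
    if len(weights) != len(features):
        raise ValueError("weights must match the fused feature length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage: 8 residues, a made-up 3-dimensional embedding per residue.
seq = "MKTAYIAK"
seq_matrix = [[0.1, 0.2, 0.3] for _ in seq]
weights = [0.0] * (len(AMINO_ACIDS) + 3)  # untrained weights, illustration only
probability = fuse_and_score(seq, seq_matrix, weights)
```

The point of the sketch is the fusion step: each channel produces its own fixed-length vector, and the classifier sees their concatenation. In a real model both channels would be learned encoders rather than hand-written summaries.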

## Functions and Application Workflow

Prediction runs in two stages. The first stage uses binary classification to decide whether a sequence has a phosphorus-solubilizing function (dataset: `extra_p_2_class_v2`, default threshold: 0.2). The second stage assigns each positive sequence to one of 31 categories (dataset: `extra_p_31_class_v2`, with `topk` output supported). The 31 types cover different phosphorus-solubilizing mechanisms, such as organic acid secretion and phosphatases.
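The two-stage flow described above can be sketched as follows. The model callables, the toy scoring rules, the class labels, and the choice to treat a probability equal to the threshold as positive are all assumptions for illustration; only the default threshold of 0.2 and the top-k output come from the text.

```python
THRESHOLD = 0.2  # default first-stage threshold stated in the text


def two_stage_predict(sequences, binary_model, multiclass_model,
                      threshold=THRESHOLD, topk=3):
    """Run stage 1 on every sequence, then stage 2 only on the positives."""
    results = {}
    for seq_id, seq in sequences.items():
        prob = binary_model(seq)
        if prob < threshold:  # assumption: prob == threshold counts as positive
            results[seq_id] = {"positive": False, "prob": prob, "topk": []}
            continue
        class_probs = multiclass_model(seq)  # dict: label -> probability
        ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
        results[seq_id] = {"positive": True, "prob": prob, "topk": ranked[:topk]}
    return results


# Hypothetical toy models standing in for the trained classifiers.
def toy_binary(seq):
    return 0.9 if "H" in seq else 0.05


def toy_multiclass(seq):
    return {"type_a": 0.7, "type_b": 0.2, "type_c": 0.1}


predictions = two_stage_predict(
    {"s1": "MKH", "s2": "MKA"}, toy_binary, toy_multiclass, topk=2
)
```

Running the second stage only on first-stage positives keeps the more expensive 31-way classification off the large majority of sequences that are filtered out early.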

## Large-Scale Data Application Case

LucaPCycle was applied to 164 metagenomes and 33 metatranscriptomes (more than 150 million sequences in total), with samples from 16 cold seep sites around the world (sediment depth: 0-68.55 meters; water depth: 860-3005 meters). It predicted over 1.48 million positive sequences and more than 130,000 potentially interesting findings, which were checked with ECOD, DeepFRI, and CLEAN to ensure reliability.

## Validation Methods and Reliability

Three independent validation methods are used: ECOD domain analysis to check domain composition; DeepFRI v1.0.0 to identify functional residues; CLEAN v1.0.1 for enzyme annotation. Only results passing all three validations are marked as 'verified' to ensure annotation quality.
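The all-three-must-pass rule can be expressed as a short filter. This sketch assumes the three tools have already been run and their outcomes reduced to booleans. The record layout and the `unverified` label are hypothetical, and the tools' real output formats are not modelled here.

```python
VALIDATORS = ("ecod", "deepfri", "clean")


def mark_verified(records):
    """Label a record 'verified' only if all three checks passed."""
    labelled = []
    for rec in records:
        # A missing check is treated as a failure (assumption).
        passed = all(rec.get(name, False) for name in VALIDATORS)
        status = "verified" if passed else "unverified"
        labelled.append({**rec, "status": status})
    return labelled


# Toy records with made-up identifiers.
checked = mark_verified([
    {"id": "seq_1", "ecod": True, "deepfri": True, "clean": True},
    {"id": "seq_2", "ecod": True, "deepfri": False, "clean": True},
    {"id": "seq_3", "ecod": True, "deepfri": True},
])
```

Requiring agreement across all three methods trades recall for confidence: a sequence missed by any one tool is excluded from the verified set even if the other two support it.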

## Scientific Value and Impact

LucaPCycle facilitates research on the functions of uncultured microorganisms, accelerates the development of biofertilizers, and helps explain phosphorus cycle mechanisms in extreme environments. It also provides a reference technical route for other functional annotation tasks and demonstrates the value of AI for Science in microbial function prediction.

## Usage Recommendations and Summary

Applicable scenarios: mining phosphorus-solubilizing genes from metagenomic and metatranscriptomic data, annotating new genomes, and preliminary filtering before microbial screening.

Notes: combine predictions with experimental validation, adjust thresholds to balance recall and precision, and manage GPU memory.

Summary: LucaPCycle achieves efficient and accurate prediction of phosphorus-solubilizing function, addresses the challenges of large-scale data, and promotes the application of AI in the life sciences.
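The note on adjusting thresholds can be made concrete with a small sweep over a labelled validation set. The probabilities and labels below are invented for illustration and do not come from LucaPCycle. The function is a plain precision/recall calculation, with the assumption that a score equal to the threshold counts as positive.

```python
def precision_recall(probs, labels, threshold):
    """Compute precision and recall when scores >= threshold are called positive."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    # By convention, report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up validation scores and ground-truth labels.
probs = [0.05, 0.15, 0.25, 0.6, 0.9]
labels = [0, 1, 0, 1, 1]

for threshold in (0.1, 0.2, 0.5):
    p, r = precision_recall(probs, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold raises recall at the cost of precision, and raising it does the opposite. The same sweep on a held-out set with known labels is one way to pick a cutoff for a given use case.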
