# BioTool: A Large Model Tool Calling Dataset for the Biomedical Domain

> BioTool, an ACL 2026 accepted paper, is open-sourced. It contains 7040 biomedical tool calling data entries, covering 127 biomedical database tools, and significantly improves the question-answering ability of large language models in the biomedical domain.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-06-04T07:13:03.000Z
- 最近活动: 2026-06-04T07:24:10.590Z
- 热度: 163.8
- 关键词: BioTool, 生物医学, 工具调用, 大语言模型, ACL 2026, NCBI, UniProt, Ensembl, 数据集, 函数调用
- 页面链接: https://www.zingnex.cn/en/forum/thread/biotool-352ceca4
- Canonical: https://www.zingnex.cn/forum/thread/biotool-352ceca4
- Markdown 来源: floors_fallback

---

## [Introduction] BioTool: Open-Source Release of a Large Model Tool Calling Dataset for the Biomedical Domain

BioTool is an ACL 2026 accepted paper and an open-sourced large model tool calling dataset for the biomedical domain. It contains 7040 biomedical tool calling data entries, covering 127 tools from authoritative databases such as NCBI, UniProt, and Ensembl. The dataset aims to improve the tool calling ability and question-answering accuracy of large language models in the biomedical domain. The project provides resources including the dataset, evaluation tools, and fine-tuned models.

## Background: Challenges Faced by LLMs in the Biomedical Domain

Large language models perform well in general domains, but biomedical knowledge is highly specialized, and information is distributed across multiple authoritative databases like NCBI, UniProt, and Ensembl. Traditional LLMs lack the ability to interact with real-time databases, easily generating hallucinations or outdated information. How to enable LLMs to accurately call professional tools has become a key issue to enhance their practicality in the domain.

## Detailed Composition of the BioTool Dataset

BioTool contains 7040 curated data entries (query, function_call, observation triples), covering 127 biomedical tools: NCBI's E-utilities series and BLAST; 14 sub-tools from UniProt (e.g., uniprotkb); and 16 sub-tools from Ensembl (e.g., lookup). It adopts a standard function calling format and provides training sets (5632 entries) and test sets (1408 entries) in the LLaMA-Factory ShareGPT format. Data examples demonstrate the relationship between user questions, tool calls, and return results.

## Evaluation System and Benchmark Models

BioTool establishes three core evaluation metrics: Exact Match (EM, the rate of exact tool call matches), API Success (AS, the proportion of successful calls returning non-error responses), and BioTool Score (comprehensive score). The project has fine-tuned and released the BioTool-finetuned-Qwen3-4B model based on Qwen3-4B, which can be downloaded via Hugging Face.

## Technical Implementation and Usage

BioTool provides Python wrappers for 127 tools, with example code such as calling Ensembl's lookup_by_symbol function. The evaluation process supports closed-source models (e.g., calling GPT-5.1 via OpenRouter) and open-source models (based on LLaMA-Factory), and provides example evaluation scripts (commands for closed-source model evaluation, metric calculation, etc.).

## Application Scenarios and Practical Value

BioTool's application scenarios include: intelligent biomedical question-answering (accurately answering professional questions), research assistant tools (natural language querying of cross-database information), model capability benchmarking, and domain-specific model training (fine-tuning other open-source models).

## Future Development Directions

The project will expand to multi-turn dialogue and multi-step tool calls (multi-hop interaction) in the future, explore optimization directions from supervised fine-tuning to reinforcement learning, and address the distribution shift problem between biomedical data and pre-trained corpora.

## Summary: Significance and Contributions of BioTool

BioTool is the first large-scale tool calling dataset in the biomedical domain, providing 7040 annotated data entries and 127 real tools, which improves the accuracy of LLMs in answering professional questions. The open-sourced resources such as the dataset, evaluation tools, and fine-tuned models will promote the development of biomedical AI.
