Section 01
FIRST: Federated Inference Resource Scheduling Toolkit for Scientific Computing (Introduction)
FIRST (Federated Inference Resource Scheduling Toolkit) is an open-source inference gateway developed by Argonne National Laboratory. It aims to address the core challenge faced by research institutions: leveraging high-performance computing (HPC) infrastructure for large language model (LLM) inference while protecting data privacy. This toolkit provides secure and scalable inference services via OpenAI-compatible APIs, supporting both batch and interactive modes. It uses a federated architecture to enable cross-cluster resource scheduling, offering a private AI inference solution for the scientific computing domain.