Zing Forum

STARS: A New Method for Aligning Large Language Models During Inference via Segment-wise Rejection Sampling

STARS proposes a new method to align the outputs of large language models (LLMs) during inference without additional training. Using a segment-wise rejection sampling strategy, it significantly improves the quality and safety of model outputs while maintaining generation efficiency.

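Large language model alignment · Rejection sampling · Inference-time alignment · AI safety · Reward model · Segment-wise generation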
Published 2026-05-18 22:15

Section 01

Introduction to STARS: A New Breakthrough in Inference-Time Alignment

STARS aligns the outputs of large language models (LLMs) at inference time, without additional training. By applying rejection sampling to short segments instead of whole responses, it improves output quality and safety while keeping generation efficient, filling the gap between vanilla decoding (no alignment) and Best-of-N sampling (strong alignment at high overhead).

Section 02

Background and Challenges: Current Status and Issues of LLM Alignment

The alignment problem of large language models (LLMs) is a core issue in AI safety research. Traditional alignment methods rely on Supervised Fine-Tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), both of which require significant computational resources, and even well-trained models may still produce unexpected outputs during inference. Inference-time alignment, an active area of recent research, instead steers a fixed model while it generates. It is flexible and cheap to deploy, but achieving effective alignment while maintaining efficiency remains an open challenge.

Section 03

Core Idea and Technical Mechanism of STARS

The core of STARS (Synchronous Token Alignment for Robust Supervision) is a segment-wise rejection sampling strategy: instead of judging only complete responses, it scores each short segment of tokens as it is generated and rejects segments that fall below a quality threshold. The process has four steps:
1. Segment generation: the model generates the next segment, a semantically complete unit of text.
2. Real-time evaluation: a reward model scores the candidate segment.
3. Dynamic decision-making: the segment is accepted if its score meets the threshold; otherwise it is discarded and regenerated.
4. Adaptive adjustment: parameters are adjusted according to the historical acceptance rate.
Key hyperparameters include segment_size, max_attempts, alpha/beta, and reward_threshold, which can be tuned for different scenarios.
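
The four steps can be sketched as a single decoding loop. This is a minimal, hypothetical sketch, not the STARS implementation: generate_segment and reward are placeholder callables standing in for an LLM sampler and a reward model, tokens are plain strings for readability, and keeping the best-scoring candidate after max_attempts rejections is an assumption. The article does not say how alpha/beta drive the adaptive adjustment, so step 4 uses an invented rule (adapt_step, target_accept_rate, max_shift) that nudges the threshold based on the running acceptance rate.

```python
import random
from typing import Callable, List, Optional

# Hypothetical callables, NOT the STARS repository's API:
#   generate_segment(prefix, n) -> up to n new tokens continuing `prefix`
#   reward(tokens) -> scalar score from some reward model
GenerateFn = Callable[[List[str], int], List[str]]
RewardFn = Callable[[List[str]], float]


def segment_rejection_sample(
    prompt: List[str],
    generate_segment: GenerateFn,
    reward: RewardFn,
    max_new_tokens: int = 64,
    segment_size: int = 16,
    max_attempts: int = 4,
    reward_threshold: float = 0.5,
    adapt_step: float = 0.05,        # hypothetical adaptive rule (step 4)
    target_accept_rate: float = 0.5,
    max_shift: float = 0.2,
) -> List[str]:
    output: List[str] = []
    threshold = reward_threshold
    accepted = tried = 0
    while len(output) < max_new_tokens:
        n = min(segment_size, max_new_tokens - len(output))
        best_seg: Optional[List[str]] = None
        best_score = float("-inf")
        for _ in range(max_attempts):
            seg = generate_segment(prompt + output, n)   # 1. generate one segment
            score = reward(prompt + output + seg)        # 2. score it in context
            tried += 1
            if score > best_score:
                best_seg, best_score = seg, score
            if score >= threshold:                       # 3. accept and move on
                accepted += 1
                break
        # If accepted, best_seg is the accepted segment; if every attempt was
        # rejected, fall back to the best-scoring candidate (an assumption).
        if not best_seg:
            break  # generator produced nothing (e.g. end of sequence)
        output.extend(best_seg)
        # 4. Hypothetical adaptation: tighten the threshold when most segments
        # pass, relax it when few do, staying within +/- max_shift of the start.
        rate = accepted / tried
        threshold += adapt_step if rate > target_accept_rate else -adapt_step
        threshold = min(max(threshold, reward_threshold - max_shift),
                        reward_threshold + max_shift)
    return output


# Toy usage: random "LLM" and a stand-in "reward model".
rng = random.Random(0)
VOCAB = ["good", "fine", "bad", "ok"]


def toy_generate(prefix: List[str], n: int) -> List[str]:
    return [rng.choice(VOCAB) for _ in range(n)]


def toy_reward(tokens: List[str]) -> float:
    # Share of tokens that are not "bad".
    return sum(t != "bad" for t in tokens) / max(len(tokens), 1)


out = segment_rejection_sample(["prompt"], toy_generate, toy_reward,
                               max_new_tokens=32, segment_size=8)
print(len(out))  # 32
```

Scoring the prefix plus the candidate segment, rather than the segment alone, lets the reward model judge it in context; whether STARS does this is not stated in the article.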

Section 04

Experimental Validation: Performance Across Multiple Datasets

STARS was evaluated on three datasets:
1. HarmfulQA (safety): significantly reduces the proportion of harmful content without seriously impairing the usefulness of answers.
2. HH-RLHF (helpfulness): achieves quality comparable to Best-of-N at lower computational cost.
3. IMDB (sentiment control): steers generated movie reviews toward a specified sentiment, showing that the method supports fine-grained attribute control.
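
The HH-RLHF comparison uses Best-of-N as the reference point. For readers unfamiliar with that baseline, here is a minimal sketch of standard Best-of-N sampling; generate and reward are hypothetical callables standing in for an LLM sampler and a reward model, not an API from the STARS release. All n responses are generated in full before any is discarded, which is where its high overhead comes from.

```python
import random
from typing import Callable, List

# Hypothetical stand-ins for an LLM sampler and a reward model.
GenerateFn = Callable[[List[str], int], List[str]]
RewardFn = Callable[[List[str]], float]


def best_of_n(prompt: List[str], generate: GenerateFn, reward: RewardFn,
              n: int = 8, max_new_tokens: int = 64) -> List[str]:
    """Sample n complete responses and return the one with the highest reward."""
    candidates = [generate(prompt, max_new_tokens) for _ in range(n)]
    # Only whole responses are scored, so alignment is response-level.
    return max(candidates, key=lambda c: reward(prompt + c))


rng = random.Random(0)
VOCAB = ["good", "fine", "bad", "ok"]


def toy_generate(prefix: List[str], m: int) -> List[str]:
    return [rng.choice(VOCAB) for _ in range(m)]


def toy_reward(tokens: List[str]) -> float:
    return sum(t != "bad" for t in tokens) / max(len(tokens), 1)


best = best_of_n(["prompt"], toy_generate, toy_reward, n=4, max_new_tokens=8)
print(len(best))  # 8
```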

Section 05

Comparison with Existing Methods: Positioning and Advantages

| Method | Training requirement | Inference overhead | Alignment granularity | Typical use case |
|---|---|---|---|---|
| Vanilla decoding | None | Lowest | None | Fast generation |
| Best-of-N | None | High | Response-level | Quality first |
| STARS | None | Medium | Segment-level | Balancing quality and efficiency |

STARS fills the gap between vanilla decoding and Best-of-N, making it suitable for scenarios that require real-time alignment but cannot afford high computational costs.
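
To make the "Medium" versus "High" overhead entries concrete, here is a back-of-the-envelope cost model. It rests on a simplifying assumption that is not from the article: every segment attempt is accepted independently with the same probability p, and a segment is tried at most K = max_attempts times. The expected number of tries per segment is then the truncated geometric mean (1 - (1 - p)^K) / p. All function names are hypothetical.

```python
import math


def expected_attempts(p_accept: float, max_attempts: int) -> float:
    """Expected tries per segment, E[min(G, K)] with G ~ Geometric(p_accept).

    Assumes every attempt is accepted independently with probability p_accept
    and a segment is tried at most K = max_attempts times.
    """
    if p_accept <= 0:
        return float(max_attempts)  # never accepted: always uses all K tries
    return (1 - (1 - p_accept) ** max_attempts) / p_accept


def best_of_n_cost(length: int, n: int) -> tuple:
    # n full responses are generated and each gets one reward-model call.
    return n * length, n


def stars_cost(length: int, segment_size: int, p_accept: float,
               max_attempts: int) -> tuple:
    tries = expected_attempts(p_accept, max_attempts)
    tokens = length * tries  # each segment is regenerated `tries` times on average
    reward_calls = math.ceil(length / segment_size) * tries  # one score per attempt
    return tokens, reward_calls


print(best_of_n_cost(256, n=8))                                        # (2048, 8)
print(stars_cost(256, segment_size=32, p_accept=0.5, max_attempts=4))  # (480.0, 15.0)
```

Under these assumptions STARS generates fewer tokens than Best-of-N whenever the expected number of tries is below N, but it can make more reward-model calls, because it scores every segment attempt rather than N finished responses. With a low acceptance rate the expected tries approach max_attempts, so setting max_attempts above N can erase the token savings.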

Section 06

Practical Application Value: Safety, Personalization, and Cost-Effectiveness

1. Safe deployment: adds an extra safety layer to public-facing AI systems by rejecting potentially harmful segments during generation.
2. Personalized control: swapping the reward model or adjusting the sampling parameters customizes the output style, so the same base model can serve different user groups.
3. Cost-effectiveness: because no retraining is needed, the marginal cost is limited to the extra inference compute, which suits resource-constrained deployments that still need high-quality outputs.
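
The personalization point amounts to choosing a reward model and hyperparameters per deployment. Below is a minimal sketch of how such profiles might be organized, using a hypothetical StarsConfig class. Only the field names segment_size, max_attempts and reward_threshold come from the article; reward_model, the profile names and all values are illustrative placeholders, and alpha/beta are omitted because the article does not define them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StarsConfig:
    # Hypothetical config object; values are placeholders, not tuned defaults.
    segment_size: int = 32
    max_attempts: int = 4
    reward_threshold: float = 0.5
    reward_model: str = "general-helpfulness"  # hypothetical model identifier

    def __post_init__(self) -> None:
        if self.segment_size < 1 or self.max_attempts < 1:
            raise ValueError("segment_size and max_attempts must be >= 1")


BASE = StarsConfig()
PROFILES = {
    # Public-facing assistant: stricter safety reward, more retries per segment.
    "public_safe": replace(BASE, reward_model="safety", reward_threshold=0.8,
                           max_attempts=8),
    # Latency-sensitive use: longer segments mean fewer reward-model calls.
    "low_latency": replace(BASE, segment_size=64, max_attempts=2),
    # Style customization: same base LLM, different reward model.
    "formal_tone": replace(BASE, reward_model="formal-style"),
}

for name, cfg in PROFILES.items():
    print(name, cfg.reward_model, cfg.segment_size, cfg.reward_threshold)
```

Using a frozen dataclass with dataclasses.replace keeps each profile an immutable variant of one shared base, and __post_init__ validates every derived profile as it is created.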

Section 07

Limitations and Future Research Directions

Limitations:
1. Performance depends on the quality of the reward model, which may have biases or blind spots.
2. Evaluating each segment adds some inference latency.
3. Hyperparameter tuning increases deployment complexity.
Future directions: adaptive segment-length adjustment, integration of multiple reward models, and combination with model distillation.
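
Multi-reward model integration is listed only as a future direction, so the article does not specify how scores would be combined. The sketch below shows two common, generic options under that caveat: a weighted average and a conservative minimum. The reward functions here are toy stand-ins, and all names are hypothetical.

```python
from typing import Callable, Dict, Sequence

RewardFn = Callable[[Sequence[str]], float]


def weighted_reward(models: Dict[str, RewardFn],
                    weights: Dict[str, float]) -> RewardFn:
    """Combine reward models as a weighted average of their scores."""
    total = sum(weights[name] for name in models)

    def score(tokens: Sequence[str]) -> float:
        return sum(weights[name] * fn(tokens) for name, fn in models.items()) / total

    return score


def conservative_reward(models: Dict[str, RewardFn]) -> RewardFn:
    """Combine reward models by taking the minimum score, so a segment
    passes a threshold only if every model is satisfied."""
    def score(tokens: Sequence[str]) -> float:
        return min(fn(tokens) for fn in models.values())

    return score


# Toy reward models (hypothetical stand-ins for trained reward models).
def safety(tokens: Sequence[str]) -> float:
    return 0.0 if "bad" in tokens else 1.0


def helpfulness(tokens: Sequence[str]) -> float:
    return min(len(tokens) / 10, 1.0)


models = {"safety": safety, "helpfulness": helpfulness}
avg = weighted_reward(models, {"safety": 3.0, "helpfulness": 1.0})
strict = conservative_reward(models)

print(avg(["good"] * 5), strict(["good"] * 5))  # 0.875 0.5
print(avg(["bad"] * 10), strict(["bad"] * 10))  # 0.25 0.0
```

The weighted average lets a high helpfulness score partly offset a failed safety check, which may be undesirable when the goal is blocking harmful content; the minimum avoids that at the cost of rejecting more segments.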

Section 08

Summary and Open-Source Information

STARS provides an elegant and practical solution for inference-time alignment, improving alignment performance without modifying the model or increasing training costs. It is a promising direction worth exploring in the field of AI safety and alignment. The open-source implementation of this project has been released on GitHub, including complete code, configuration files, and evaluation scripts.