Section 01
1Cat-vLLM Project Introduction: An AWQ 4-bit Inference Engine Optimized for Tesla V100
1Cat-vLLM is a specialized fork of vLLM optimized specifically for Tesla V100 GPUs. It supports AWQ 4-bit quantized inference and is compatible with CUDA 12.8 and with modern large models (e.g., Qwen3.5 and MoE architectures). Its goal is to extend the practical lifespan of the Tesla V100 by giving owners of this hardware a feasible way to run modern large language models.
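Why 4-bit quantization matters on this hardware becomes clearer with some arithmetic. The Tesla V100 ships with 16 GB or 32 GB of HBM2, and FP16 weights alone for a mid-sized model can exceed that. The sketch below estimates weight memory under stated assumptions: every parameter is quantized to 4 bits, each group of 128 weights shares one FP16 scale and one 4-bit zero point (a common AWQ configuration, not necessarily what 1Cat-vLLM uses), and KV cache, activations, and layers usually left unquantized (such as embeddings) are ignored. The function names are hypothetical and exist only for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def fp16_weight_bytes(num_params: int) -> float:
    """Weights stored in FP16: 2 bytes per parameter."""
    return num_params * 2.0


def awq_weight_bytes(num_params: int, group_size: int = 128,
                     scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Approximate size of group-wise 4-bit weights (AWQ-style, assumed layout).

    Assumes all parameters are quantized to 4 bits and every group of
    `group_size` weights carries one FP16 scale and one 4-bit zero point.
    """
    weight_bits = num_params * 4
    num_groups = num_params / group_size
    overhead_bits = num_groups * (scale_bits + zero_bits)
    return (weight_bits + overhead_bits) / 8


def effective_bits_per_weight(group_size: int = 128) -> float:
    """Average storage cost per weight, including scale/zero-point overhead."""
    return awq_weight_bytes(group_size, group_size) * 8 / group_size


if __name__ == "__main__":
    print(f"effective bits per weight: {effective_bits_per_weight():.5f}")
    for billions in (7, 14, 32):
        n = billions * 10**9
        print(f"{billions}B params: FP16 {fp16_weight_bytes(n) / GIB:.1f} GiB, "
              f"4-bit {awq_weight_bytes(n) / GIB:.1f} GiB")
```

Under these assumptions, a 32B-parameter model needs roughly 59.6 GiB of weights in FP16 but roughly 15.5 GiB at about 4.16 effective bits per weight. That is the difference between not fitting on any single V100 and fitting on a 32 GB card with some memory left for the KV cache.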