Section 01
【Introduction】Running 27B Large Models on Dual Used RTX 2080 Ti Cards: Key Takeaways from the vLLM Definitive Edition Practical Guide
Original Author & Source
- Original Author/Maintainer: weicj
- Source Platform: GitHub
- Original Title: vLLM-2080Ti-Definitive: The definitive vLLM runtime for dual RTX 2080 Ti 22GB + NVLink
- Original Link: https://github.com/weicj/vLLM-2080Ti-Definitive
- Release Date: June 3, 2026
Core Points
Two modified 22GB RTX 2080 Ti graphics cards connected via NVLink, paired with the vLLM 2080 Ti Definitive Edition runtime, can deliver comparable or even stronger local large-model inference performance at half the price of a used RTX 3090 Ti (approximately $550). The setup supports models such as Qwen3.6 27B and Gemma4 31B, decodes a single request at over 100 tokens/second, and natively supports a 262K context length.
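As a rough illustration of what this configuration implies, the sketch below assembles a launch command for two GPUs and a 262K context using standard upstream vLLM CLI flags (`--tensor-parallel-size`, `--max-model-len`, `--dtype`). It is not taken from the Definitive Edition repository, whose own launch scripts and defaults may differ. The model path is a placeholder, the reading of "262K" as 262,144 tokens is an assumption, and `--dtype half` is chosen because Turing-generation cards such as the 2080 Ti lack native bfloat16 support.

```python
import shlex

# Figures taken from the summary above.
NUM_GPUS = 2                 # dual RTX 2080 Ti
VRAM_PER_GPU_GB = 22         # modified 22GB cards
# Assumption: "262K" is read as 256 * 1024 tokens.
CONTEXT_TOKENS = 256 * 1024


def build_serve_command(model_path: str) -> list[str]:
    """Build (but do not run) a `vllm serve` command for the dual-card setup.

    `model_path` is a placeholder supplied by the caller; the flags are
    upstream vLLM options, not confirmed defaults of the Definitive Edition.
    """
    return [
        "vllm", "serve", model_path,
        # Split the model across both cards (tensor parallelism).
        "--tensor-parallel-size", str(NUM_GPUS),
        # Request the full context window quoted in the summary.
        "--max-model-len", str(CONTEXT_TOKENS),
        # Turing GPUs have no native bfloat16, so use float16.
        "--dtype", "half",
    ]


if __name__ == "__main__":
    cmd = build_serve_command("/path/to/model")  # hypothetical path
    print(f"Combined VRAM: {NUM_GPUS * VRAM_PER_GPU_GB} GB")
    print(shlex.join(cmd))
```

The function only builds the argument list, so it can be inspected or adapted before anything is launched; whether a given model and context length actually fit in the combined 44 GB depends on quantization and KV-cache settings that this summary does not specify.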