Section 01
[Introduction] viiwork: The Load Balancing Tool That Turns Old AMD GPUs Into LLM Inference Clusters
viiwork is an open-source LLM inference load balancer designed for old gfx906 architecture GPUs like the AMD Radeon VII. It can form a Mesh cluster with multiple GPUs equipped with 16GB HBM2 memory and provide an OpenAI-compatible API interface. It taps into the potential of legacy hardware, offering a low-cost inference solution for users on a budget. Key features include intelligent model recommendation, real-time electricity cost monitoring, and Pipeline chained inference, among others.