Section 01
Deploying Phi-3 Mini on AWS: A Guide to a Cloud-Native, Scalable LLM Inference Service
phi3-cloud-deployment is an open-source, cloud-native LLM inference deployment solution focused on running Microsoft's Phi-3 Mini 3.8B model on AWS at low cost and with high scalability. Following the Infrastructure as Code (IaC) approach, it automates the entire deployment with Terraform. Core features include the Hugging Face Text Generation Inference (TGI) serving framework, AWQ 4-bit quantization (cutting VRAM usage to about 2.3 GB), Server-Sent Events (SSE) streaming responses, ECS auto-scaling between 0 and 3 instances, and a zero-cost idle mode that scales the service down to nothing when no traffic arrives. Together these give developers and enterprises a production-grade LLM service architecture template.
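To make the SSE streaming behavior concrete, here is a minimal client sketch that consumes TGI's standard `/generate_stream` endpoint and prints tokens as they arrive. The endpoint URL is a placeholder, not part of the project: in practice you would substitute whatever hostname the Terraform stack outputs (for example, the load balancer's DNS name).

```python
"""Minimal sketch of a client consuming the service's SSE token stream.

Assumes the stack exposes TGI's standard /generate_stream endpoint;
the host below is a placeholder, not taken from the project.
"""
import json

import requests

# Placeholder endpoint; substitute the DNS name your Terraform apply outputs.
ENDPOINT = "http://your-alb-dns-name/generate_stream"

payload = {
    "inputs": "Explain Infrastructure as Code in one sentence.",
    "parameters": {"max_new_tokens": 128},
}

# TGI streams Server-Sent Events: each event line is `data:{...}` JSON
# carrying the next token; the final event also includes `generated_text`.
with requests.post(ENDPOINT, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        token = event.get("token", {})
        # Skip special tokens (e.g. end-of-sequence markers).
        if not token.get("special"):
            print(token.get("text", ""), end="", flush=True)
print()
```

Because the first request after an idle period may hit a cold start while ECS scales from 0 back to 1 instance, a real client would likely wrap this call in a retry with backoff.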