Zing Forum


Ollama Spot Launcher: One-Click Launch of a Local LLM Inference Environment on Low-Cost Temporary GPU Instances

Launches a temporary GPU environment at very low cost on AWS EC2 Spot instances and automatically deploys Ollama and Open WebUI. It suits AI inference workloads that need elastic capacity, and it supports persistent model caching to speed up later launches.

AWS, EC2, Spot, Ollama, GPU, LLM inference, Open WebUI, cost optimization, automated deployment
Published 2026-06-03 09:42 · Recent activity 2026-06-03 09:54 · Estimated read 6 min

Section 03

Project Overview

Ollama Spot Launcher is a practical AWS infrastructure tool that lets users quickly launch temporary GPU instances at minimal cost to run self-hosted large language models (LLMs). It takes advantage of the lower price of AWS EC2 Spot instances (often 70-90% cheaper than on-demand instances) and combines Ollama's model serving with Open WebUI's browser interface, giving developers and researchers a cost-effective, elastic option for AI inference.

The core idea is "launch on demand, stop after use": spin up the environment when you need GPU capacity, release the instance once the task is done, and keep the model cache on an EBS volume so that the next launch skips the model download and is ready within minutes.
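A rough cost calculation shows why this pattern pays off. The per-hour prices below are hypothetical placeholders, not real AWS quotes (actual Spot prices vary by instance type, region, and time), and the sketch assumes per-second billing pro-rated from the hourly rate, which is how EC2 bills Linux instances.

```python
# Hypothetical USD/hour prices: placeholders only, not real AWS quotes.
ON_DEMAND_PER_HOUR = 4.00
SPOT_PER_HOUR = 1.20
SESSION_MINUTES = 55  # the default auto-shutdown window


def session_cost(hourly_price: float, minutes: float) -> float:
    """Cost of one session, assuming billing is pro-rated per second from the hourly rate."""
    return hourly_price * minutes / 60.0


def savings_pct(spot: float, on_demand: float) -> float:
    """Percentage saved by running on Spot instead of on-demand."""
    return 100.0 * (1.0 - spot / on_demand)


print(f"on-demand: ${session_cost(ON_DEMAND_PER_HOUR, SESSION_MINUTES):.2f}")
print(f"spot:      ${session_cost(SPOT_PER_HOUR, SESSION_MINUTES):.2f}")
print(f"savings:   {savings_pct(SPOT_PER_HOUR, ON_DEMAND_PER_HOUR):.0f}%")
```

Swap in current prices for your instance type and region before drawing conclusions; the point is only that a short, capped session multiplies the Spot discount by a small number of billed minutes.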


Section 04

Key Files

File                  Function
launch_qwen_spot.sh   Local launch script: submits the Spot or on-demand instance request and waits for the model-ready notification
ec2_userdata.sh       Instance startup (user data) script: installs Ollama, pulls the model, and starts Open WebUI
launch.env.example    Configuration template: copy it to launch.env and fill in your account details
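The post does not list the contents of launch.env. Assuming it uses the common shell-style KEY=VALUE format, a small parser like the one below can check that required settings are filled in before a launch. The variable names in REQUIRED are hypothetical; take the real ones from launch.env.example.

```python
def parse_env(text: str) -> dict:
    """Parse shell-style KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        value = value.strip()
        # Remove one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


# Hypothetical required keys; use the names from launch.env.example instead.
REQUIRED = ("AWS_REGION", "KEY_NAME", "WEBHOOK_SECRET")


def missing_keys(env: dict) -> list:
    """Required keys that are absent or empty."""
    return [key for key in REQUIRED if not env.get(key)]


sample = """
# copied from launch.env.example, then edited
AWS_REGION=us-east-1
KEY_NAME="my-keypair"
WEBHOOK_SECRET=
"""
print(missing_keys(parse_env(sample)))  # ['WEBHOOK_SECRET']
```

The parser deliberately ignores inline comments and escape sequences. If the launch script simply sources launch.env, the shell's own parsing rules are the authoritative ones.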

Section 05

Complete Workflow

  1. Launch phase: The local script renders the user data script, injects the webhook secret and key pair information, and submits a Spot or on-demand instance request.
  2. Instance initialization: When the EC2 instance boots, it runs the user data script, which installs Ollama, pulls the specified model (Qwen3-32B by default), and starts Open WebUI.
  3. Status callback: The instance reports its progress to the local machine via webhook (booting → ollama_ready → model_ready).
  4. Local readiness: The local script listens for the webhook and, on receiving model_ready, prints the Ollama API address and the Open WebUI address.
  5. Usage phase: Users access the service through the API or a browser to run inference tasks.
  6. Automatic termination: By default, the instance shuts itself down about 55 minutes after startup to avoid unexpected charges.
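The status callback in steps 3 and 4 amounts to a small protocol: the instance reports states in a fixed order, and the local side accepts only messages that prove knowledge of the injected secret. The post does not describe the wire format, so this sketch assumes the secret is used as an HMAC-SHA256 key over the request body and that the payload is JSON with a `status` field; the function and class names are hypothetical.

```python
import hashlib
import hmac
import json

# Status sequence from the workflow, in order.
STATES = ("booting", "ollama_ready", "model_ready")


def sign(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the request body, hex-encoded (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class StatusTracker:
    """Tracks the instance's reported state, accepting only correctly signed callbacks."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.state = None

    def handle(self, body: bytes, signature: str) -> bool:
        """Process one callback; return True once model_ready has been reached."""
        if not hmac.compare_digest(sign(self.secret, body), signature):
            raise PermissionError("webhook signature mismatch")
        status = json.loads(body)["status"]  # assumed payload: {"status": "..."}
        if status not in STATES:
            raise ValueError(f"unknown status: {status!r}")
        current = -1 if self.state is None else STATES.index(self.state)
        if STATES.index(status) > current:  # ignore duplicate or stale updates
            self.state = status
        return self.state == "model_ready"


# Simulate the instance posting all three updates.
secret = b"example-secret"
tracker = StatusTracker(secret)
for status in STATES:
    body = json.dumps({"status": status}).encode()
    ready = tracker.handle(body, sign(secret, body))
print(tracker.state, ready)  # model_ready True
```

In a real listener, `handle()` would be called from an HTTP request handler (for example one built on Python's `http.server`), and the user data script would compute the same signature with the shared secret before posting. Using `hmac.compare_digest` avoids leaking timing information during the comparison.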

Section 06

Spot Instance Price Optimization

The project builds on the AWS Spot pricing model. It uses the Spot Placement Score API, which rates how likely a Spot request is to succeed in each availability zone, to help users choose the best launch location. It also supports setting a maximum price, which caps the hourly rate you can be charged if Spot prices spike.
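Price awareness can be sketched with records shaped like the `SpotPriceHistory` entries returned by boto3's `describe_spot_price_history` (each carries `AvailabilityZone`, `SpotPrice` as a string, and `Timestamp`). This is an illustrative sketch, not the project's code: it keeps the newest price per availability zone and drops zones above a chosen ceiling. The prices and zones below are hypothetical.

```python
from datetime import datetime, timezone


def latest_prices(history):
    """Map each AZ to its most recent Spot price (USD/hour) from SpotPriceHistory records."""
    newest = {}
    for rec in history:
        az = rec["AvailabilityZone"]
        if az not in newest or rec["Timestamp"] > newest[az]["Timestamp"]:
            newest[az] = rec
    # SpotPrice is returned as a string, e.g. "1.750000".
    return {az: float(rec["SpotPrice"]) for az, rec in newest.items()}


def within_max_price(prices, max_price):
    """AZs whose current price is at or below max_price, cheapest first."""
    affordable = sorted((price, az) for az, price in prices.items() if price <= max_price)
    return [az for _, az in affordable]


# Hypothetical records in the shape returned by describe_spot_price_history.
t0 = datetime(2026, 6, 3, 9, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 6, 3, 9, 30, tzinfo=timezone.utc)
history = [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "1.900000", "Timestamp": t0},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "2.400000", "Timestamp": t1},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "1.750000", "Timestamp": t1},
    {"AvailabilityZone": "us-east-1c", "SpotPrice": "2.100000", "Timestamp": t0},
]
prices = latest_prices(history)
print(within_max_price(prices, 2.20))  # ['us-east-1b', 'us-east-1c']
```

At launch time, the ceiling itself is enforced by EC2 rather than by client code: with boto3 it is passed to `run_instances` as `InstanceMarketOptions={"MarketType": "spot", "SpotOptions": {"MaxPrice": "2.20"}}`, where `MaxPrice` is a string.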


Section 07

Persistent Model Caching

With the project's default configuration, the EBS volume is retained after the instance terminates. The downloaded model files (usually several GB to tens of GB) are therefore preserved, and the next instance only needs to attach this volume to use them right away, without downloading them again, which significantly reduces startup time. Note that an EBS volume can only be attached to an instance in the same availability zone.
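Reusing the cache comes down to finding a detached volume in the right availability zone. This sketch works on records shaped like the `Volumes` entries from boto3's `describe_volumes` (`VolumeId`, `State`, `AvailabilityZone`, `CreateTime`, `Tags`); the tag key `ollama-cache` and the volume IDs are hypothetical, since the post does not say how the project identifies its cache volume.

```python
from datetime import datetime, timezone

CACHE_TAG_KEY = "ollama-cache"  # hypothetical tag marking model-cache volumes


def find_cache_volume(volumes, az, tag_key=CACHE_TAG_KEY):
    """Return the ID of the newest detached, tagged volume in `az`, or None."""
    candidates = [
        v for v in volumes
        if v["State"] == "available"  # detached, so it can be attached now
        and v["AvailabilityZone"] == az  # EBS volumes attach only within their AZ
        and any(t["Key"] == tag_key for t in v.get("Tags", []))
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["CreateTime"])["VolumeId"]


def utc(hour):
    return datetime(2026, 6, 3, hour, tzinfo=timezone.utc)


tag = [{"Key": CACHE_TAG_KEY, "Value": "qwen3"}]
volumes = [
    {"VolumeId": "vol-old", "State": "available", "AvailabilityZone": "us-east-1a", "CreateTime": utc(1), "Tags": tag},
    {"VolumeId": "vol-new", "State": "available", "AvailabilityZone": "us-east-1a", "CreateTime": utc(5), "Tags": tag},
    {"VolumeId": "vol-busy", "State": "in-use", "AvailabilityZone": "us-east-1a", "CreateTime": utc(8), "Tags": tag},
    {"VolumeId": "vol-b", "State": "available", "AvailabilityZone": "us-east-1b", "CreateTime": utc(9), "Tags": tag},
]
print(find_cache_volume(volumes, "us-east-1a"))  # vol-new
print(find_cache_volume(volumes, "us-east-1c"))  # None
```

After the volume is attached and mounted, Ollama can be pointed at it with the `OLLAMA_MODELS` environment variable (the mount path is your choice). Keep in mind that a retained EBS volume is billed for storage even while no instance is running.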


Section 08

Automatic Availability Zone Selection

With the --auto-az option, the script automatically selects the availability zone in the same VPC with the highest Spot Placement Score. This removes the need to configure subnets for multiple availability zones by hand and simplifies cross-zone deployment.
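Auto-AZ selection can be sketched as a join between two boto3 responses: `get_spot_placement_scores` (called with `SingleAvailabilityZone=True`, it returns `SpotPlacementScores` entries with an `AvailabilityZoneId` and a `Score` from 1 to 10) and `describe_subnets` filtered on the VPC ID (each subnet carries an `AvailabilityZoneId`). This is an illustrative Python version, not the script's own logic, and the data below is hypothetical. Matching on zone IDs rather than zone names matters because AWS maps zone names to physical zones differently per account.

```python
def pick_subnet(scores, subnets):
    """Return (subnet_id, score) for the best-scoring AZ that has a subnet in the VPC, or None."""
    # Map AZ ID -> first subnet seen in that AZ.
    subnet_by_az_id = {}
    for subnet in subnets:
        subnet_by_az_id.setdefault(subnet["AvailabilityZoneId"], subnet["SubnetId"])

    best = None
    for entry in scores:
        subnet_id = subnet_by_az_id.get(entry.get("AvailabilityZoneId"))
        if subnet_id is None:
            continue  # no subnet in this VPC for that zone
        if best is None or entry["Score"] > best[1]:
            best = (subnet_id, entry["Score"])
    return best


# Hypothetical responses from get_spot_placement_scores and describe_subnets.
scores = [
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az1", "Score": 3},
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az4", "Score": 9},
    {"Region": "us-east-1", "AvailabilityZoneId": "use1-az6", "Score": 7},
]
subnets = [
    {"SubnetId": "subnet-aaa", "AvailabilityZoneId": "use1-az1"},
    {"SubnetId": "subnet-fff", "AvailabilityZoneId": "use1-az6"},
]
print(pick_subnet(scores, subnets))  # ('subnet-fff', 7)
```

Here the highest-scoring zone (use1-az4) has no subnet in the VPC, so the sketch falls back to the best zone that does, which is exactly the bookkeeping that otherwise has to be done by hand.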