OpenClaw + Ollama Local Offline Deployment Complete Guide (2026)
Want a fully offline AI assistant that doesn't depend on any external APIs? The OpenClaw + Ollama combo gives you:
- ✅ Zero monthly costs – no Anthropic/OpenAI API key needed
- ✅ Complete data privacy – all conversations are processed locally, never on third-party servers
- ✅ Offline capable – works without internet, perfect for air-gapped environments
- ✅ Model flexibility – switch between LLaMA, Qwen, DeepSeek, and Gemma at will
This guide walks you through building this system from scratch, suitable for users with basic Linux knowledge.
I. Environment Requirements
Hardware Requirements
| Model Size | Minimum VRAM | Recommended VRAM | Recommended CPU | RAM |
|---|---|---|---|---|
| 7B models (e.g. Qwen2.5:7b) | 8GB | 12GB+ | 8 cores | 16GB |
| 14B-34B models | 16GB | 24GB+ | 16 cores | 32GB |
| 70B+ models | 48GB | 80GB+ | 32 cores | 64GB+ |
💡 No GPU? Small models (7B) can run on CPU, but expect inference to be 5-10x slower.
System Requirements
- Ubuntu 22.04+ / Debian 12+ / macOS 12+
- Docker (optional, recommended for isolation)
- At least 50GB available disk space (model files are large)
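Before installing, it's worth a quick sanity check that the machine meets these requirements. The commands below are standard Linux tools; the nvidia-smi line only applies if you have an NVIDIA GPU:
# Check free disk space (need ~50GB for models)
df -h ~
# Check total RAM
free -h
# Check GPU model and VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv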
II. Installing Ollama
Method 1: One-line Install Script (Recommended)
# Linux/macOS universal
curl -fsSL https://ollama.com/install.sh | sh
# Verify installation
ollama --version
Method 2: Docker Deployment
# Create data directory
mkdir -p ~/.ollama
# Start Ollama service (GPU version)
docker run -d \
--name ollama \
--gpus all \
-v ~/.ollama:/root/.ollama \
-p 11434:11434 \
ollama/ollama
# CPU version: remove --gpus all
docker run -d \
--name ollama \
-v ~/.ollama:/root/.ollama \
-p 11434:11434 \
ollama/ollama
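Note that --gpus all only works if the NVIDIA Container Toolkit is installed on the host. A minimal setup sketch for Ubuntu/Debian, assuming the NVIDIA driver is already installed and NVIDIA's apt repository is configured:
# Install the toolkit that lets Docker containers access the GPU
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker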
Verify Service
curl http://localhost:11434/api/tags
A JSON response means the installation succeeded and the service is listening.
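For reference, the response is an empty model list on a fresh install and fills in as you pull models (field values below are illustrative):
# Fresh install:
{"models":[]}
# After pulling a model:
{"models":[{"name":"qwen2.5:7b","size":...,"modified_at":"..."}]}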
III. Download and Configure Models
Recommended Model Catalog
| Model | Parameters | Strengths | Download Command |
|---|---|---|---|
| qwen2.5:7b | 7B | Chinese dialogue, code | ollama pull qwen2.5:7b |
| llama3.3:70b | 70B | Reasoning, long context | ollama pull llama3.3:70b |
| deepseek-r1:7b | 7B | Reasoning chains | ollama pull deepseek-r1:7b |
| codellama:13b | 13B | Code generation | ollama pull codellama:13b |
| gemma2:9b | 9B | General dialogue | ollama pull gemma2:9b |
Download Models
# Example: Download Qwen2.5 7B (great for Chinese)
ollama pull qwen2.5:7b
# Multiple models (choose as needed)
ollama pull deepseek-r1:7b
ollama pull codellama:13b
The first download takes a while (roughly 4-5GB for 7B models), so be patient.
Test Model
# Interactive test
ollama run qwen2.5:7b
# When prompt appears, type:
Hello, introduce yourself
# Exit: Ctrl+D or type /bye
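For scripting or quick smoke tests, you can also pass the prompt as an argument instead of entering interactive mode:
# One-shot, non-interactive: prints the reply and exits
ollama run qwen2.5:7b "Introduce yourself in one sentence"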
IV. OpenClaw Integration
1. Edit OpenClaw Config
vim ~/.openclaw/openclaw.json
2. Add Ollama as Provider
Add to providers section:
providers:
  # Keep existing providers (e.g. anthropic)...
  # Add Ollama
  - id: ollama
    kind: openai-compatible  # Ollama is OpenAI API compatible
    baseUrl: http://localhost:11434/v1
    apiKey: ollama  # Any string works; Ollama doesn't validate it
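You can verify this baseUrl directly before touching OpenClaw; Ollama serves an OpenAI-compatible chat endpoint under /v1:
# Hit Ollama's OpenAI-compatible endpoint directly
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'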
3. Configure Default Model
models:
  default: ollama/qwen2.5:7b  # Use local model as default
  fallbacks:
    - ollama/deepseek-r1:7b  # Fallback option
    - anthropic/claude-sonnet-4-5  # Cloud fallback if Anthropic is configured
4. Complete Config Example
gateway:
  mode: local
  port: 18789
providers:
  - id: ollama
    kind: openai-compatible
    baseUrl: http://localhost:11434/v1
    apiKey: ollama
models:
  default: ollama/qwen2.5:7b
  fallbacks:
    - ollama/deepseek-r1:7b
plugins:
  allow:
    - web_search
    - web_fetch
    - exec
channels: []
5. Restart OpenClaw
openclaw gateway restart
# Check logs to confirm connection
openclaw gateway logs | grep ollama
V. Verification & Testing
CLI Test
# OpenClaw CLI interactive mode
openclaw chat
# Send test message
Generate a Python script to read CSV files
API Test
curl -X POST http://localhost:18789/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/qwen2.5:7b",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 100
}'
VI. Troubleshooting
1. connect ECONNREFUSED 127.0.0.1:11434
Cause: Ollama service not running.
Fix:
# Check if Ollama is running
ps aux | grep ollama
# Start manually (non-Docker)
ollama serve &
# Docker method
docker start ollama
2. Slow Inference Speed
Cause: CPU mode or insufficient VRAM.
Optimization:
# Check GPU usage
nvidia-smi
# Switch to smaller model
ollama pull qwen2.5:1.5b # Only 1.5B params, CPU-friendly
# Update OpenClaw config
models:
  default: ollama/qwen2.5:1.5b
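To confirm whether a loaded model is actually on the GPU, recent Ollama versions include an ollama ps command that shows the CPU/GPU split:
# Shows loaded models and whether they run on GPU, CPU, or a mix
ollama ps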
3. Poor Response Quality
Cause: Sub-7B models have limited capabilities.
Solutions:
- Use 14B+ models (e.g. qwen2.5:14b)
- Configure a cloud fallback for complex tasks:
models:
  default: ollama/qwen2.5:7b
  fallbacks:
    - anthropic/claude-sonnet-4-5  # Cloud fallback for complex tasks
4. model not found
Cause: Model not downloaded or name mismatch.
Fix:
# List downloaded models
ollama list
# Verify exact name (case-sensitive)
# Config model name must match exactly what ollama list shows
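For example, ollama list output looks roughly like this (IDs and dates are illustrative); the full NAME value, including the tag after the colon, is what belongs in the config:
NAME              ID            SIZE      MODIFIED
qwen2.5:7b        <model-id>    4.7 GB    2 days ago
deepseek-r1:7b    <model-id>    4.7 GB    5 hours ago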
VII. Advanced Configuration
Multi-Model Load Balancing
If you have multiple servers:
providers:
  - id: ollama-gpu1
    kind: openai-compatible
    baseUrl: http://192.168.1.100:11434/v1
    apiKey: ollama
  - id: ollama-gpu2
    kind: openai-compatible
    baseUrl: http://192.168.1.101:11434/v1
    apiKey: ollama
models:
  default: ollama-gpu1/qwen2.5:7b
  fallbacks:
    - ollama-gpu2/qwen2.5:7b
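Before relying on the fallback chain, it's worth checking that every backend actually answers; a small shell loop over the example IPs above:
# Verify each Ollama backend is reachable
for host in 192.168.1.100 192.168.1.101; do
  curl -sf "http://$host:11434/api/tags" > /dev/null \
    && echo "$host: OK" || echo "$host: UNREACHABLE"
done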
Custom Model Parameters
Create a Modelfile to adjust temperature, top_p, etc.:
cat > ~/qwen-creative.Modelfile << 'EOF'
FROM qwen2.5:7b
PARAMETER temperature 0.9
PARAMETER top_p 0.95
PARAMETER num_ctx 8192
EOF
ollama create qwen-creative -f ~/qwen-creative.Modelfile
Use in OpenClaw:
models:
  default: ollama/qwen-creative
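Before switching the default, confirm the custom model was created and behaves as intended:
# Confirm the model exists, then give it a one-shot prompt
ollama list | grep qwen-creative
ollama run qwen-creative "Write a two-line poem about the sea"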
VIII. Production Environment Tips
1. Systemd Service
# Create Ollama service
sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/ollama serve
Restart=always
User=ollama
Environment="OLLAMA_HOST=0.0.0.0:11434"
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
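After starting, confirm the unit is healthy and the API answers. Note that OLLAMA_HOST=0.0.0.0 exposes port 11434 to your whole network, so firewall it on untrusted networks. (If you installed Ollama via the official script, a similar systemd unit may already exist.)
# Confirm the service is running and the API responds
sudo systemctl status ollama --no-pager
curl http://localhost:11434/api/tags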
2. Performance Monitoring
# Real-time GPU monitoring
watch -n 1 nvidia-smi
# View Ollama logs
journalctl -u ollama -f
3. Backup & Restore
# Backup downloaded models
tar czf ollama-models-backup.tar.gz ~/.ollama/models
# Restore
tar xzf ollama-models-backup.tar.gz -C ~/
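You can spot-check the archive without extracting it:
# List the first few entries to confirm the backup is intact
tar tzf ollama-models-backup.tar.gz | head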
IX. Cloud Server Deployment
While this guide is about "local" deployment, cloud servers can offer better hardware and bandwidth:
- Tencent Cloud GPU Instances – T4/V100 GPUs, pay-as-you-go billing for testing
- Vultr High-Frequency CPU – can run small models without a GPU, hourly billing
- DigitalOcean – $200 free credit, good for long-term deployment
💡 Tip: GPU instances are expensive. For personal use, 7B-14B models plus a high-frequency CPU are often sufficient.
Summary
| Solution | Cost | Performance | Privacy | Use Case |
|---|---|---|---|---|
| OpenClaw + Ollama | One-time hardware | Medium-high (depends on hardware) | ★★★★★ | Internal networks, sensitive data |
| OpenClaw + Anthropic | Pay-per-token | ★★★★★ | ★★★ | High-quality dialogue, complex reasoning |
| Hybrid | Low + on-demand | ★★★★ | ★★★★ | Local first, cloud fallback |
Local deployment is ideal for long-term heavy use or for sensitive data. If you're just experimenting, start with cloud APIs before investing in hardware.
This guide is continuously updated. Questions? Join the OpenClaw community.