
Run MoltBot with Local Models via Ollama
Ollama lets you run AI models locally, giving you complete privacy and zero API costs. This guide shows how to set up MoltBot with Ollama.
Why Local Models?
- Complete Privacy: Data never leaves your machine
- No API Costs: Free after initial setup
- Offline Capable: Works without internet
- Full Control: Choose any open-source model
- Fast: No network latency
Prerequisites
- 8GB+ RAM (16GB+ recommended)
- macOS, Windows, or Linux
- GPU optional but recommended
Installing Ollama
macOS
```
brew install ollama
```
Or download from ollama.com.
Windows
Download the installer from ollama.com.
Linux
```
curl -fsSL https://ollama.com/install.sh | sh
```
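Whichever installer you use, a quick version check confirms the CLI is available on your PATH:
```
# Print the installed Ollama version
ollama --version
```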
Starting Ollama
Start the Ollama service:
```
ollama serve
```
Ollama runs on http://localhost:11434 by default.
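To confirm the server is responding, you can query its HTTP API; /api/tags returns the models installed locally:
```
# Returns JSON listing locally installed models (an empty list on a fresh install)
curl http://localhost:11434/api/tags
```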
Downloading Models
Recommended Models
```
# Best quality (needs ~40GB+ RAM or VRAM)
ollama pull llama3.1:70b
# Good balance (requires 8GB+ RAM)
ollama pull llama3.1
# Fast and light (requires 4GB+ RAM)
ollama pull llama3.2:3b
# Coding specialist
ollama pull codellama
# Uncensored
ollama pull dolphin-mistral
```
Check Downloaded Models
```
ollama list
```
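To make sure a model actually loads and responds, give it a one-off prompt; the command exits after the model answers:
```
# One-off prompt; Ollama pulls the model first if it isn't already downloaded
ollama run llama3.1 "Say hello in one sentence."
```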
Configuring MoltBot
Basic Setup
```
moltbot config set aiProvider ollama
moltbot config set ollamaModel llama3.1
```
Full Configuration
Edit ~/.moltbot/config.json:
```
{
  "aiProvider": "ollama",
  "ollama": {
    "baseUrl": "http://localhost:11434",
    "model": "llama3.1",
    "options": {
      "temperature": 0.7,
      "numCtx": 8192,
      "numGpu": 1
    }
  }
}
```
Configuration Options
| Option | Description | Default |
|---|---|---|
| baseUrl | Ollama server URL | http://localhost:11434 |
| model | Model to use | Required |
| temperature | Response creativity (0-1) | 0.7 |
| numCtx | Context window size | 4096 |
| numGpu | Number of GPUs to use | 0 (CPU) |
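These keys presumably map onto Ollama's request options, which use snake_case names (num_ctx, num_gpu, and so on); the camelCase-to-snake_case mapping is an assumption based on the names. You can try the same values directly against the Ollama API before committing them to MoltBot's config:
```
# Send the equivalent options straight to Ollama (note the snake_case names)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Summarize what Ollama does in one sentence.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_ctx": 8192,
    "num_gpu": 1
  }
}'
```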
Model Selection Guide
For General Use
```
{
  "ollama": {
    "model": "llama3.1"
  }
}
```
Llama 3.1 offers the best balance of quality and performance.
For Coding
```
{
  "ollama": {
    "model": "codellama:34b"
  }
}
```
For Fast Responses
```
{
  "ollama": {
    "model": "llama3.2:3b",
    "options": {
      "numCtx": 4096
    }
  }
}
```
For Complex Analysis
```
{
  "ollama": {
    "model": "llama3.1:70b",
    "options": {
      "numCtx": 16384,
      "numGpu": 1
    }
  }
}
```
GPU Acceleration
NVIDIA
Ollama automatically uses NVIDIA GPUs. Verify:
```
ollama run llama3.1 --verbose
# Look for "GPU" in output
```
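On recent Ollama releases, ollama ps also reports whether a loaded model is running on the CPU or GPU, and nvidia-smi shows how much GPU memory it occupies:
```
# Show loaded models and the processor they are using (CPU vs GPU)
ollama ps

# On NVIDIA systems, check GPU memory usage while a model is loaded
nvidia-smi
```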
Apple Silicon
Native Metal support on M-series chips (M1 and later):
```
{
  "ollama": {
    "options": {
      "numGpu": 1
    }
  }
}
```
AMD (ROCm)
Install ROCm first, then Ollama will detect it automatically.
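Before starting Ollama, you can check that ROCm actually sees your GPU; rocminfo ships with ROCm, and discrete GPUs show up with a gfx identifier (which varies by card):
```
# List ROCm agents; a supported GPU appears with a gfx target name
rocminfo | grep -i gfx
```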
Memory Optimization
Reducing Memory Usage
```
{
  "ollama": {
    "model": "llama3.1",
    "options": {
      "numCtx": 4096,
      "numBatch": 512,
      "numThread": 4
    }
  }
}
```
Loading Models
Pre-load models on startup:
```
# Keep model in memory
ollama run llama3.1 &
```
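If you would rather not keep a shell attached, the Ollama API's keep_alive parameter does the same job: a request with no prompt loads the model and controls how long it stays resident (a duration string, or -1 for indefinitely):
```
# Load the model and keep it in memory for 30 minutes
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "keep_alive": "30m"
}'
```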
Unloading Models
Free memory when not in use:
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "keep_alive": 0
}'
```
Running Ollama Remotely
Server Setup
On your server:
```
# Allow external connections
OLLAMA_HOST=0.0.0.0 ollama serve
```
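If Ollama runs as a systemd service on Linux (the unit is typically named ollama), setting the variable in your shell is not enough; here is a sketch of persisting it on the service itself:
```
# Add an environment override to the Ollama systemd service
sudo systemctl edit ollama
# In the override file, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
```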
Client Configuration
```
{
  "ollama": {
    "baseUrl": "http://192.168.1.100:11434",
    "model": "llama3.1"
  }
}
```
With Authentication
Use a reverse proxy like nginx with basic auth:
```
location /api/ {
    auth_basic "Ollama";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://localhost:11434;
}
```
Hybrid Mode
Use Ollama for simple tasks and a cloud API for complex ones:
```
{
  "aiProvider": "hybrid",
  "hybrid": {
    "default": "ollama",
    "fallback": "anthropic"
  },
  "ollama": {
    "model": "llama3.1"
  },
  "anthropic": {
    "model": "claude-3-sonnet"
  },
  "skills": {
    "quickChat": {
      "aiProvider": "ollama"
    },
    "complexAnalysis": {
      "aiProvider": "anthropic"
    }
  }
}
```
Performance Tuning
For Speed
```
{
  "ollama": {
    "model": "llama3.2:3b",
    "options": {
      "numCtx": 2048,
      "numBatch": 1024,
      "numGpu": 1
    }
  }
}
```
For Quality
```
{
  "ollama": {
    "model": "llama3.1:70b",
    "options": {
      "numCtx": 8192,
      "temperature": 0.5,
      "repeatPenalty": 1.1
    }
  }
}
```
Troubleshooting
Model Not Found
```
# Pull the model first
ollama pull llama3.1
```
Connection Refused
```
# Start Ollama service
ollama serve
```
Out of Memory
- Use a smaller model
- Reduce numCtx
- Close other applications
Slow Performance
- Enable GPU: numGpu: 1
- Use a smaller model
- Reduce context size
Custom Models
Using Custom Modelfiles
Create Modelfile:
```
FROM llama3.1
SYSTEM "You are MoltBot, a helpful AI assistant."
PARAMETER temperature 0.7
```
Build and use:
```
ollama create moltbot-custom -f Modelfile
```
Configure MoltBot:
```
{
  "ollama": {
    "model": "moltbot-custom"
  }
}
```
Enjoy private, local AI with MoltBot and Ollama! Need help? Join our Discord community.