Complete Guide: How to Set Up Ollama with UI and Run DeepSeek (or Any Model) Offline on Windows & Linux

Introduction

Welcome to this comprehensive guide on setting up Ollama with a user interface on your Windows or Linux machine! In today’s AI-driven world, having access to powerful language models locally on your computer offers unprecedented privacy, control, and cost savings. Whether you’re a developer, researcher, or AI enthusiast, this guide will walk you through everything from basic installation to advanced configurations.

Why Run LLMs Locally?

  • Complete Privacy: Your data never leaves your computer
  • Zero API Costs: No subscription fees or usage charges
  • Full Control: Customize models, parameters, and behavior
  • Offline Access: Work without internet connectivity
  • Educational Value: Learn how AI models work under the hood

What You’ll Need

System Requirements

Minimum (for smaller models like Mistral 7B, DeepSeek Coder 1.3B):

  • CPU: 4+ cores (Intel i5/Ryzen 5 or better)
  • RAM: 16GB
  • Storage: 10GB free space
  • GPU (optional): 6GB+ VRAM for acceleration

Recommended (for larger models like Llama 3 70B, DeepSeek Coder 33B):

  • CPU: 8+ cores
  • RAM: 32GB+
  • Storage: 50GB+ free space
  • GPU: NVIDIA with 12GB+ VRAM (RTX 3060 or better)

Software Prerequisites

  • Windows 10/11 or Linux (Ubuntu 22.04+, Fedora 38+, or similar)
  • Git (for some installations)
  • Basic command-line knowledge
  • Administrator/sudo privileges

Part 1: Installing Ollama

Windows Installation

Method 1: Using the Official Installer (Recommended)

  1. Download the Installer:
  • Visit ollama.com
  • Click “Download” and select Windows
  • Save the .exe file to your computer
  2. Install Ollama:
  • Double-click the downloaded installer
  • Follow the setup wizard
  • The installer will:
    • Add Ollama to your PATH
    • Install the Ollama service
    • Create necessary directories
  3. Verify Installation:
    Open PowerShell or Command Prompt and run:
   ollama --version

You should see version information (e.g., ollama version 0.1.xx).

Method 2: Using Winget (Alternative)

winget install Ollama.Ollama

Method 3: Manual Installation (Advanced)

  1. Download the latest Windows release from GitHub:
   # Download using PowerShell (check the GitHub Releases page for the current asset name;
   # newer releases ship the Windows build as a .zip or setup executable)
   Invoke-WebRequest -Uri "https://github.com/ollama/ollama/releases/latest/download/ollama-windows-amd64.exe" -OutFile "ollama.exe"

   # Move to a directory in your PATH
   Move-Item .\ollama.exe "C:\Program Files\Ollama\"

   # Add to PATH (if not done automatically)
   [Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Program Files\Ollama\", "User")
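
Whichever method you use, you can confirm the background server is answering on its default port. A quick sanity check, assuming the default port 11434 (curl.exe ships with Windows 10/11):

# Ask the local Ollama server for its version over the HTTP API
curl.exe http://localhost:11434/api/version

# Expected output: a small JSON object such as {"version":"0.1.xx"}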

Linux Installation

Ubuntu/Debian-based Systems

# Method 1: Using the official installer (Recommended)
curl -fsSL https://ollama.com/install.sh | sh

# Method 2: Manual installation
# Download the latest Linux release (newer releases ship a .tgz; check ollama.com/download if this URL changes)
curl -L https://ollama.com/download/ollama-linux-amd64 -o ollama

# Make it executable
chmod +x ollama

# Move to system directory
sudo mv ollama /usr/local/bin/

# Create a systemd service (for auto-start)
sudo tee /etc/systemd/system/ollama.service <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=$USER
Group=$USER
Restart=always
RestartSec=3
Environment="HOME=$HOME"

[Install]
WantedBy=default.target
EOF

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama

Fedora/RHEL-based Systems

# Install using the script
curl -fsSL https://ollama.com/install.sh | sh

# Or use the manual method above

Arch Linux/Manjaro

# Using AUR helper (yay)
yay -S ollama-bin

# Or from AUR directly
git clone https://aur.archlinux.org/ollama-bin.git
cd ollama-bin
makepkg -si

# Start the service
sudo systemctl enable --now ollama

Verify Linux Installation

# Check if Ollama is running
systemctl status ollama

# Or run directly
ollama --version
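
You can also confirm the HTTP API itself is reachable (a quick check, assuming the default port 11434):

# The API listens on localhost:11434 by default
curl http://localhost:11434/api/version

# A JSON reply such as {"version":"0.1.xx"} means the server is up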

Part 2: Installing Ollama Web UI

Option 1: Open WebUI (formerly Ollama WebUI) – Recommended

Windows Installation:

  1. Install Node.js and npm:
  • Download from nodejs.org
  • Install the LTS version
  • Verify installation:
   node --version
   npm --version
  2. Clone and Install Open WebUI:
   # Clone the repository
   git clone https://github.com/open-webui/open-webui.git
   cd open-webui

   # Install dependencies
   npm install

   # Build the application
   npm run build

   # Start the server (development mode)
   npm run dev

   # Or for production
   npm start
  3. Access the UI:
  • Open your browser
  • Navigate to http://localhost:3000
  • First-time setup will ask for the Ollama API URL (default: http://localhost:11434)

Linux Installation:

# 1. Install Node.js and npm
# Ubuntu/Debian
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# Fedora
sudo dnf install nodejs npm

# Arch
sudo pacman -S nodejs npm

# 2. Clone and install Open WebUI
git clone https://github.com/open-webui/open-webui.git
cd open-webui

# 3. Install dependencies
npm install

# 4. Build (production)
npm run build

# 5. Create a systemd service for auto-start
sudo tee /etc/systemd/system/open-webui.service <<EOF
[Unit]
Description=Open WebUI Service
After=network.target ollama.service

[Service]
Type=simple
User=$USER
WorkingDirectory=$(pwd)
ExecStart=/usr/bin/npm start
Restart=on-failure
Environment="NODE_ENV=production"

[Install]
WantedBy=multi-user.target
EOF

# 6. Enable and start
sudo systemctl daemon-reload
sudo systemctl enable open-webui
sudo systemctl start open-webui
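
If the source build gives you trouble on either platform (recent Open WebUI versions also bundle a Python backend, so the npm-only route can be incomplete), the project's README documents a one-line Docker deployment. A sketch of that route, assuming Docker (or Docker Desktop on Windows) is installed and Ollama is running on the host:

# Run Open WebUI in Docker: the UI is served on host port 3000 and reaches Ollama on the host machine
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main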

Option 2: Continue (IDE Integration)

Continue is not a standalone web UI: it is an open-source AI coding assistant that runs as a VS Code or JetBrains extension and can use your local Ollama server as its model backend.

Editor Installation (Recommended):

# Install the "Continue" extension from the VS Code Marketplace or JetBrains plugin repository,
# then point its model configuration at your local Ollama server (http://localhost:11434)

Building from Source:

# Clone the repository
git clone https://github.com/continuedev/continue.git
cd continue

# Follow platform-specific build instructions
# See their README for detailed setup

Option 3: Open WebUI via pip (Simple Version)

Open WebUI is also published on PyPI; the package requires Python 3.11.

Windows:

# Install Python 3.11 if not present
winget install Python.Python.3.11

# Install the web UI
pip install open-webui

# Run it (serves on http://localhost:8080 by default)
open-webui serve

Linux:

# Install Python and pip (the open-webui package requires Python 3.11)
sudo apt install python3 python3-pip  # Ubuntu/Debian
sudo dnf install python3 python3-pip  # Fedora

# Install the web UI
pip3 install open-webui

# Run it (serves on http://localhost:8080 by default)
open-webui serve

Part 3: Installing and Running Models

Understanding Model Formats

Ollama uses models in the GGUF format, which offers:

  • Efficient CPU inference
  • GPU acceleration support
  • Quantization options (Q4_0, Q8_0, etc.)
  • Smaller file sizes with minimal accuracy loss

Basic Ollama Commands

# List available models
ollama list

# Pull a model (download)
ollama pull <model-name>

# Run a model
ollama run <model-name>

# Remove a model
ollama rm <model-name>

# Copy/duplicate a model
ollama cp <source> <destination>
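
A few more built-in commands that come in handy day to day:

# Show a model's details (parameters, template, license)
ollama show <model-name>

# List models currently loaded in memory and whether they run on CPU or GPU
ollama ps

# Run the server in the foreground (useful for debugging when it is not installed as a service)
ollama serve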

Installing DeepSeek Models

DeepSeek Coder (Programming Focused)

# DeepSeek Coder 6.7B (Good balance of performance/size)
ollama pull deepseek-coder:6.7b

# DeepSeek Coder 33B (More capable, requires more RAM)
ollama pull deepseek-coder:33b

# DeepSeek Coder 1.3B (Lightweight, fast)
ollama pull deepseek-coder:1.3b

# Quantized versions (smaller, faster); check the model's Tags page on ollama.com for exact tag names
ollama pull deepseek-coder:6.7b-q4_0

DeepSeek LLM (General Purpose)

# DeepSeek LLM 7B
ollama pull deepseek-llm:7b

# DeepSeek LLM 67B (Very capable, needs significant resources)
ollama pull deepseek-llm:67b

Installing Other Popular Models

Meta Models (Llama Series)

# Llama 3 8B
ollama pull llama3:8b

# Llama 3 70B
ollama pull llama3:70b

# Llama 2 7B
ollama pull llama2:7b

Mistral AI Models

# Mistral 7B
ollama pull mistral:7b

# Mixtral 8x7B (MoE model)
ollama pull mixtral:8x7b

# Codestral (coding specialist)
ollama pull codestral:latest

Code-Specific Models

# CodeLlama
ollama pull codellama:7b
ollama pull codellama:13b
ollama pull codellama:34b

# WizardCoder
ollama pull wizardcoder:latest

Small/Experimental Models

# Phi-2 (Microsoft's small model)
ollama pull phi:latest

# TinyLlama
ollama pull tinyllama:latest

# Neural Chat
ollama pull neural-chat:latest

Running Models via Command Line

Interactive Chat:

# Start an interactive session
ollama run deepseek-coder:6.7b

# Example conversation:
# >>> Write a Python function to calculate fibonacci numbers
# >>> Explain the time complexity
# >>> Now write it in Rust

Single Prompt:

# One-off prompt
ollama run llama3:8b "Explain quantum computing in simple terms"

# Generation parameters are not CLI flags; set them inside an interactive session
ollama run deepseek-coder:6.7b
# >>> /set parameter temperature 0.7
# >>> /set parameter num_predict 500
# >>> Write a REST API in Go
# (or pass them in the API "options" field, shown below)

Using the API:

# Generate text via API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-coder:6.7b",
  "prompt": "Write a binary search algorithm in Python",
  "stream": false
}'

# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help you?"},
    {"role": "user", "content": "Explain recursion"}
  ]
}'
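
Generation settings such as temperature and the prediction limit go in the request's options field (they are not flags on ollama run). A minimal example against the same endpoint:

# Pass sampling parameters via "options"
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-coder:6.7b",
  "prompt": "Write a REST API in Go",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 500
  }
}'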

Part 4: Advanced Configuration & Optimization

GPU Acceleration Setup

NVIDIA GPU (Windows & Linux):

  1. Install the NVIDIA Driver and CUDA Toolkit:
  • Windows: install the latest NVIDIA driver from nvidia.com
  • Linux: install the proprietary NVIDIA driver and CUDA packages from your distribution or NVIDIA's repository
  2. Configure Ollama for GPU:
   # Ollama detects and uses an NVIDIA GPU automatically; there is no OLLAMA_GPU variable.
   # To restrict which GPUs the server may use, set CUDA_VISIBLE_DEVICES before starting it.

   # Windows (PowerShell):
   $env:CUDA_VISIBLE_DEVICES="0"  # Use first GPU

   # Windows (CMD):
   set CUDA_VISIBLE_DEVICES=0

   # Linux:
   export CUDA_VISIBLE_DEVICES="0"

   # For multiple GPUs
   export CUDA_VISIBLE_DEVICES="0,1"  # Use GPU 0 and 1
  3. Verify GPU Usage:
   # Check whether loaded models are running on the GPU (see the PROCESSOR column)
   ollama ps

   # Monitor GPU usage
   # Windows: nvidia-smi (in Command Prompt)
   # Linux: nvidia-smi or nvtop

AMD GPU (ROCm – Linux Only):

# Install ROCm (Ubuntu); package names vary by release, see AMD's ROCm install guide
sudo apt update
sudo apt install rocm-dev

# Configure Ollama
# Ollama uses ROCm automatically once it is installed; to target specific GPUs set ROCR_VISIBLE_DEVICES
export ROCR_VISIBLE_DEVICES="0"
export HSA_OVERRIDE_GFX_VERSION=10.3.0  # Override for cards not officially supported; adjust for your GPU

Intel GPU (Windows & Linux):

# Mainline Ollama does not currently accelerate Intel GPUs, and there is no OLLAMA_GPU="intel" setting.
# Intel Arc acceleration is typically obtained through Intel's IPEX-LLM builds of Ollama;
# see Intel's ipex-llm documentation for setup. Without such a build, Ollama falls back to the CPU.

Memory and Performance Optimization

Model Quantization:

# Pull quantized versions (smaller, faster)
# Exact tag names vary per model; check the Tags page on ollama.com
ollama pull deepseek-coder:6.7b-q4_0  # 4-bit quantization
ollama pull deepseek-coder:6.7b-q8_0  # 8-bit quantization
ollama pull deepseek-coder:6.7b-q2_K  # 2-bit quantization (experimental)

# Convert existing models (advanced)
# Requires llama.cpp or similar tools

Windows-Specific Optimizations:

  1. Adjust Virtual Memory:
  • Control Panel → System → Advanced system settings
  • Performance Settings → Advanced → Virtual Memory
  • Set to at least 1.5x your RAM
  2. Power Settings:
  • Set to “High Performance”
  • Disable USB selective suspend
  • Set PCI Express to “Maximum Performance”
  3. Graphics Settings:
  • Add Ollama to the High-Performance GPU list
  • Settings → System → Display → Graphics settings

Linux-Specific Optimizations:

# 1. Increase swap space (if low RAM)
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Add to /etc/fstab for persistence:
# /swapfile none swap sw 0 0

# 2. Set CPU governor to performance
sudo apt install linux-tools-common linux-tools-$(uname -r)  # provides cpupower (Ubuntu/Debian)
sudo cpupower frequency-set -g performance

# 3. Optimize filesystem mount options
# Add to /etc/fstab for your data drive (noatime/nodiratime are safe;
# data=writeback trades crash consistency for a small speedup, use with care):
# noatime,nodiratime,data=writeback

# 4. Increase limits
sudo tee -a /etc/security/limits.conf <<EOF
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 65535
* hard nofile 65535
EOF

Creating Custom Model Modifications

Create a Modelfile:

# Modelfile for custom DeepSeek configuration
FROM deepseek-coder:6.7b

# System prompt
SYSTEM """You are DeepSeek Coder Pro, an expert programming assistant.
Always provide code with explanations.
Include time and space complexity analysis.
Format code with proper indentation and comments."""

# Parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_predict 2048

# Template
TEMPLATE """{{ .System }}
User: {{ .Prompt }}
Assistant: {{ .Response }}"""

Build and Use Custom Model:

# Create the model
ollama create deepseek-coder-pro -f ./Modelfile

# Run it
ollama run deepseek-coder-pro

# Push to Ollama library (optional)
ollama push username/deepseek-coder-pro
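
To confirm the customization took effect, ask Ollama to print the stored Modelfile back:

# Inspect the Modelfile and parameters of the new model
ollama show --modelfile deepseek-coder-pro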

Part 5: Using the Web UI

Open WebUI Features

Initial Setup:

  1. Access the Interface:
  • Open browser to http://localhost:3000 (or your configured port)
  • Create an admin account
  2. Connect to Ollama:
  • Go to Settings → Ollama API
  • Enter: http://localhost:11434
  • Test connection
  3. Configure Models:
  • Go to Models section
  • Pull new models directly from UI
  • Set default models for chat

Using the Chat Interface:

  1. Start a New Chat:
  • Click “New Chat”
  • Select model (e.g., deepseek-coder:6.7b)
  • Choose parameters (temperature, max tokens)
  2. Advanced Features:
  • Code Execution: Some UIs can run code in a sandbox
  • File Upload: Upload documents for analysis
  • Web Search: Enable internet search (requires configuration)
  • Plugins: Add functionality via plugins
  3. Chat Management:
  • Save conversations
  • Export chats (JSON, Markdown, PDF)
  • Search through conversation history

Model Management in UI:

  • View downloaded models
  • Delete unused models
  • Monitor resource usage
  • Set model priorities

API Integration Examples

Python Client:

import requests
import json

class OllamaClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url

    def generate(self, model, prompt, **kwargs):
        payload = {
            "model": model,
            "prompt": prompt,
            "stream": False,
            **kwargs
        }
        response = requests.post(f"{self.base_url}/api/generate", 
                               json=payload)
        return response.json()

    def chat(self, model, messages, **kwargs):
        payload = {
            "model": model,
            "messages": messages,
            "stream": False,
            **kwargs
        }
        response = requests.post(f"{self.base_url}/api/chat",
                               json=payload)
        return response.json()

# Usage
client = OllamaClient()
response = client.generate("deepseek-coder:6.7b", 
                         "Write a quicksort implementation in Python")
print(response["response"])
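
If you prefer not to hand-roll the HTTP calls, Ollama also publishes an official Python client (pip install ollama) that wraps these same /api/generate and /api/chat endpoints.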

JavaScript/Node.js Client:

const axios = require('axios');

class OllamaClient {
    constructor(baseURL = 'http://localhost:11434') {
        this.client = axios.create({ baseURL });
    }

    async generate(model, prompt, options = {}) {
        const response = await this.client.post('/api/generate', {
            model,
            prompt,
            stream: false,
            ...options
        });
        return response.data;
    }

    async chat(model, messages, options = {}) {
        const response = await this.client.post('/api/chat', {
            model,
            messages,
            stream: false,
            ...options
        });
        return response.data;
    }
}

// Usage
const ollama = new OllamaClient();
ollama.generate('llama3:8b', 'Explain blockchain')
    .then(data => console.log(data.response));

Part 6: Troubleshooting & Common Issues

Installation Problems

Windows:

Problem: "ollama is not recognized as an internal or external command"
Solution:
1. Check if Ollama is installed: Look for "Ollama" in Start Menu
2. Add to PATH manually:
   - System Properties → Advanced → Environment Variables
   - Add "C:\Program Files\Ollama" to Path
3. Restart terminal or computer

Problem: "Access denied" errors
Solution:
1. Run PowerShell/CMD as Administrator
2. Check antivirus/firewall settings
3. Add an exclusion for the Ollama install folder in Windows Security instead of disabling protection

Linux:

Problem: "Permission denied" when running ollama
Solution:
sudo chmod +x /usr/local/bin/ollama
sudo chown $USER:$USER ~/.ollama

Problem: "Could not connect to Ollama"
Solution:
# Check if service is running
systemctl status ollama

# Start if stopped
sudo systemctl start ollama

# Check logs
journalctl -u ollama -f

Model Issues

Out of Memory Errors:

# Reduce model size
ollama pull deepseek-coder:1.3b  # Smaller model

# Use quantized version
ollama pull llama3:8b-q4_0

# Increase swap space (Linux)
sudo fallocate -l 16G /swapfile

# Adjust context window
PARAMETER num_ctx 4096  # In Modelfile

Slow Performance:

# 1. Enable GPU acceleration (install the GPU driver/CUDA; Ollama uses the GPU automatically)
#    To pin a specific NVIDIA GPU: export CUDA_VISIBLE_DEVICES="0"

# 2. Use smaller models
ollama pull phi:latest  # Very small, fast

# 3. Reduce context size
PARAMETER num_ctx 2048

# 4. Close other applications
# 5. Check CPU/GPU temperatures

Model Not Found:

# Update Ollama
ollama --version
# Download latest from website if outdated

# Pull model with full name
ollama pull deepseek-coder:6.7b

# List available models
ollama list

# Check model registry
curl https://registry.ollama.ai/v2/library/deepseek-coder/tags/list

Network & Connection Issues

Cannot Pull Models:

# 1. Check internet connection
# 2. Use proxy if behind firewall
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="http://proxy.example.com:8080"

# 3. Manual download (advanced)
# Download GGUF file from HuggingFace
# Create custom Modelfile
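
For the manual route, Ollama can import a locally downloaded GGUF through a Modelfile whose FROM line points at the file. A minimal sketch, with a hypothetical filename:

# Write a one-line Modelfile pointing at the downloaded GGUF (filename is illustrative)
echo "FROM ./my-model.Q4_K_M.gguf" > Modelfile

# Register it with Ollama and run it
ollama create my-local-model -f ./Modelfile
ollama run my-local-model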

Web UI Cannot Connect to Ollama:

# 1. Check if Ollama is running (this starts the server in the foreground;
#    an "address already in use" error means it is already running)
ollama serve

# 2. Verify port
netstat -an | grep 11434  # Linux
Get-NetTCPConnection -LocalPort 11434  # Windows PowerShell

# 3. Check CORS settings
# In Web UI config, ensure Ollama URL is correct

Performance Monitoring

Windows:

# Monitor resources
Get-Process ollama*  # Check Ollama processes
Get-Counter "\Processor(_Total)\% Processor Time"
Get-Counter "\Memory\Available MBytes"

# GPU monitoring (NVIDIA)
nvidia-smi -l 1  # Update every second

# Disk activity
Get-Counter "\LogicalDisk(*)\% Disk Time"

Linux:

# Monitor CPU/RAM
htop  # Interactive process viewer
watch -n 1 "free -h"  # Memory usage every second

# GPU monitoring
nvidia-smi  # NVIDIA
rocm-smi   # AMD
intel_gpu_top  # Intel

# Disk I/O
iostat -x 1

# Network
iftop  # Bandwidth usage

Part 7: Educational Use Cases & Projects

Learning Programming

# Example: Use DeepSeek Coder for learning
"""
Project: Learn Python with Local AI
1. Ask DeepSeek to explain concepts
2. Request code examples
3. Get debugging help
4. Practice algorithms
"""

# Sample learning session prompts:
prompts = [
    "Explain object-oriented programming with Python examples",
    "Write a decorator that measures function execution time",
    "Debug this code: [insert buggy code]",
    "Compare lists vs tuples vs sets in Python",
    "Show me how to use async/await for concurrent tasks"
]
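
A small shell sketch that feeds practice prompts like these to the local API and prints the answers (assumes jq is installed; the model name is only an example):

# Loop over study prompts and print each model answer
for prompt in \
  "Explain object-oriented programming with Python examples" \
  "Compare lists vs tuples vs sets in Python"
do
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"deepseek-coder:6.7b\", \"prompt\": \"$prompt\", \"stream\": false}" \
    | jq -r '.response'
done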

Research & Analysis

# Local document analysis
# Upload research papers to Web UI
# Ask questions about content
# Generate summaries
# Extract key insights

# Example workflow:
1. Upload PDF to Open WebUI
2. Ask: "What are the main findings of this paper?"
3. Request: "Summarize the methodology section"
4. Generate: "Create bullet points of key contributions"

Creative Writing Assistant

# Using Llama 3 for creative writing
- Brainstorm story ideas
- Develop characters
- Write dialogue
- Edit and refine prose
- Generate poetry in different styles

Coding Projects

// Project: Build an AI-powered code reviewer
const reviewerPrompts = [
    "Review this code for security vulnerabilities:",
    "Suggest optimizations for better performance:",
    "Check for PEP 8 compliance (Python):",
    "Identify potential bugs in this implementation:",
    "Suggest better variable names and documentation:"
];

// Integration example (uses the OllamaClient class from the API section above):
async function codeReview(code, language) {
    const prompt = `Review this ${language} code for best practices:\n${code}`;
    const data = await ollama.generate('deepseek-coder:6.7b', prompt);
    // parseReview is a placeholder for your own post-processing of data.response
    return parseReview(data.response);
}

Part 8: Security & Best Practices

Security Considerations

Data Privacy:

# Run entirely offline
# Disable automatic updates if needed
# Store sensitive data in encrypted volumes

# Linux: Use an encrypted home directory
sudo apt install ecryptfs-utils
sudo ecryptfs-migrate-home -u $USER  # run as root from another account while the target user is logged out

# Windows: Use BitLocker
Manage-bde -on C:  # Encrypt drive

Network Security:

# Run Ollama on localhost only (default)
# Change default port if needed
OLLAMA_HOST="127.0.0.1:11435" ollama serve

# Use firewall rules
# Windows:
New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Block

# Linux:
sudo ufw deny 11434/tcp  # If exposing is not needed

Model Safety:

# 1. Download models from trusted sources only
# 2. Verify checksums when available
# 3. Use system prompts to set boundaries
SYSTEM """You are a helpful, harmless, and honest assistant.
Never provide instructions for illegal or harmful activities."""

# 4. Regular updates
ollama --version
# Check for security updates regularly

Maintenance & Updates

Regular Maintenance:

# Update Ollama
# Windows: Download latest installer and reinstall
# Linux: 
curl -fsSL https://ollama.com/install.sh | sh

# Update models
ollama pull deepseek-coder:latest  # Gets latest version

# Clean up old models
ollama list
ollama rm old-model-name

# Backup custom models
ollama show --modelfile my-model > my-model.Modelfile
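
Restoring that backup later is just a create from the saved file (the model name here is illustrative):

# Rebuild the custom model from the exported Modelfile
ollama create my-model -f ./my-model.Modelfile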

Performance Maintenance:

# Reclaim disk space
# Windows: logs live in %LOCALAPPDATA%\Ollama, models in %USERPROFILE%\.ollama\models
# Linux: models live in ~/.ollama/models
# Prefer "ollama rm <model>" over deleting blob files by hand

# Monitor disk space
# Windows: Cleanmgr
# Linux: ncdu ~/.ollama

# Regular system maintenance
# Defragment disk (Windows)
# Trim SSD (Linux: sudo fstrim -av)

Conclusion

Setting up Ollama with a Web UI on Windows or Linux opens up a world of possibilities for local AI experimentation and development. Whether you’re running DeepSeek models for coding assistance, Llama for general conversation, or specialized models for specific tasks, you now have a powerful, private, and cost-effective AI platform on your own computer.

Key Takeaways:

  1. Ollama is cross-platform and works well on both Windows and Linux
  2. Multiple Web UI options cater to different needs and preferences
  3. DeepSeek models excel at coding tasks and are freely available
  4. GPU acceleration significantly improves performance when available
  5. Customization options allow tailoring models to specific needs
  6. Running locally ensures privacy and eliminates API costs

Next Steps:

  • Experiment with different models and find what works best for your use case
  • Create custom Modelfiles for specialized tasks
  • Integrate Ollama into your development workflow
  • Join the Ollama community for support and updates
  • Consider contributing to open-source Web UI projects

Remember that the field of local AI is rapidly evolving. Keep your software updated, experiment with new models as they’re released, and most importantly—have fun exploring the capabilities of AI on your own terms!


Disclaimer: This guide is for educational purposes. Always respect copyright laws and terms of service when using AI models. Be aware of the computational requirements and ensure your system can handle the load. Running large models may significantly impact system performance and electricity consumption.
