Install & Configure Ollama

Run powerful AI models locally on your broadcasting machine — no cloud subscriptions, no data leaving your network. Generate artist bios, smart playlists, and show content with a single REST API on localhost:11434.

What is Ollama?

A lightweight, open-source runtime that lets you download and run large language models entirely on your own hardware. No API keys, no monthly fees, no data sent to third parties.

100% Local

Every model runs on your CPU or GPU. Your prompts, your data, and your responses never leave the machine. Perfect for stations handling sensitive playlist data or unreleased content metadata.

REST API on :11434

Ollama exposes a clean JSON API at http://localhost:11434. Any application on your machine — including Mcaster1Studio and Mcaster1AMP — can send prompts and receive completions over plain HTTP.

Model Library

Pull from hundreds of pre-quantized models with a single command. Llama 3.1, Mistral, Gemma, Phi, CodeLlama, and more — each optimized for different hardware profiles and use cases.

Installation

Ollama supports macOS, Linux, and Windows. Installation takes under a minute on most systems.

macOS

# macOS: download the app from https://ollama.com/download
# and drag Ollama to your Applications folder, or install via Homebrew:
brew install ollama

Linux

# Install Ollama on Linux (Ubuntu, Debian, Fedora, Arch, etc.)
curl -fsSL https://ollama.com/install.sh | sh

Windows

# Download the Windows installer from:
#   https://ollama.com/download
# Run the .exe installer and follow the prompts.
# Ollama will install as a background service.

Verify Installation

# Confirm Ollama is installed
ollama --version
# Expected output: ollama version 0.x.x

Start the Server

# Start the Ollama API server (runs on port 11434)
ollama serve
# The server will listen at http://localhost:11434

# On macOS and Windows, Ollama starts automatically after install.
# On Linux, you may want to enable the systemd service:
sudo systemctl enable --now ollama

Hardware Tiers

Not every broadcast machine is a powerhouse. Pick the right model size for your available RAM, CPU cores, and storage to keep your station running smoothly.

Budget Tier

Entry-Level Broadcast PC

8 GB RAM • 4-core CPU • 20 GB storage
  • Phi-3 Mini (3.8B) — compact & fast
  • Gemma 2B — efficient text generation
  • TinyLlama 1.1B — minimal footprint
  • Best for: short bios, tag cleanup, basic prompts
Standard Tier

Mid-Range Workstation

16 GB RAM • 8-core CPU • 50 GB storage
  • Llama 3.1 8B — general purpose workhorse
  • Mistral 7B — fast inference, great quality
  • Gemma 7B — strong reasoning ability
  • Best for: artist bios, playlist logic, show notes
Pro Tier

Dedicated AI Server

32 GB+ RAM • 12+ cores • 100 GB+ storage
  • Llama 3.1 70B — near-cloud quality
  • Mixtral 8x7B — mixture-of-experts speed
  • CodeLlama 34B — code & scripting tasks
  • Best for: long-form content, multi-step workflows, analytics
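The tier guidance above can be encoded as a simple lookup. A minimal sketch in Python, keyed only off installed RAM; the thresholds mirror the tiers listed here and are illustrative, not official Ollama requirements, and the model tags are assumptions based on Ollama's library naming:

```python
def recommend_models(ram_gb: int) -> list[str]:
    """Map available RAM to the model tier suggested above (illustrative thresholds)."""
    if ram_gb >= 32:
        # Pro tier: dedicated AI server
        return ["llama3.1:70b", "mixtral:8x7b", "codellama:34b"]
    if ram_gb >= 16:
        # Standard tier: mid-range workstation
        return ["llama3.1", "mistral", "gemma2"]
    # Budget tier: entry-level broadcast PC
    return ["phi3", "gemma:2b", "tinyllama"]
```

A real picker would also weigh GPU VRAM and free disk space, but RAM is the usual bottleneck for CPU inference.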

Pull Your First Model

Each ollama pull command downloads a quantized model ready for local inference on CPU or GPU. Models are cached locally, so subsequent runs start instantly.

# General-purpose powerhouse — great for artist bios, show notes, and content drafts
ollama pull llama3.1    # Downloads the 8B parameter model (~4.7 GB)

# Efficient and accurate — great for playlist logic and metadata enrichment
ollama pull gemma2      # Google's Gemma 2 model, excellent quality-to-size ratio

# Fast inference — great for real-time content generation during live shows
ollama pull mistral     # Mistral 7B: low latency, strong instruction following

# Compact model — runs well on budget hardware with limited RAM
ollama pull phi3        # Microsoft Phi-3: surprisingly capable for its size

Test Interactively

# Start an interactive chat session to test your model
ollama run llama3.1
# Type a prompt and press Enter. Use /bye to exit.
# Example: "Write a 50-word artist bio for a jazz fusion band called Solar Wind."

Integration with Mcaster1

Once Ollama is running, Mcaster1 products detect it automatically and unlock AI-powered features — no configuration files to edit, no API keys to manage.

Mcaster1Studio

The broadcast automation suite connects to Ollama's REST API at localhost:11434 to generate artist bios on the fly, create show notes from your playlist history, and draft social media posts for upcoming segments.

Auto-detected

Mcaster1AMP

The intelligent media player uses local AI models to analyze your music library, suggest playlist transitions based on tempo and genre, and generate descriptive metadata for tracks missing artist information.

Auto-detected

How It Works

Both products send standard HTTP POST requests to Ollama's /api/generate and /api/chat endpoints. You choose which model to use in each product's AI settings panel. Responses stream back in real time.
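By default those endpoints stream their output as newline-delimited JSON: each chunk carries a partial "response" string, and the final chunk sets "done": true. A minimal sketch of how a client might reassemble a streamed /api/generate reply; the parsing logic is illustrative, with the field names taken from Ollama's streaming format:

```python
import json

def assemble_stream(ndjson_lines):
    """Reassemble a streamed /api/generate response.

    Each line is one JSON chunk with a partial "response" string;
    the last chunk has "done": true.
    """
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Example chunks as they might arrive over HTTP:
chunks = [
    '{"model":"llama3.1","response":"Solar ","done":false}',
    '{"model":"llama3.1","response":"Wind","done":false}',
    '{"model":"llama3.1","response":"","done":true}',
]
print(assemble_stream(chunks))  # prints "Solar Wind"
```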


Ollama API Quick Reference

These are the core endpoints you will use most often. All requests go to http://localhost:11434 and accept/return JSON.

Generate Text

# Generate a text completion from a single prompt
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a 50-word artist bio for a jazz fusion band called Solar Wind."
}'

# Add "stream": false to get the full response in one JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a 50-word artist bio for a jazz fusion band called Solar Wind.",
  "stream": false
}'

List Installed Models

# List all models you have pulled locally
curl http://localhost:11434/api/tags
# Returns JSON with model names, sizes, and modification dates

Chat Format (Multi-Turn)

# Multi-turn chat — maintains conversation context
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "Suggest 5 smooth jazz tracks for a late-night radio show." }
  ]
}'
# Add previous assistant/user messages for multi-turn conversations
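The server itself is stateless: to keep context across turns, the client resends the full message history, appending each assistant reply before the next user turn. A minimal sketch of that bookkeeping in Python (no network involved; the payload shape matches the /api/chat request shown above, and the function name is illustrative):

```python
import json

def build_chat_payload(model, history, user_message):
    """Append the new user turn to history and build the /api/chat request body."""
    history.append({"role": "user", "content": user_message})
    return json.dumps({"model": model, "messages": history})

history = []
payload = build_chat_payload(
    "llama3.1", history,
    "Suggest 5 smooth jazz tracks for a late-night radio show.")
# POST `payload` to /api/chat, then record the reply so the next turn has context:
history.append({"role": "assistant", "content": "1. ..."})
```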

Model Information

# Get detailed information about a specific model
curl http://localhost:11434/api/show -d '{ "name": "llama3.1" }'
# Returns: model parameters, template, license, quantization level

Ready to Broadcast Smarter?

With Ollama running locally, you have a private AI assistant purpose-built for your station. Explore our model recommendations to find the perfect fit for your hardware and workflow.