Getting Started with Ollama: A Practical Guide for Teams
Ollama changed how I think about running AI in small-team environments. No API keys. No per-token billing. No sending your data to someone else's servers. Just a binary you run and models you download. Here's how to go from zero to a working private AI setup in an afternoon.
What Is Ollama, Exactly?
Ollama is an open-source tool that runs large language models locally on your hardware. You download a model, run a command, and you've got a local API that behaves like OpenAI's — minus the cloud dependency. It works on macOS, Linux, and Windows, and it supports a wide range of models from Llama to Mistral to Phi.
The appeal for teams is straightforward: you control the infrastructure. Your data never leaves your network. You're not at the mercy of API pricing changes. And for internal workflows — automation scripts, document processing, research tooling — it's more than capable.
Step 1: Install It
The installation is a single command on macOS and Linux. On Windows, you can use the native installer from ollama.com, or run Ollama inside WSL2 (Windows Subsystem for Linux).
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Verify it's running
ollama --version
On first run, Ollama starts a local server on http://localhost:11434. You can talk to it via the CLI or hit the API from any language that speaks HTTP.
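To get a feel for the API's shape before going further, here is what a request to Ollama's native generate endpoint looks like. The model tag is just an example (it's the one pulled in the next step); the body is printed rather than sent so the sketch works without a running server:

```shell
# Shape of a request to Ollama's native /api/generate endpoint.
# The model tag is an example — use any model you've pulled.
payload='{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
echo "$payload"

# With the server running, send it like this:
# curl -s http://localhost:11434/api/generate -d "$payload"
```

Setting `"stream": false` returns one complete JSON response instead of a stream of partial tokens, which is usually what you want in scripts.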
Step 2: Pull Your First Model
Models are pulled from Ollama's library. For general-purpose tasks, Llama 3.2 3B is a good starting point — capable enough for most workflows, light enough to run on a decent laptop.
# Pull a model
ollama pull llama3.2:3b
# List what you've got
ollama list
# Try it out
ollama run llama3.2:3b "Explain quantum entanglement in simple terms."
Other good starting models:
- Mistral 7B (mistral:7b) — strong all-around performer, good for code and reasoning
- Phi-3 Mini (phi3:mini) — smaller footprint, good for lightweight tasks
- Code Llama 7B (codellama:7b) — specialized for code generation and explanation
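For rough capacity planning, the table below lists ballpark download sizes for the 4-bit quantized builds of these models. These are approximations from the Ollama library at the time of writing, not guarantees — always check `ollama list` for the exact size on disk:

```shell
# Ballpark download sizes (4-bit quantized builds) — approximate
# figures only; run `ollama list` for actual sizes on your disk.
table=$(
  printf '%-16s %s\n' "MODEL" "APPROX SIZE"
  printf '%-16s %s\n' "llama3.2:3b"  "~2.0 GB"
  printf '%-16s %s\n' "mistral:7b"   "~4.1 GB"
  printf '%-16s %s\n' "phi3:mini"    "~2.2 GB"
  printf '%-16s %s\n' "codellama:7b" "~3.8 GB"
)
echo "$table"
```

A useful rule of thumb: you want at least that much free RAM (or VRAM) plus headroom for the model to run comfortably.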
Step 3: Connect a UI
The CLI is great for testing, but your team will want a web interface. Open WebUI (formerly Ollama WebUI) is the best option — it's open-source, easy to self-host, and gives you a ChatGPT-like experience backed by your local models.
# With Docker installed:
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000, create an account, and you've got a full chat interface running entirely on your hardware.
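If you prefer Docker Compose, an equivalent setup looks like the sketch below. One detail worth noting: from inside a container, localhost refers to the container itself, so the container reaches Ollama on the host via the host.docker.internal name (the extra_hosts mapping provides that name on Linux):

```yaml
# Sketch of an equivalent Docker Compose setup for Open WebUI.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always
```

Run it with `docker compose up -d` from the directory containing the file.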
Step 4: Expose It to Your Network (Carefully)
By default, Ollama only listens on localhost. To use it from other machines on your network — or to connect it to automation tools — you need to configure it to listen on your LAN IP.
# Linux (systemd) — create a drop-in override:
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
# Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Important security note: If you expose Ollama to your LAN, do not expose it to the public internet without authentication in front of it. Ollama has no built-in auth. Consider a reverse proxy with a simple API key, or restricting access to your VPN/Tailscale network only.
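As a sketch of the reverse-proxy idea, here is a minimal nginx config that rejects any request missing a shared key before proxying to Ollama. The header name and key value are placeholders to replace with your own, and TLS setup is omitted for brevity:

```nginx
# Minimal sketch: require a shared key before proxying to Ollama.
# "X-Api-Key" and "change-me" are placeholders — choose your own,
# and terminate TLS here in any real deployment.
server {
    listen 8080;

    location / {
        # $http_x_api_key is nginx's variable for the X-Api-Key header.
        if ($http_x_api_key != "change-me") {
            return 401;
        }
        proxy_pass http://127.0.0.1:11434;
    }
}
```

Clients then send the key with every request, e.g. `-H "X-Api-Key: change-me"` in curl. A VPN or Tailscale network remains the simpler option if you don't need public access at all.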
Step 5: Connect It to Your Tools
Ollama's API is OpenAI-compatible, which means most tools that support OpenAI can be pointed at your local server instead. Often the only change needed is the base URL:
# Instead of:
# https://api.openai.com/v1/chat/completions
# You use:
http://YOUR_SERVER_IP:11434/v1/chat/completions
With that change, tools like n8n, Make.com, LangChain, or any custom script can talk to your private AI. This is where it becomes genuinely useful for business workflows — not just a demo you show once.
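For example, a chat request against the compatible endpoint has the familiar OpenAI shape. The body is printed here so the sketch runs without a live server; YOUR_SERVER_IP and the model tag are whatever applies to your setup:

```shell
# Request body for the OpenAI-compatible chat endpoint. Printed
# here so the sketch runs without a live server; the commented
# curl line shows how you'd actually send it.
body='{
  "model": "llama3.2:3b",
  "messages": [
    {"role": "user", "content": "Categorize this ticket: printer is offline again."}
  ]
}'
echo "$body"

# curl -s http://YOUR_SERVER_IP:11434/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$body"
```

Because the request and response formats match OpenAI's, swapping a tool between cloud and local backends is usually a one-line config change.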
What You Can Actually Build
With Ollama running locally, the types of things you can automate expand significantly:
- Research agents — Give a model access to your internal docs, have it answer questions across them
- Draft generation — Connect to your CRM, auto-generate first-draft outreach emails from templates
- Support triage — Route and categorize incoming requests before a human reviews them
- Code review — Run your PRs through a local model, with your coding standards and the relevant files supplied as context
- Meeting summaries — Pipe transcripts through a model, get action items extracted automatically
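As a concrete sketch of the meeting-summaries idea, here is a tiny helper that builds the extraction prompt. The function name and transcript.txt are made up for illustration; the commented line shows how you'd pipe the result into a model once Ollama is installed:

```shell
# Build an action-item extraction prompt from a transcript.
# build_prompt is a made-up helper name; transcript.txt below is a
# placeholder for your real input file.
build_prompt() {
  printf 'Extract the action items from this meeting transcript as a bullet list:\n\n%s\n' "$1"
}

build_prompt "Alice: demo ships Friday. Bob: I will draft the release notes."

# Once Ollama is running:
# build_prompt "$(cat transcript.txt)" | ollama run llama3.2:3b
```

Keeping the prompt construction in a function like this makes it easy to version, test, and reuse the same prompt across meetings.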
Where to Get Help
The Ollama community is active and helpful. The official documentation is solid, and the GitHub repo's issue tracker is usually responsive. For teams trying to build serious workflows on top of Ollama, that's where something like Synapse Systems comes in — getting from "it works on my machine" to "reliably running for my team" is where the real work is.
If you're running Ollama and want help getting it integrated into your operation, book a free consult. We've helped a dozen teams go from zero to production on private AI infrastructure.