How to set up an offline AI that works when the internet doesn't
Your AI stops working the second your internet does.
Every prompt, coding session, brainstorm, and draft. All of it requires a connection to someone else’s server. When that connection drops, you’re back to doing everything manually.
You should have an offline AI on your machine right now. The same way you’d keep a flashlight in the junk drawer or a spare tire in the trunk. Not because you’re expecting catastrophe. Because the one time you need it and don’t have it, you’re stuck staring at a loading spinner with no fallback.
The idea came from my own situation: I have an off-grid property where I spend a month or so every summer, and I deliberately keep it internet-free. It's my quiet corner of the world. But I still want to be able to troubleshoot a boat repair or work through other problems while I'm there, and that need for an off-grid LLM is what inspired this guide.
I’m going to walk you through how to set this up in about 15 minutes. By the end, you’ll have a working AI model running on your computer that needs zero internet, costs zero dollars, and answers to nobody but you.
The Setup
Everything in this guide uses the terminal (that black window with the blinking cursor). If that makes you want to close this tab, hold on.
There’s a desktop app called LM Studio that does the same thing with a normal visual interface. Download it from lmstudio.ai, search for Gemma 4, click install, and start chatting. No commands. No configuration. Just a regular app window.
LM Studio is a perfectly good option if all you want is a local AI to talk to. The rest of this guide focuses on the Ollama setup because it connects to Claude Code, which gives you a more powerful coding and work environment. But if the terminal isn’t your thing, LM Studio will get you to the same destination through a friendlier door.
Why Gemma 4
Google released Gemma 4 on April 2nd. It’s a free, open source AI model you can download and run on your own computer. No account needed. No subscription. No data leaving your machine. Once it’s downloaded, it’s yours.
It comes in four sizes. Think of these like engine sizes. The bigger the engine, the more powerful it is, but the more fuel (in this case, memory) your computer needs to run it.
Small (E2B): Runs on low-powered hardware. Good for quick questions and simple tasks.
Medium (E4B): The one I’d recommend for most people. Runs on any modern laptop or desktop. Handles email drafts, brainstorming, writing help, code generation.
Large (26B): Needs a computer with at least 18GB of memory. Significantly better at reasoning and complex tasks. If you’ve got a newer MacBook Pro or a decent desktop, this is the sweet spot.
Extra Large (31B): Needs at least 20GB of memory. Best quality available but you need a serious machine.
NOT SURE WHICH ONE YOUR COMPUTER CAN HANDLE? Start with the medium (E4B). You can always upgrade later.
To check your available memory: on Mac, open Activity Monitor. On Windows, open Task Manager and click the Performance tab. If you’ve got 8GB or more free, the medium model will run fine.
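Since you'll be in the terminal anyway, you can also check from there. A rough sketch for Mac and Linux (Windows users can stick with Task Manager):

```shell
# Rough memory check from the terminal (macOS and Linux).
# The 8GB guideline for the medium (E4B) model applies to these numbers.
case "$(uname -s)" in
  Darwin)
    # macOS: total physical memory in bytes, converted to GB
    echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB total" ;;
  Linux)
    # Linux: the "available" column from free, already in GB
    free -g | awk '/^Mem:/ {print $7 " GB available"}' ;;
esac
```

On a Mac this prints total memory rather than free memory, so treat it as an upper bound and glance at Activity Monitor if you're close to the line.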
Step 1: Install Ollama
Ollama is the app that runs AI models on your computer. Think of it as a player. Gemma 4 is the disc. You need the player first.
Go to ollama.com. Click download. Pick your operating system (Mac, Windows, or Linux).
On Mac: unzip the download, drag it to your Applications folder, open it. Done.
On Windows: run the installer and follow the prompts. Done.
Open your terminal (on Mac, search for “Terminal” in Spotlight. On Windows, search for “Command Prompt” or “PowerShell”) and type:
ollama --version
If you see a version number, Ollama is installed and working.
Step 2: Download your AI model
Now you’re going to download Gemma 4 onto your machine. This is a one-time download. Once it’s done, the model lives on your hard drive permanently.
In your terminal, type:
ollama pull gemma4
This downloads the medium (E4B) model, which is about 9.6GB. Give it a few minutes depending on your internet speed. This is the last time you’ll need an internet connection for this.
Once it finishes, test it immediately. Type:
ollama run gemma4
Ask it anything. “What’s the capital of France?” or “Help me write an email to my boss about taking Friday off.”
If you get a response, your offline AI is working. Turn off your wifi right now and try again. Still works.
Press Ctrl+D to exit when you’re done testing.
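One more thing worth knowing: the Ollama app also serves a local HTTP API on port 11434, so your own scripts can use the model offline too. Here's a minimal helper sketch. It assumes the Ollama app is running and you pulled gemma4 in Step 2; the function name is my own:

```shell
# Minimal helper for Ollama's local HTTP API (default port 11434).
# Assumes the Ollama app is running and gemma4 is already pulled.
# Keep prompts free of double quotes, or escape them, since the
# prompt is pasted straight into the JSON payload.
ask_local() {
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"gemma4\", \"prompt\": \"$1\", \"stream\": false}"
}
```

Call it like `ask_local "What's the capital of France?"`. The reply comes back as JSON, with the model's answer in the `response` field.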
Step 3: Connect it to Claude Code (optional)
This step is for people who use Claude Code (Anthropic’s coding tool that runs in the terminal). If you don’t use Claude Code, you can skip this entirely. The model you set up in Step 2 already works on its own.
If you DO use Claude Code, you can point it at your local model instead of Anthropic’s servers. Same tool, same workflow, but everything runs offline.
The simplest way:
ollama launch claude --model gemma4
That one command connects everything automatically. You’re now running Claude Code powered by your local Gemma 4 model. No API costs. No internet required.
If that command doesn’t work, you may need to update Ollama to the latest version from ollama.com. Alternatively, you can set it up manually by adding a few configuration lines to your terminal. Drop a comment below and I’ll walk you through the manual setup.
What it’s good at (and what it isn’t)
I want to be straight about this.
YOUR LOCAL MODEL HANDLES THESE WELL: writing emails, brainstorming ideas, answering general questions, explaining concepts, drafting documents, basic code generation, summarizing long text.
WHERE IT FALLS SHORT: complex multi-step reasoning, tasks that require holding a lot of context at once, and the kind of deep architectural thinking that a model like Claude Opus excels at.
I think of it like a backup generator. Nobody expects the generator to power the whole neighborhood. You need it to keep the lights on while you figure out your next move. That’s what your local model does.
The two-tier setup I actually use
Opus 4.6 on the cloud for the heavy thinking. Local Gemma for everything else.
When I’m online and working on something complex, I use Opus. When I’m on a plane, when Anthropic is having an outage, when I’m burning through rate limits, or when I just want to bang out some quick work without paying for tokens, I switch to local Gemma.
Both run through Claude Code. Same commands, same workflow. The only difference is where the thinking happens.
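If you flip between the two tiers a lot, a couple of shell functions save some typing. These names are my own invention; drop them in your `~/.zshrc` or `~/.bashrc` and rename as you like:

```shell
# Hypothetical shortcuts for the two-tier workflow; the names are mine.
# ai_cloud: Claude Code against Anthropic's servers (the heavy thinking)
# ai_local: the same tool backed by local Gemma (everything else)
ai_cloud() { claude "$@"; }
ai_local() { ollama launch claude --model gemma4; }
```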
What about mobile?
Running a local AI on your phone is a different setup entirely and I’ll cover it in a future article. This guide is focused on getting your laptop or desktop set up first.
Why This is Important
Every time you use a cloud AI, you’re making a bet. That their servers stay online. That their pricing stays reasonable. That their terms of service don’t change in ways that mess with your work. That their company doesn’t make decisions that cut off your access.
Those bets have been fine so far. Mostly. But “mostly” is doing a lot of heavy lifting in that sentence.
A local model is insurance. It costs you 15 minutes and some hard drive space. In exchange, you get a working AI that answers to nobody. No subscription. No outage page. No pricing changes. No terms of service.
You keep a flashlight in the drawer because you’ve lost power before.
Set up your offline AI for the same reason.
If you want more setups like this, breakdowns of what actually works, and the stuff I'm building behind the scenes, that's what the paid version of The AI Handbook is for. No fluff. No hype. Just the systems I actually use.
Ryan

