Let's talk about local AI and speed. There are a lot of factors that go into getting the most speed out of your AI, and one of the biggest is the model you choose.
I've tried quite a few local LLMs, using Ollama on both Linux and MacOS, and I recently ran into one that blew all the others away with regard to speed: gpt-oss:20b. On both platforms, it's lights-out faster than the other models I've used, generating 30 tokens per second.
Also: My go-to LLM tool just dropped a super simple Mac and PC app for local AI - why you should try it
What is a token? Think of tokens as pieces of words used in natural language processing. In English text, 1 token is approximately 4 characters or 0.75 words, which means gpt-oss:20b can generate roughly 120 characters (about 22 words) per second.
That's not bad.
For comparison, a locally run llama3.2 achieves around 14 tokens per second. See the difference?
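If you want to compare throughput on your own hardware, Ollama can report it for you. Running a model with the --verbose flag prints timing stats after each response, including an eval rate in tokens per second. For example, with a model you've already pulled:

ollama run llama3.2 --verbose

Ask it a question, and the stats appear once the answer finishes.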
OK, now that I've (hopefully) convinced you that gpt-oss:20b is the way to go, how do you use it as a local LLM?
What you'll need: To make this work, you'll need either a running version of Ollama (it doesn't matter what desktop OS you're using) or a fresh install of it.
If you're using Linux, you can update Ollama with the same command used to install it, which is:
curl -fsSL https://ollama.com/install.sh | sh
To update Ollama on either MacOS or Windows, download the binary installer, launch it, and follow the wizard's steps. If the installer reports that it cannot proceed because Ollama is still running, you'll need to stop Ollama first. You can either quit it from your OS's process monitor or, on MacOS, run the command:
osascript -e 'tell app "Ollama" to quit'
On Windows, that command would be:
taskkill /im ollama.exe /f
You might run into a problem. If, after upgrading, you get an error (when pulling gpt-oss) that you need to run the latest version of Ollama, you'll have to install the latest iteration from the Ollama GitHub page. How you do that will depend on which OS you use.
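On Linux, for example, one way to grab the newest build is to download the release tarball and unpack it over the existing install. Here's a minimal sketch, assuming an x86_64 machine (adjust the filename for other architectures):

curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

On MacOS and Windows, downloading and running the latest installer from the releases page accomplishes the same thing.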
Also: How I feed my files to a local AI for better, more relevant responses
You need to be running at least Ollama version 0.11.4 to use the gpt-oss models.
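You can check which version you're running from the command line:

ollama --version

If the reported version is older than 0.11.4, upgrade before continuing.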
The next step is to pull the LLM from the command line. Remember, the model we're looking for is gpt-oss:20b, which is roughly 13 GB in size. There's also the larger gpt-oss:120b model, but that one requires over 60 GB of RAM to function properly. If your machine has less than 60 GB of RAM, stick with 20b.
Also: How to run DeepSeek AI locally to protect your privacy - 2 easy ways
To pull the LLM, run the following command (regardless of OS):
ollama pull gpt-oss:20b
Depending on your network speed, this will take a few minutes to complete.
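Once the download finishes, you can confirm the model is available locally:

ollama list

You should see gpt-oss:20b in the output, along with its size.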
OK, now that you've updated Ollama and pulled the LLM, you can use it. If you interact with Ollama from the command line, run the model with:
ollama run gpt-oss:20b
Once you're at the Ollama console, you can start querying the newly added LLM.
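If you'd rather script your queries than type them into the console, Ollama also exposes a local REST API (on port 11434 by default). A minimal request looks something like this:

curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain tokens in one sentence.", "stream": false}'

Setting stream to false returns the full response in a single JSON object rather than streaming it token by token.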
If you use the Ollama GUI app (on MacOS or Windows), you should be able to select gpt-oss:20b from the model drop-down in the app.
Also: I tried Sanctum's local AI app, and it's exactly what I needed to keep my data private
And that's all there is to making use of the fastest local LLM I've tested to date.