

This is the fastest local AI I've tried, and it's not even close - how to get it

Aug. 13, 2025 | Hi-network.com
Jack Wallen / Elyse Betters Picaro

Key takeaways

  • The gpt-oss:20b model generates around 30 tokens per second, the fastest I've seen from a local LLM.
  • That speed translates to near-instant answers to your queries.
  • With the latest version of Ollama installed (0.11.4 or newer), you can use this model.



Let's talk about local AI and speed. There are a lot of factors that go into getting the most speed out of your AI, such as:

  • Whether you have a dedicated GPU (see the quick check after this list).
  • The context length you use (the smaller, the faster).
  • The complexity of your query.
  • The LLM you use.
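
For example, on the GPU question, Ollama can tell you whether a loaded model is actually running on your graphics card. With a model loaded, the following command lists it along with a PROCESSOR column that reads something like 100% GPU or 100% CPU:

ollama ps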

I've tried quite a few different local LLMs, using Ollama on both Linux and MacOS, and I recently ran into one that blew all the others away with regard to speed. That model is gpt-oss:20b. On both platforms, it has been lights-out faster than the others I've used, generating around 30 tokens per second.

Also: My go-to LLM tool just dropped a super simple Mac and PC app for local AI - why you should try it

What is a token? Think of it as a piece of a word used in natural language processing. For English text, 1 token is approximately 4 characters, or about 0.75 words, which means gpt-oss:20b generates roughly 120 characters (around 22 words) per second.

That's not bad.

Compare that with a locally run llama3.2, which manages around 14 tokens per second. See the difference?
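
If you want to verify those numbers on your own hardware, recent versions of Ollama can print generation statistics after each response when you launch a model with the --verbose flag; the eval rate line it reports is the tokens-per-second figure:

ollama run llama3.2 --verbose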

OK, now that I've (hopefully) convinced you that gpt-oss:20b is the way to go, how do you use it as a local LLM?

How to update Ollama

What you'll need: To make this work, you'll need either a running instance of Ollama (it doesn't matter which desktop OS you're using) or a fresh installation.

1. Update Ollama on Linux

If you're using Linux, you can update Ollama with the same command used to install it, which is:


curl -fsSL https://ollama.com/install.sh | sh

2. Update Ollama on MacOS or Windows

To update Ollama on either MacOS or Windows, you would simply download the binary installer, launch it, and follow the steps as described in the wizard. If you get an error that it cannot be installed because Ollama is still running, you'll need to stop Ollama before running the installer. To stop Ollama, you can either find it in your OS's process monitor or run the command:


osascript -e 'tell app "Ollama" to quit'

On Windows, that command would be:

taskkill /im ollama.exe /f
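
If you prefer PowerShell, the equivalent (assuming the process is named ollama, as it is in a default install) is:

Stop-Process -Name ollama -Force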

You might run into a problem. If, after upgrading, you get an error (when pulling gpt-oss) that you need to run the latest version of Ollama, you'll have to install the latest iteration from the Ollama GitHub page. How you do that will depend on which OS you use.

Also: How I feed my files to a local AI for better, more relevant responses

You need to be running at least Ollama version 0.11.4 to use the gpt-oss models.
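
You can check which version you have installed with:

ollama --version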

How to pull the gpt-oss LLM

The next step is to pull the LLM from the command line. Remember, the model we're looking for is gpt-oss:20b, which is roughly 13 GB in size. There's also the larger model, gpt-oss:120b, but that one requires over 60 GB of RAM to function properly. If your machine has less than 60 GB of RAM, stick with 20b.
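
If you're not sure how much memory your machine has, free -h will tell you on Linux, and sysctl -n hw.memsize (which reports the total in bytes) will tell you on MacOS:

free -h
sysctl -n hw.memsize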

Also: How to run DeepSeek AI locally to protect your privacy - 2 easy ways

To pull the LLM, run the following command (regardless of OS):

ollama pull gpt-oss:20b

Depending on your network speed, this will take a few minutes to complete.
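
Once the download finishes, you can confirm the model is available locally with:

ollama list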

How to use gpt-oss

OK, now that you've updated Ollama and pulled the LLM, you can use it. If you interact with Ollama from the command line, run the model with:

ollama run gpt-oss:20b

Once you're at the Ollama console, you can start querying the newly added LLM.
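
Type your question at the prompt and press Enter to get a response. When you're finished, you can exit the console with:

/bye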

If you use the Ollama GUI app (on MacOS or Windows), you should be able to select gpt-oss:20b from the model drop-down in the app.

Also: I tried Sanctum's local AI app, and it's exactly what I needed to keep my data private

And that's all there is to making use of the fastest local LLM I've tested to date.

