Let's talk about local AI and speed. There are a lot of factors that go into getting the most speed out of your AI, and one of the biggest is the model you choose.
I've tried quite a few local LLMs, using Ollama on both Linux and MacOS, and I recently ran into one that blew all the others away with regard to speed: gpt-oss:20b. On both platforms, it's lights-out faster than the other models I've used, generating 30 tokens per second.
Also: My go-to LLM tool just dropped a super simple Mac and PC app for local AI - why you should try it
What is a token? Think of tokens as pieces of words used in natural language processing. In English text, 1 token is approximately 4 characters or 0.75 words, which means gpt-oss:20b can generate roughly 120 characters (about 22 words) per second.
That's not bad.
For comparison, a locally run llama3.2 achieves around 14 tokens per second. See the difference?
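If you want to compare throughput on your own hardware, Ollama can report it for you. Running a model with the --verbose flag prints timing stats after each response, including an eval rate in tokens per second. For example, with a model you've already pulled:

ollama run llama3.2 --verbose

Ask it a question, and the stats appear once the answer finishes.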
OK, now that I've (hopefully) convinced you that gpt-oss:20b is the way to go, how do you use it as a local LLM?
What you'll need: To make this work, you'll need either a running version of Ollama (it doesn't matter what desktop OS you're using) or a fresh install of it.
If you're using Linux, you can update Ollama with the same command used to install it, which is:
curl -fsSL https://ollama.com/install.sh | sh
To update Ollama on either MacOS or Windows, download the binary installer, launch it, and follow the wizard's steps. If the installer reports that it cannot proceed because Ollama is still running, you'll need to stop Ollama first. You can either quit it from your OS's process monitor or, on MacOS, run the command:
osascript -e 'tell app "Ollama" to quit'
On Windows, that command would be:
taskkill /im ollama.exe /f
You might run into a problem. If, after upgrading, you get an error (when pulling gpt-oss) that you need to run the latest version of Ollama, you'll have to install the latest iteration from the Ollama GitHub page. How you do that will depend on which OS you use.
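On Linux, for example, one way to grab the newest build is to download the release tarball and unpack it over the existing install. Here's a minimal sketch, assuming an x86_64 machine (adjust the filename for other architectures):

curl -LO https://github.com/ollama/ollama/releases/latest/download/ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

On MacOS and Windows, downloading and running the latest installer from the releases page accomplishes the same thing.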
Also: How I feed my files to a local AI for better, more relevant responses
You need to be running at least Ollama version 0.11.4 to use the gpt-oss models.
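You can check which version you're running from the command line:

ollama --version

If the reported version is older than 0.11.4, upgrade before continuing.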
The next step is to pull the LLM from the command line. Remember, the model we're looking for is gpt-oss:20b, which is roughly 13 GB in size. There's also the larger gpt-oss:120b model, but that one requires over 60 GB of RAM to function properly. If your machine has less than 60 GB of RAM, stick with 20b.
Also: How to run DeepSeek AI locally to protect your privacy - 2 easy ways
To pull the LLM, run the following command (regardless of OS):
ollama pull gpt-oss:20b
Depending on your network speed, this will take a few minutes to complete.
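Once the download finishes, you can confirm the model is available locally:

ollama list

You should see gpt-oss:20b in the output, along with its size.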
OK, now that you've updated Ollama and pulled the LLM, you can use it. If you interact with Ollama from the command line, run the model with:
ollama run gpt-oss:20b
Once you're at the Ollama console, you can start querying the newly added LLM.
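If you'd rather script your queries than type them into the console, Ollama also exposes a local REST API (on port 11434 by default). A minimal request looks something like this:

curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Explain tokens in one sentence.", "stream": false}'

Setting stream to false returns the full response in a single JSON object rather than streaming it token by token.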
If you use the Ollama GUI app (on MacOS or Windows), you should be able to select gpt-oss:20b from the model drop-down in the app.
Also: I tried Sanctum's local AI app, and it's exactly what I needed to keep my data private
And that's all there is to making use of the fastest local LLM I've tested to date.