Mistral Small 3 could unlock powerful local AI use cases

Mistral Small 3 is fast, lightweight, and ideal for local workflows, but its knowledge base is noticeably compressed compared to bigger frontier models.

Published: Friday, January 31st 2025

1 min (227 words)

ajfisher - Gemini / Nanobanana (Mistral Logo via Mistral.ai)

This week is proving to be a particularly big one for AI models you can run on your laptop. Not to be outdone, Mistral has dropped Mistral Small 3 today, optimised for speed and memory footprint.

The model runs easily on my laptop (an M2 MacBook Pro), and from my early benchmarking its memory use is much lower than Llama 3.3's. Token generation is also lightning fast.
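
If you want to reproduce this kind of rough speed check, here's a minimal sketch using the ollama Python client. It assumes you have Ollama running locally with both models pulled, and the model tags are my guess at what's on the Ollama registry:

```python
# Rough token-throughput check using the ollama Python client.
# Assumes Ollama is running locally and both models have been pulled,
# e.g. `ollama pull mistral-small:24b` and `ollama pull llama3.3:70b`.
import ollama

PROMPT = "Explain what a hash map is in two sentences."

for model in ["mistral-small:24b", "llama3.3:70b"]:
    response = ollama.generate(model=model, prompt=PROMPT)
    # eval_count / eval_duration come back on every generate response;
    # eval_duration is in nanoseconds.
    tokens = response["eval_count"]
    seconds = response["eval_duration"] / 1e9
    print(f"{model}: {tokens / seconds:.1f} tokens/sec")
```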

Assuming you were working fully locally, with local speech-to-text (STT) and text-to-speech (TTS) set up, I suspect this could deliver low-latency voice interactions for anything that doesn't require backing data or network lookups.
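
To make that concrete, the loop I'm imagining is sketched below. The `listen()` and `speak()` helpers are hypothetical stand-ins for whatever local STT and TTS you run (whisper.cpp and Piper would be obvious candidates); only the ollama call is a real API:

```python
# Sketch of a fully local voice loop: STT -> LLM -> TTS.
# listen() and speak() are hypothetical placeholders for your local
# STT/TTS of choice; only the ollama call here is a real API.
import ollama

def listen() -> str:
    """Hypothetical: capture mic audio and return the transcribed text."""
    raise NotImplementedError("wire up your local STT here")

def speak(text: str) -> None:
    """Hypothetical: synthesise text to audio and play it locally."""
    raise NotImplementedError("wire up your local TTS here")

history = []
while True:
    history.append({"role": "user", "content": listen()})
    reply = ollama.chat(model="mistral-small:24b", messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    speak(answer)
```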

The biggest gap I can see immediately is that there's been considerable compression of the knowledge base. Whereas you can ask Llama 3.3 70B about specific blog posts by individual authors and it will actually know what you're talking about (without looking anything up), Mistral Small 3 24B seems to have largely "smooshed" (technical term) that knowledge out.
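
The probe itself is nothing scientific: ask both models the same niche recall question with no retrieval available and compare the answers. Something like the sketch below works, where the question is purely illustrative; substitute whatever obscure fact you'd expect a frontier-scale model to have memorised:

```python
# Quick knowledge-recall probe: ask both models the same niche question
# with no retrieval or tools available, and eyeball the answers.
# The question below is purely illustrative.
import ollama

QUESTION = (
    "Without looking anything up, what do you know about the blog "
    "ajfisher.me and the topics it covers?"
)

for model in ["llama3.3:70b", "mistral-small:24b"]:
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"--- {model} ---")
    print(response["message"]["content"])
```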

Coupled with the DeepSeek developments this week, this drives home the point that open models have made up significant ground in the last few months and that the moat around frontier models is now almost non-existent. Companies playing in the model space need to start thinking seriously about how they create a great user and developer experience, enable security, and make all of this compelling for businesses to adopt.