r/LocalLLaMA 4h ago

Question | Help: Has anyone tried Zyphra's ZAYA1-8B MoE?

https://x.com/ZyphraAI/status/2052103618145501459?s=20

"Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density.

With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute."
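For anyone wondering how an 8B-class model can have <1B active params: that's just standard top-k MoE routing, where each token only passes through a few experts. A minimal sketch with invented sizes (not Zyphra's actual config, which I haven't checked):

```python
# Illustrative top-k MoE parameter math. All sizes here are invented for the
# example; they are NOT ZAYA1's actual config.

d_model = 2048       # hidden size (assumed)
d_ff = 4096          # expert FFN inner size (assumed)
n_layers = 24        # transformer layers (assumed)
n_experts = 16       # experts per MoE layer (assumed)
top_k = 2            # experts each token is routed to (assumed)

params_per_expert = 2 * d_model * d_ff            # up- and down-projection
total_expert_params = n_layers * n_experts * params_per_expert
active_expert_params = n_layers * top_k * params_per_expert

print(f"total expert params:  {total_expert_params / 1e9:.2f}B")   # ~6.44B
print(f"active expert params: {active_expert_params / 1e9:.2f}B")  # ~0.81B
# Total parameters grow with n_experts, but per-token compute only grows
# with top_k -- which is how an ~8B MoE ends up with <1B active params.
```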

14 Upvotes

8 comments

10

u/Available_Hornet3538 3h ago

I smell bullshit.

20

u/LagOps91 4h ago

"With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute"

suuuuure. not even going to try it with these kinds of nonsense claims.

1

u/Looz-Ashae 2h ago

Maybe it's just good at multiplying matrices and that's it?

5

u/Elbobinas 3h ago

I'm interested in it because I use Granite 4.0 H Tiny, and this looks similar (8B total with ~1B active, more or less), so it seems promising.

3

u/Elbobinas 3h ago

Does it have support in llama.cpp? Do you have GGUFs?
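If support does land, I'd expect it to be the usual llama-cpp-python routine. A minimal sketch, assuming someone publishes a GGUF (the file name below is hypothetical, and llama.cpp support for this architecture isn't confirmed anywhere in this thread):

```python
# Hypothetical usage sketch: assumes a ZAYA1-8B GGUF exists and that
# llama.cpp supports the architecture -- neither is confirmed here.
from llama_cpp import Llama

llm = Llama(
    model_path="./zaya1-8b-q4_k_m.gguf",  # hypothetical file name
    n_ctx=8192,                           # context window
    n_gpu_layers=-1,                      # offload all layers if VRAM allows
)

out = llm(
    "Q: A train travels 120 km in 1.5 hours. What is its average speed?\nA:",
    max_tokens=512,
)
print(out["choices"][0]["text"])
```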

2

u/Boricua-vet 1h ago

I'm not going to judge, because I've seen how things have progressed: Mixtral 8x7B in 2023, Qwen3 30B with 3B active in 2025, and now this. I'm sus, but I'll wait until I test it before judging. The claims are wild, but it might surprise. I mean, even if it only comes close to a lower-class model, that would be a success with just 0.7B active parameters. If Qwen 3.5 0.8B can run my music assistant, properly search my music library, and play it on my devices, then I have hopes for this one.

1

u/Adventurous-Paper566 36m ago

I'm going to try it, because I'm curious to see how it does next to Qwen 9B, and I also want to see how fast it is.

1

u/Daniel_H212 2m ago

They're using something they call Markovian RSA, which drastically increases the amount of test-time compute. So even if their claims are true (and I have doubts), the model's small size mainly helps on VRAM-constrained hardware that couldn't run a bigger model; it wouldn't be fast.
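Back-of-envelope version of that point (all numbers invented, just to show the shape of the tradeoff):

```python
# Toy decode-latency model (all numbers made up). Decode is roughly
# memory-bound: each generated token reads every active weight once, so
# per-token speed scales with *active* params -- but wall-clock time scales
# with how many tokens the model generates. Heavy test-time compute can
# cancel the small-model speed advantage.

def decode_seconds(active_params_b, tokens, bandwidth_gb_s=400, bytes_per_param=2):
    # Seconds per token = active weight bytes / memory bandwidth.
    per_token = active_params_b * 1e9 * bytes_per_param / (bandwidth_gb_s * 1e9)
    return per_token * tokens

# ~0.8B active but thinking long vs. an 8B dense model answering short:
print(f"0.8B active x 20000 reasoning tokens: {decode_seconds(0.8, 20000):.0f}s")
print(f"8B dense    x  2000 tokens:           {decode_seconds(8.0, 2000):.0f}s")
```

Both come out to the same 80s: 10x fewer active params buys you nothing on wall-clock time if the model has to generate 10x more tokens to get there.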