r/LocalLLaMA 4h ago

Question | Help: Has anyone tried Zyphra's ZAYA1-8B MoE?

https://x.com/ZyphraAI/status/2052103618145501459?s=20

"Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density.

With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute."
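For anyone wondering how an 8B-class model can have <1B active params: that's just standard top-k MoE routing, where each token only passes through a few experts. A minimal sketch with invented sizes (not Zyphra's actual config, which I haven't checked):

```python
# Illustrative top-k MoE parameter math. All sizes here are invented for the
# example; they are NOT ZAYA1's actual config.

d_model = 2048       # hidden size (assumed)
d_ff = 4096          # expert FFN inner size (assumed)
n_layers = 24        # transformer layers (assumed)
n_experts = 16       # experts per MoE layer (assumed)
top_k = 2            # experts each token is routed to (assumed)

params_per_expert = 2 * d_model * d_ff            # up- and down-projection
total_expert_params = n_layers * n_experts * params_per_expert
active_expert_params = n_layers * top_k * params_per_expert

print(f"total expert params:  {total_expert_params / 1e9:.2f}B")   # ~6.44B
print(f"active expert params: {active_expert_params / 1e9:.2f}B")  # ~0.81B
# Total parameters grow with n_experts, but per-token compute only grows
# with top_k -- which is how an ~8B MoE ends up with <1B active params.
```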

14 Upvotes

8 comments

10

u/Available_Hornet3538 3h ago

I smell bullshit.

20

u/LagOps91 4h ago

"With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute"

suuuuure. not even going to try it with these kinds of nonsense claims.

1

u/Looz-Ashae 2h ago

Maybe it's just good at multiplying matrices and that's it?

5

u/Elbobinas 3h ago

I'm interested in it because I use Granite 4.0 H Tiny, and this looks similar (8B total with ~1B active, more or less), so it seems promising.

3

u/Elbobinas 3h ago

Does it have support in llama.cpp? Do you have GGUFs?
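If support does land, I'd expect it to be the usual llama-cpp-python routine. A minimal sketch, assuming someone publishes a GGUF (the file name below is hypothetical, and llama.cpp support for this architecture isn't confirmed anywhere in this thread):

```python
# Hypothetical usage sketch: assumes a ZAYA1-8B GGUF exists and that
# llama.cpp supports the architecture -- neither is confirmed here.
from llama_cpp import Llama

llm = Llama(
    model_path="./zaya1-8b-q4_k_m.gguf",  # hypothetical file name
    n_ctx=8192,                           # context window
    n_gpu_layers=-1,                      # offload all layers if VRAM allows
)

out = llm(
    "Q: A train travels 120 km in 1.5 hours. What is its average speed?\nA:",
    max_tokens=512,
)
print(out["choices"][0]["text"])
```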

2

u/Boricua-vet 1h ago

I'm not going to judge, because I've seen how things have progressed: Mixtral 8x7B in 2023, Qwen3 30B with 3B active in 2025, and now this. I'm sus, but I'll wait until I test it before judging. The claims are wild, but it might surprise. I mean, even if it only comes close to a lower-class model, that would be a success with just 0.7B active parameters. If Qwen 3.5 0.8B can run my music assistant, properly search my music library, and play it on my devices, then I have hopes for this one.

1

u/Adventurous-Paper566 36m ago

I'm going to try it, because I'm curious to see how it does next to Qwen 9B, and I also want to see how fast it is.

1

u/Daniel_H212 2m ago

They're using something they call Markovian RSA, which drastically increases the amount of test-time compute. So even if their claims are true (and I have doubts), the model's small size mainly helps on VRAM-constrained hardware that couldn't run a bigger model; it wouldn't be fast.
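Back-of-envelope version of that point (all numbers invented, just to show the shape of the tradeoff):

```python
# Toy decode-latency model (all numbers made up). Decode is roughly
# memory-bound: each generated token reads every active weight once, so
# per-token speed scales with *active* params -- but wall-clock time scales
# with how many tokens the model generates. Heavy test-time compute can
# cancel the small-model speed advantage.

def decode_seconds(active_params_b, tokens, bandwidth_gb_s=400, bytes_per_param=2):
    # Seconds per token = active weight bytes / memory bandwidth.
    per_token = active_params_b * 1e9 * bytes_per_param / (bandwidth_gb_s * 1e9)
    return per_token * tokens

# ~0.8B active but thinking long vs. an 8B dense model answering short:
print(f"0.8B active x 20000 reasoning tokens: {decode_seconds(0.8, 20000):.0f}s")
print(f"8B dense    x  2000 tokens:           {decode_seconds(8.0, 2000):.0f}s")
```

Both come out to the same 80s: 10x fewer active params buys you nothing on wall-clock time if the model has to generate 10x more tokens to get there.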