Mixture-of-experts models quietly changed what hardware you need for local AI
…Since you're routing a small subset of tokens to each expert, they get a limited view of the entire training distribution. This puts a question mark over the effectiveness of the…
