Preferences

The gains from MoE is that you can have a large model that's efficient, it lets you decouple #params and computation cost. I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'. These are totally different systems, IMO this shallow comparison muddies the water and does a disservice to each field of study. There's been loads of research showing there's redundancy in MoE models, ie cerebras has a paper[1] where they selectively prune half the experts with minimal loss across domains -- I'm not sure you could disable half the brain and notice a stupefying difference.

[1] https://www.cerebras.ai/blog/reap


> I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'.

I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.

re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal