Davidzheng
It kind of reminds me of the vanishing gradient problem from early deep learning, where really deep networks wouldn't train because the gradients died partway through backpropagation, and the solution was to add bypass connections (ResNet-style skip connections). I wonder if similar solutions exist here. Of course, I think what happens in general is more like control theory: you should be able to detect going off-course with some probability and correct for it. [Over a longer horizon you accumulate some probability of leaving the safe zone at each step, so you still get the exponential decay, just over a larger field.] Not sure how to connect all these ideas, though.
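A minimal numerical sketch of the vanishing-gradient point above (the layer sizes, scales, and `numpy` setup here are illustrative assumptions, not anything from the comment): the gradient through a plain stack of layers is a product of per-layer Jacobians, which shrinks toward zero when each Jacobian is small, while a residual stack x → x + f(x) multiplies factors of the form (I + J_f) that stay near the identity, so the gradient survives.

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 50
dim = 4
# Small random per-layer Jacobians J_f (hypothetical layers, for illustration)
jacobians = [0.02 * rng.standard_normal((dim, dim)) for _ in range(depth)]

def chained_grad(jacobians):
    # Gradient through a plain stack: product of the layer Jacobians.
    J = np.eye(dim)
    for Jf in jacobians:
        J = Jf @ J
    return np.linalg.norm(J)

def residual_grad(jacobians):
    # Gradient through a residual stack x -> x + f(x): product of (I + J_f),
    # the "bypass connection" keeping each factor close to the identity.
    J = np.eye(dim)
    for Jf in jacobians:
        J = (np.eye(dim) + Jf) @ J
    return np.linalg.norm(J)

print(chained_grad(jacobians))   # vanishes: astronomically small
print(residual_grad(jacobians))  # stays on the order of 1
```

The analogy to the correction idea: the identity path is what lets a signal pass through many steps undegraded, instead of decaying exponentially with depth.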