
AaronFriel
Not using the NVDEC and NVJPG units to decompress weights into registers? And you say you're using the whole GPU. There are entire blocks on the silicon going idle!

refibrillator
Ha, made me chuckle. For those seriously wondering about this, it's not a viable optimization. Weights are not readily compressible via JPEG/DCT, and there are only a limited number of these units on the chip, which bottlenecks throughput: their decode rate is dwarfed by the rate of simply reading uncompressed weights from HBM.
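A rough way to see the compressibility point: JPEG-style coding relies on the DCT packing most of a block's energy into a few low-frequency coefficients, which holds for smooth signals like natural images. The sketch below assumes trained weights behave roughly like i.i.d. Gaussian noise (a crude stand-in, not a claim about any real model) and uses a 1-D orthonormal DCT-II on 64 values instead of JPEG's full pipeline (2-D 8x8 blocks, quantization, entropy coding). The helper names are hypothetical.

```python
import math
import random

def dct_ii(x):
    """Orthonormal 1-D DCT-II (the transform family JPEG applies to 8x8 blocks in 2-D)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def energy_in_top_k(x, k):
    """Fraction of total energy held by the k largest-magnitude DCT coefficients."""
    energies = sorted((c * c for c in dct_ii(x)), reverse=True)
    return sum(energies[:k]) / sum(energies)

n, k = 64, 8  # keep 8 of 64 coefficients (12.5%)

# Smooth, image-like signal: one sine period plus a gentle ramp.
smooth = [math.sin(2 * math.pi * i / n) + 0.5 * i / n for i in range(n)]

# Assumption: weights modeled as i.i.d. Gaussian noise with a small std.
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.02) for _ in range(n)]

smooth_frac = energy_in_top_k(smooth, k)
noise_frac = energy_in_top_k(noise, k)
print(f"smooth: {smooth_frac:.3f}, noise-like: {noise_frac:.3f}")
```

For the smooth signal nearly all the energy lands in a handful of coefficients, so the rest can be quantized away cheaply; for the noise-like vector the energy stays spread across all 64 coefficients, so discarding most of them loses a large share of it.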
jhoho
It seems like this is indeed possible using video codecs: https://arxiv.org/abs/2407.00467v1
touisteur
Good fun. Now I wish RT cores were programmable with some form of PTX, but for now it's OptiX or die. I've managed to do fun stuff with it, but it's like pulling teeth.
moralestapia
Yeah, but they could be.

I won a GPU hackathon back in 2019 doing something very similar to this, although the other way around: I was compressing weights using the hardware modules.

heavyset_go
Have a link to this?
moralestapia
Unfortunately no. I have a cool picture, though!
heavyset_go
I will have to settle for a picture then :)
moralestapia
Send me an email (see my profile) and I'll gladly share more details ^^.