laibert OP

This is amazing - impressed by your persistence to source the training data yourself, that must have been tedious!

Did you try quantizing the parameters to shrink the model size some more? If so, how did it affect the results? In my experience it also runs slightly faster on mobile.
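
For concreteness, this is roughly the kind of post-training weight quantization I have in mind, sketched with the TensorFlow Lite converter (that tooling postdates this thread, and the saved-model path is just a placeholder):

    import tensorflow as tf

    # Load a trained float32 model; the path is a placeholder.
    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

    # Quantize the weights (float32 -> 8-bit), shrinking the
    # serialized model to roughly a quarter of its size.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()
    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)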


timanglade
Great question — I did not, because I had unfortunately spent all of my data on that last training run and did not have an untainted dataset left to measure the impact of quantization on. (Just poor planning on my part, really.)

It’s also my understanding at the moment that quantization does not help with inference speed or runtime memory usage, which were my chief concerns. I was comfortable with the binary size (<20 MB) we were shipping and did not feel the need to shave a few more MB there. I was more worried about accuracy, and did not want to ship a quantized version of my network without being able to assess the impact.
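
As a back-of-the-envelope check on the size question (the parameter count below is hypothetical, not my network's actual number), 8-bit weight quantization mostly buys you a roughly 4x smaller file:

    # Rough size math for 8-bit weight quantization.
    params = 4_000_000              # hypothetical parameter count
    float32_mb = params * 4 / 1e6   # 4 bytes per weight -> ~16 MB
    int8_mb = params * 1 / 1e6      # 1 byte per weight  -> ~4 MB
    print(f"float32: {float32_mb:.0f} MB, int8: {int8_mb:.0f} MB")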

Finally, it now seems that quantization may be best applied at training time rather than at shipping time, according to a recent paper by the University of Iowa & Snapchat [0], so next time around I would probably bake it into my design phase earlier.

[0]: https://arxiv.org/abs/1706.03912
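
For anyone reading along, here is a minimal sketch of what training-time quantization looks like with the tensorflow_model_optimization toolkit. That tooling postdates this thread and is not the method from the cited paper; the model path, dataset, and epoch count are placeholders:

    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # Wrap an existing Keras model so training simulates quantized
    # inference ("fake quantization" of weights and activations).
    model = tf.keras.models.load_model("path/to/model.h5")  # placeholder
    q_aware_model = tfmot.quantization.keras.quantize_model(model)

    # Recompile after wrapping, then fine-tune so the weights
    # adapt to quantization noise.
    q_aware_model.compile(optimizer="adam",
                          loss="binary_crossentropy",
                          metrics=["accuracy"])
    q_aware_model.fit(train_ds, epochs=3)  # train_ds: placeholder tf.data.Dataset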

laibert OP
Thanks! Haven't seen that paper, I'll check it out. I think quantization only helps with inference speed if the network is running on the CPU, with negligible gains on GPU (TensorFlow only supported CPU on mobile last I looked, which was a while ago). However, your app is already super fast, so I don't think anyone would notice if it were marginally faster at this point!
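
If anyone wants to sanity-check the CPU-speed point, a quick micro-benchmark against a .tflite model looks something like this (the model path is a placeholder, and numbers vary a lot by device; a dynamic-range quantized model like the one above still takes float32 input):

    import time
    import numpy as np
    import tensorflow as tf

    # Time repeated CPU inference on a TFLite model (path is a placeholder).
    interpreter = tf.lite.Interpreter(model_path="model_quantized.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.rand(*inp["shape"]).astype(np.float32)

    start = time.perf_counter()
    for _ in range(100):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    elapsed = time.perf_counter() - start
    print(f"mean latency: {elapsed / 100 * 1e3:.2f} ms")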
