
The default for `convert.py` is F32, so this is purely a comparison of SIMD CPU code.

Jlama uses the Vector API in Java 20, but it also relies on better thread scheduling with work stealing, and on zero allocation.
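To show what work-stealing scheduling means here, the following is a minimal sketch that uses the JDK's ForkJoinPool, whose worker threads steal queued tasks from each other, to split a float dot product. This is not Jlama's code. The class name WorkStealingDot and the THRESHOLD chunk size are hypothetical. The inner loop is scalar rather than using the Vector API, because in Java 20 that API is the incubator module jdk.incubator.vector and needs --add-modules to compile.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: a dot product split into chunks that
// ForkJoinPool's work-stealing workers can pick up.
final class WorkStealingDot {
    // Assumed tuning value: below this many elements, compute directly.
    private static final int THRESHOLD = 4096;

    private static final class DotTask extends RecursiveTask<Float> {
        private final float[] a, b;
        private final int from, to;

        DotTask(float[] a, float[] b, int from, int to) {
            this.a = a; this.b = b; this.from = from; this.to = to;
        }

        @Override
        protected Float compute() {
            if (to - from <= THRESHOLD) {
                // Small chunk: plain scalar loop (a Vector API loop could go here).
                float sum = 0f;
                for (int i = from; i < to; i++) sum += a[i] * b[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            DotTask left = new DotTask(a, b, from, mid);
            left.fork(); // queue the left half; an idle worker may steal it
            float right = new DotTask(a, b, mid, to).compute(); // do the right half ourselves
            return right + left.join();
        }
    }

    static float dot(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        return ForkJoinPool.commonPool().invoke(new DotTask(a, b, 0, a.length));
    }
}
```

Because RecursiveTask<Float> boxes each partial result, this sketch is not allocation-free. It only illustrates the scheduling side.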


Could you link to some examples in your repo where you enforce zero allocation? I don't see much reuse of buffers (e.g. float buffers), and there is quite a lot of array-based heap allocation. Just for my own interest. Many thanks. Cool to see the new Vector API in use, too.
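The following is a minimal sketch of the kind of buffer reuse the question above asks about. It assumes two common patterns: a per-thread scratch buffer that grows only when needed, and a caller-supplied output array. ScratchBuffers, get and matVec are hypothetical names and are not taken from the Jlama repo.

```java
// Hypothetical sketch of allocation-avoiding patterns, not Jlama's code.
final class ScratchBuffers {
    // One reusable scratch array per thread, so no locking is needed.
    private static final ThreadLocal<float[]> SCRATCH =
            ThreadLocal.withInitial(() -> new float[0]);

    // Returns a buffer with at least `size` floats. It allocates only when
    // the current buffer is too small, then keeps the larger one for reuse.
    // Contents are not cleared between calls.
    static float[] get(int size) {
        float[] buf = SCRATCH.get();
        if (buf.length < size) {
            buf = new float[size];
            SCRATCH.set(buf);
        }
        return buf;
    }

    // Writes w (rows x cols, row-major) times x into `out`.
    // The caller owns `out`, so this method allocates nothing.
    static void matVec(float[] w, int rows, int cols, float[] x, float[] out) {
        for (int r = 0; r < rows; r++) {
            float sum = 0f;
            int base = r * cols;
            for (int c = 0; c < cols; c++) sum += w[base + c] * x[c];
            out[r] = sum;
        }
    }
}
```
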
Very interesting, I'll watch for the quantized version.
