Preferences

It seems like verification might need to be improved a bit? I looked at Mistral-Large-123B. Someone is claiming 12 tokens/sec on a single RTX 3090 at FP16.

Perhaps some filter could cut out submissions that don't really make sense?


Great idea - took a bit to figure out how to implement this.

I came up with a plausibility check based on the model's memory requirements: https://github.com/BinSquare/inferbench/blob/main/src/lib/pl...

So now on the submission page - it has a warning + an automate flag count for volunteers to double check:

```This configuration seems unlikely

Model requires ~906GB VRAM but only 32GB available (28.3x over). This likely requires significant CPU offload which would severely impact performance.

You can still submit, but your result will be flagged for review.```

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal