Preferences

Have you tried HDBSCAN (DBSCAN variant) or Hierarchical Clustering (HAC) ?

Me? I probably tried every classification algorithm and their H variants. I still think "What is K?" is a profound question.
I agree it is a profound question. My thesis is fairly boring.

For any given clustering task of interest, there is no single value of K.

Clustering & unsupervised machine learning is as much about creating meaning and structure as it is about discovering or revealing it.

Take the case of biological taxonomy, what K will best segment the animal kingdom?

There is no true value of K. If your answer is for a child, maybe it’ 7 corresponding to what we’re taught in school - mammals, birds, reptiles, amphibians, fish, and invertebrates.

If your answer is for a zoologist, obviously this won’t do.

Every clustering task of interest is like this. And I say of interest because clustering things like digits in the classic MNIST dataset is better posed as a classification problem - the categories are defined analytically.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal