Using different modalities (images, video, voice/sound rather than pure text) is interesting as well, since it helps complete the meaning, adds a sense of time, etc.
I don't think we're born with any concepts at all. It's all quite chaotic initially, and we use consistent sensory inputs to train/stabilise our neural network. Newborns, for example, don't even have a concept of separation between "me" and "the environment around me"; it's learned.
That is exactly the thing that doesn't seem to be true, or at least it is considered outdated in neuroscience. We very much have some concepts that are innate, and we learn all other concepts in relation to the things that are already there in our brains, which at birth is mostly sensorimotor stuff. We decidedly don't learn new concepts from scratch, only in relation to already acquired concepts.
So our brains work quite a bit differently from LLMs, despite the neuron metaphor used there.
And regarding your food example, the difference I was trying to point out: for LLMs, the word and the concept are the same thing. For humans they are different things that are also learned differently. The memorization part (mostly) only affects the word, not the concept behind it. What you described was only the learning of the word "tall": the child in your example already knew that the other person was taller than them, it just didn't know how to talk about that.
Just nitpicking here, but this isn't how humans learn numbers. They start at birth with a competency for quantities up to about 3 or 5 and expand from that. So they can already work with quantities of varying size (i.e. they know which is more, the four apples on the left or the five on the right, and they also know what happens if I take one apple from the left and add it to the ones on the right), and only then do they learn the numbers. So yes, they learn the numbers through memorization, but only the signs/symbols, not the numeric competency itself.