Preferences

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?" Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.

RIP.


EdwardCoffin
Danny Hillis: "The first time I met Marvin Minsky: I walked into his office, I was very intimidated. He was sitting there, he was throwing wadded up pieces of paper at a wastebasket across the room, and doing a terrible job of it. He's missing it, they're all falling short. So I watch this for a while. Then he looked up at me and said 'ah! I forgot! It's one half em gee squared!'"

- From his talk On Game Software Development, in 2001 I think (from Technetcast.com)

josteink
Maybe I'm wrong, but that sounds like he was trying to get the formula for kinetic energy, which would be half em vee squared.

Just to be clear: I'm not questioning the story, just the details in the recollection :)

EdwardCoffin
I did wonder at the discrepancy at the time, so I re-listened to the audio several times carefully when I was writing the transcript above. To me, he is clearly saying gee rather than vee (could be an artifact of the recording though).

I'm sure Hillis knows more physics than I do (though I knew the equations of motion pretty well at one time), but he could easily have just misspoken. I didn't pursue this line of thought, but considered it might have been something to do with him deriving an expression for the vertical position in a gravitational field, perhaps in terms of horizontal motion or something.

Will robots inherit the earth?

Yes, but they will be our children.

--Marvin Minsky http://web.media.mit.edu/~minsky/papers/sciam.inherit.html

jes5199
Amazingly, according to this 1981 interview in the New Yorker, Minsky's first neural net was itself randomly wired!

http://www.newyorker.com/magazine/1981/12/14/a-i

"Because of the random wiring, it had a sort of fail-safe characteristic. If one of the neurons wasn’t working, it wouldn’t make much of a difference—and, with nearly three hundred tubes and the thousands of connections we had soldered, there would usually be something wrong somewhere. In those days, even a radio set with twenty tubes tended to fail a lot. I don’t think we ever debugged our machine completely, but that didn’t matter. By having this crazy random design, it was almost sure to work, no matter how you built it."

tarr11
Found this original version copied from this story (original source is a dead link) [1]

"So Sussman began working on a program. Not long after, this odd-looking bald guy came over. Sussman figured the guy was going to boot him out, but instead the man sat down, asking, “Hey, what are you doing?” Sussman talked over his program with the man, Marvin Minsky. At one point in the discussion, Sussman told Minsky that he was using a certain randomizing technique in his program because he didn’t want the machine to have any preconceived notions. Minsky said, “Well, it has them, it’s just that you don’t know what they are.” It was the most profound thing Gerry Sussman had ever heard. And Minsky continued, telling him that the world is built a certain way, and the most important thing we can do with the world is avoid randomness, and figure out ways by which things can be planned. Wisdom like this has its effect on seventeen-year-old freshmen, and from then on Sussman was hooked."

[1] http://spetharrific.tumblr.com/post/26600309788/sussman-atta...

dandrews
That sounds like the version told in Levy's Hackers. Maybe someone in HN-land can verify that; I can't find my copy.
Someone posted this elsewhere in the thread:

https://web.archive.org/web/20120717041345/http://sch57.msk....

Verified. Chapter 6, page 117 in my paperback edition.
I had no idea that koan was a (mostly) true story. Thanks a lot for posting.
kkylin
Yes, I had once asked GJS about exactly that story, and he confirmed it (though maybe not the exact words).
georgespencer
I always loved this TED talk of his

https://www.youtube.com/watch?v=RYsTv-ap3XQ

peter303
This is not the Prof. Minsky I remember from the 1970s, when I was an MIT student. His mind was laser sharp then and could drill into the core of any problem.
copperx
It doesn't look to me like he's unfocused or forgetful on the video. What are you referring to exactly?
It is now several decades later.

Do TensorFlow/CNN builders use random initial configurations, or custom-designed structures?

nabla9
There is another deeper meaning in this koan.

It's related to the No Free Lunch Theorems. It basically says that if an algorithm performs well on a certain class of learning, searching or optimization problems, then it necessarily pays for that with degraded performance on the set of all remaining problems.

In other words, you always need bias to learn meaningfully. The more of the right kind of bias you have, the faster you can learn the subject at hand, and the slower you learn everything else. In neural networks the bias is not just in the weights. There is bias in the choice of random distribution for the network weights (uniform, Gaussian, etc.), in the network topology, in the learning algorithm, in the activation function, and so on.

Convolutional neural networks are a good example. They have a very strong bias baked into them, and it works really well.
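One way to see how strong that baked-in bias is: a convolutional layer with a shared kernel has orders of magnitude fewer free parameters than a fully connected layer over the same input. A back-of-the-envelope sketch (hypothetical sizes, not from the thread):

```python
# Mapping a 28x28 input to a 26x26 feature map:
dense_params = 28 * 28 * 26 * 26  # fully connected: every input pixel wired to every output
conv_params = 3 * 3               # convolution: one shared 3x3 kernel slid over the image
print(dense_params, conv_params)  # 529984 vs 9
```

The locality and weight-sharing assumptions cut the parameter count from ~530k to 9, which is exactly the "preconception" the architecture builds in.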

argonaut
Usually random, drawn from a certain distribution (e.g. a Gaussian with standard deviation 0.001, or with a standard deviation dependent on the number of input/output units, as in Xavier initialization).

For some tasks, you may wish to initialize using a network that was already trained on a different dataset, if you have reason to believe the new training task is similar to the previous task.
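A minimal numpy sketch of those two schemes (the helper name is hypothetical; the Xavier/Glorot formula used here is the common sqrt(2 / (n_in + n_out)) variant):

```python
import numpy as np

def init_weights(n_in, n_out, scheme="xavier", seed=None):
    """Initialize a weight matrix for a fully connected layer.

    'gaussian' uses a fixed small standard deviation; 'xavier'
    scales the standard deviation by the number of input/output units.
    """
    rng = np.random.default_rng(seed)
    if scheme == "gaussian":
        return rng.normal(0.0, 0.001, size=(n_in, n_out))
    if scheme == "xavier":
        std = np.sqrt(2.0 / (n_in + n_out))
        return rng.normal(0.0, std, size=(n_in, n_out))
    raise ValueError(f"unknown scheme: {scheme}")

W = init_weights(784, 256)  # e.g. an MNIST-sized input layer
```

The Xavier scaling keeps activation variance roughly constant from layer to layer, which is itself one of the biases mentioned above: a choice of distribution baked in before any data is seen.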

Houshalter
NN weights need to start random because otherwise two weights with exactly the same value can get "stuck": they receive identical updates and are unable to differentiate. Backpropagation relies on starting from random patterns that roughly match, so that it can fine-tune them.

But the weights are often initialized to be really close to zero.

brianpgordon
Given the era though, Sussman may have actually been working with a neural net that wasn't the typical hidden-layer variety. "Randomly wired" could be a statement about the topology of the network, not about the weights.
argonaut
There is no evidence he was actually working with a neural net.

https://web.archive.org/web/20120717041345/http://sch57.msk....

I had no idea this existed; it's brilliant!
discardorama
If you start with the same weights, then the neurons with similar connections will learn the same things. Random initialization is what gets them started in different directions.
raverbashing
Random weights, but the spatial organization of inputs follows the input geometry.
ehudla
The key is what learning procedure is used. It is not clear from the story if the nets were learning, and if so how.
areyousure
Honest question: What does this koan mean? Specifically, what is closing one's eyes analogous to in the neural network? What is the recommended alternative action?
Scarblac
Somewhere else in the thread is a closer rendition of how the actual exchange went. Using that, I finally understand the koan:

Closing his eyes did not make the room empty. It made him not know which things were where.

Randomizing the neural network did not remove all the preconceptions from the network. It made him not know what the network's preconceptions were.
