[visionlist] Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award

Fri Jul 1 02:33:37 -04 2022

Hi, fellow computer vision and deep learning enthusiasts!

Following the great success of massive open online peer review (MOOR) for my 2015 survey of deep learning (now the most cited article ever published in the journal Neural Networks), last year I decided to put forward another piece for MOOR. I want to thank the many experts (especially those on the connectionists mailing list) who have already provided me with comments. Please send additional relevant references and suggestions for improvements for the following report directly to me at juergen at idsia.ch:

https://people.idsia.ch/~juergen/scientific-integrity-turing-award-deep-learning.html 

The above is a point-for-point critique of factual errors in the ACM's justification of the ACM A. M. Turing Award for deep learning and a critique of the Turing Lecture published by the ACM in July 2021. This work can also be seen as a short history of the deep learning revolution, at least as far as ACM's errors and the Turing Lecture are concerned.

I know that some view this as a controversial topic. However, it is the very nature of science to resolve controversies through facts. Credit assignment is as core to scientific history as it is to machine learning. My aim is to ensure that the true history of our field is preserved for posterity.

The latest version (v3) mentions a few things that are not yet widely known:

★ 1920s: The non-learning recurrent architecture of Lenz and Ising. Later reused in Amari’s learning recurrent neural network (RNN) of 1972. After 1982, this was sometimes called the "Hopfield network."

★ ~1960: Rosenblatt’s MLP with non-learning randomized weights in a hidden layer, and an adaptive output layer. This was much later rebranded as “Extreme Learning Machines.”

★ 1965: First functional deep learning in deep multilayer networks (Ivakhnenko & Lapa in Ukraine).

★ 1967: Amari’s stochastic gradient descent for deep neural nets. The implementation with his student Saito learned internal representations in MLPs at a time when compute was billions of times more expensive than today.

★ 1969: Fukushima’s rectified linear unit (ReLU).

★ 1970: Linnainmaa's backpropagation or reverse mode of automatic differentiation. (1960: Kelley's precursor.)

★ 1979: Fukushima’s convolutional neural net architecture.

★ 1990: Hanson's stochastic delta rule, much later called "dropout."

★ 1990: Generative adversarial networks for implementing "artificial curiosity."

★ 1991: Unsupervised pre-training for deep learning. Also "distilling one neural net into another."

★ 1991: Linear Transformers (or fast weight programmers), using outer products of self-invented (key, value) pairs for "attention" (but without the softmax used in modern Transformers).

★ 2010: Deep feedforward nets on GPUs greatly outperform previous methods, without any unsupervised pre-training.

★ 2011: DanNet, which reached superhuman performance that year and won 4 computer vision contests before AlexNet and VGG Net.

★ 2015: ResNet, an open-gated version of the Highway Net, the first extremely deep net, which was published half a year earlier.

★ Many other things, including the origins of modern speech recognition, probabilistic language models, and graph neural networks.

Thank you all in advance for your contributions to this important topic!

Jürgen Schmidhuber