## Deep Learning Past, Present and Future: A Review

Deep learning is having a major impact on many areas of human life by helping to solve complex problems. Deep learning models share some properties with the learning dynamics of neurons in the human brain. Deep learning is an umbrella term that covers many areas of artificial intelligence. In our Compassionate AI Lab, deep learning is now used in medical health care, radiology analysis, old age care, new drug discovery to combat antibiotic-resistant bacteria, and many other projects. As the scope of AI expands from general intelligence to areas such as emotional intelligence, spiritual intelligence, political intelligence, and bodily intelligence, the scope of deep learning is also expanding rapidly.

Earlier, we discussed the limitations of deep learning algorithms. In this article, we highlight the past, present and future of deep learning methods and approaches. We cover deep reinforcement learning as well as deep supervised and unsupervised learning techniques, from both technical and historical perspectives.

## What is Deep Learning?

The term deep learning refers to a collection of algorithms and techniques that learn from data and the environment through multiple layers of representation and abstraction. These algorithms have recently shown impressive results across a variety of application areas. Deep learning is mainly a class of machine learning techniques based on artificial neural networks. Deep learning models are loosely inspired by information processing and communication patterns in biological nervous systems such as the human brain. What distinguishes them is the use of neural networks with many layers.

Deep learning methods have revolutionized image classification and speech recognition due to their flexibility and high accuracy. More recently, deep learning algorithms have shown promise in fields as diverse as drug discovery, high-energy physics, computational chemistry, dermatology, and translation among written languages. Advances in computing power with GPUs and cloud TPUs, together with the availability of very large training data sets, have enabled deep learning algorithms to surpass other machine learning algorithms on many problems.

Deep learning (DL) algorithms use many layers to process data. The first layer in a network is called the input layer, while the last is called the output layer. All the layers between the two are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function. A deep learning architecture has many hidden layers. Each layer essentially performs feature construction for the layers after it. The training process often allows layers deeper in the network to contribute to the refinement of earlier layers.
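To make the layer terminology concrete, here is a minimal sketch of a forward pass through an input layer, two hidden layers, and an output layer. The layer sizes, the `tanh` activation, the random weights, and the helper names `dense`, `make_layer`, and `forward` are all our own illustrative assumptions, not part of any particular framework.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per unit, followed by tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def make_layer(n_in, n_out, rng):
    """Create untrained random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Pass x through every layer, keeping each layer's activations."""
    activations = [x]
    for weights, biases in layers:
        activations.append(dense(activations[-1], weights, biases))
    return activations

rng = random.Random(0)
sizes = [4, 8, 8, 2]  # input layer, two hidden layers, output layer (assumed sizes)
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
activations = forward([0.5, -0.2, 0.1, 0.9], layers)
print([len(a) for a in activations])  # [4, 8, 8, 2]
```

Each hidden layer only sees the output of the layer before it, which is why the features built by one layer become the raw material for the next.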

## Classification of Deep Learning Methods and Approaches

The term “deep” generally refers to the number of hidden layers in the neural network architecture. Traditional shallow neural networks contain about 2-3 hidden layers, while deep learning networks can have hundreds or more hidden layers. Shallow learning often reaches a plateau in performance as you add more examples and training data. Deep learning, by contrast, often continues to improve in accuracy and performance as the amount of training data grows.

Shallow learning models such as support vector machines (SVMs) and logistic regression depend on hand-crafted features. This feature engineering process is important, but it is time consuming and difficult. Deep learning techniques are among the best solutions for handling high-dimensional data and extracting discriminative information from it. Deep learning algorithms can automate feature extraction from the data.

There are several ways to classify deep learning techniques; the most commonly used categories are supervised learning, unsupervised learning, and reinforcement learning. Deep learning algorithms can also be classified by their frameworks and architectures.

Popular deep learning algorithms include: Multi-Layer Perceptron (MLP), Deep Convolutional Neural Networks (CNN), Deep Residual Networks, Capsule Networks, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) Networks, Deep Autoencoders, Deep Neural SVM, Boltzmann Machines (BM) and Restricted Boltzmann Machines (RBM), Deep Belief Networks, and Recurrent Support Vector Machines.

## Deep Reinforcement Learning

Reinforcement learning is another branch of machine learning, in which an agent learns by interacting with an environment. It allows an agent to learn by trial and error. The learning agent receives rewards by acting in the environment, and its goal is to learn to select the actions that maximize the expected cumulative reward over time. One key concept of reinforcement learning is the Markov property: only the current state affects the next state, or in other words, the future is conditionally independent of the past given the present state.
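As a small illustration of "cumulative reward over time", the sketch below computes the return of one episode from its list of rewards. The discount factor `gamma` is an assumption on our part (a common convention that weights near-term rewards more heavily); setting it to 1.0 gives the plain sum of rewards. The function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per time step.

    Computed backwards: G_t = r_t + gamma * G_{t+1}.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Rewards received at steps 0, 1, 2 of a single (made-up) episode.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
print(discounted_return([1.0, 0.0, 2.0], gamma=1.0))  # 3.0
```

The agent's goal is to choose actions so that the expected value of this quantity is as large as possible.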

There are two types of reinforcement learning approaches. One is the trial-and-error approach and the other is the systematic model-building and planning approach. The trial-and-error approach is known as model-free and the planning approach as model-based, but in practical applications they are not completely independent. In reinforcement learning an “agent” interacts with an “environment”. The agent’s “policy”, i.e. its choice of actions in response to the environment’s state, is updated to increase the reward it receives. The basic structure of reinforcement learning is shown below.

The best-known reinforcement learning algorithms are Temporal Difference (TD) learning, Actor-Critic, and Q-learning. Tabular Q-learning creates and updates a Q-table of state-action values. Deep Reinforcement Learning (DRL) uses deep feedforward neural networks (FNNs) and recurrent neural networks (RNNs). In Deep Q-Learning with a Deep Q-Network (DQN), an artificial neural network takes the place of the Q-table, and DQN often uses large deep CNNs for better representation learning. Depending on the application, deep reinforcement learning techniques such as Q-learning, TD learning, Partially Observable MDP (POMDP) learning, Actor-Critic methods, Double DQN (DDQN), Neural Fitted Q Learning, Deep Recurrent Q-Network (DRQN), and A3C are used in a layered structure, as shown in the figure below.
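The Q-table idea can be sketched with tabular Q-learning on a toy problem. Everything specific here is an assumption made for illustration: the five-state corridor environment, the `step` and `train` helpers, and the hyperparameters `alpha`, `gamma`, and `epsilon`. A deep Q-network would replace the table `q` with a neural network, which this sketch does not attempt.

```python
import random

N_STATES = 5      # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Toy corridor environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action for each non-terminal state
```

In this corridor the only reward lies to the right, so the learned greedy policy should prefer action 1 in every non-terminal state.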

## Deep Learning Concepts and Developments in Historical Perspective

The main concepts of deep learning are not new; they are about 60 years old. With the development of large data sets, huge computing power, and new algorithms, the true power of these concepts is now being revealed. As time progressed, deep learning adapted to learn hierarchies of increasingly abstract data representations. Here we briefly walk through the history of deep learning.

In 1943, in the paper A Logical Calculus of the Ideas Immanent in Nervous Activity, the neuroscientist Warren McCulloch and his co-author, the mathematician Walter Pitts, proposed a model of artificial neural networks in which each neuron was postulated to be in a binary state, either on or off. That was the beginning of the journey of deep learning.

In 1950, Alan Turing, in his seminal paper Computing Machinery and Intelligence, formulated the Turing Test, a test of a machine’s ability to exhibit intelligent behavior equivalent to that of a human. He also raised the question “Can machines think?” His Turing test was a significant, characteristically provocative, and lasting contribution to the debate regarding artificial intelligence.

In 1957, the psychologist Frank Rosenblatt, in the paper The Perceptron: A Perceiving and Recognizing Automaton, proposed the concept of the “perceptron”, which can learn from a set of input data in a way similar to how biological neurons learn from stimuli. The perceptron became the first model that could learn by updating its connection weights.

The basic building block of a perceptron is an element that accepts a number of inputs and computes a weighted sum of them, where the fixed weight of each input can be only +1 or -1. The sum is then compared with a threshold, and the output is 1 if the sum exceeds the threshold and 0 otherwise.
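The threshold element described above can be sketched in a few lines. The function name `threshold_unit` and the example inputs, weights, and threshold are our own illustrative choices.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

weights = [+1, +1, -1]  # each weight is fixed at +1 or -1

print(threshold_unit([1, 1, 0], weights, threshold=1))  # 1, since 1 + 1 - 0 = 2 > 1
print(threshold_unit([1, 0, 1], weights, threshold=1))  # 0, since 1 + 0 - 1 = 0
```

A weight of -1 lets an input act as an inhibitor: in the second call the third input cancels the first, so the unit stays off.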

In 1959, the neurophysiologists Hubel and Wiesel discovered the hierarchical processing of information in the brain. They studied the cat’s visual system and reported their findings in the paper Receptive Fields of Single Neurones in the Cat’s Striate Cortex. For this line of work they were later awarded the Nobel Prize. They observed two types of cells: simple cells and complex cells. Simple cells detect local features.

They observed the brain’s hierarchical way of processing information. At the first level, the brain focuses on recognizing specific simple patterns in the input image, such as vertical or horizontal elements, which are extracted by simple cells. At each subsequent level, the brain extracts more general features: complex cells aggregate the features extracted at the previous level, and at the end of this process some objects in the input image are recognized. Hubel and Wiesel were thus the originators of key ideas leading to the development of deep hierarchical neural network processing.

In 1980, Kunihiko Fukushima proposed the neocognitron in the paper Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position. The neocognitron is a deep convolutional neural network (CNN) that acquires the ability to recognize visual patterns through learning. Fukushima, however, did not set the weights by supervised backpropagation. His contributions laid the groundwork for multilayered convolutional neural networks. CNNs were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.

In a 1986 seminal paper entitled “Learning Representations by Back-propagating Errors,” Rumelhart, Hinton, and Williams described in greater detail the process of backpropagation. The backpropagation algorithm was originally introduced in the 1970s, but its importance wasn’t fully appreciated until this famous article. To speed up backpropagation, they introduced the concept of momentum.
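The momentum idea can be illustrated on a one-parameter toy problem. This is only a sketch under our own assumptions: the quadratic loss, the learning rate `lr`, the momentum coefficient `beta`, and the helper names `grad` and `descend` are hypothetical, and no backpropagation through a network is involved.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def descend(steps, lr, beta):
    """Gradient descent with momentum; beta = 0 gives plain gradient descent."""
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        # Each step reuses a fraction beta of the previous weight change.
        velocity = beta * velocity - lr * grad(w)
        w += velocity
    return w

plain = descend(steps=100, lr=0.01, beta=0.0)
with_momentum = descend(steps=100, lr=0.01, beta=0.9)
print(abs(plain - 3.0), abs(with_momentum - 3.0))  # distance from the minimum
```

With this small learning rate, the run with momentum ends much closer to the minimum after the same number of steps, because consecutive steps in the same direction accumulate.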

In 1989, Yann LeCun and colleagues, in the paper Backpropagation Applied to Handwritten Zip Code Recognition, combined convolutional neural networks with backpropagation.

In 1997, Hochreiter and Schmidhuber proposed Long Short-Term Memory (LSTM), a special kind of recurrent neural network (RNN) designed to avoid the vanishing gradient problem.

In 1998, Yann LeCun and colleagues presented LeNet-5, a pioneering 7-level convolutional neural network, in the paper Gradient-Based Learning Applied to Document Recognition. Deep convolutional neural networks have since become the mainstream method for tasks such as image classification, object detection, semantic segmentation, image retrieval, tracking, text detection, drug discovery, stereo matching, and many other applications.

In 2006, Geoffrey Hinton co-authored the article Reducing the Dimensionality of Data with Neural Networks, which showed that high-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors.

Geoffrey Hinton co-authored another paper in 2006, titled “A Fast Learning Algorithm for Deep Belief Nets”, which describes an approach to training “deep” (many-layered) networks built from Restricted Boltzmann Machines (RBMs). The authors showed that neural networks with many layers can be trained well if the weights are initialized in a clever way rather than randomly. The idea was to train each layer as a Restricted Boltzmann Machine.

### ImageNet and its Influence

In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet, a free database of more than 14 million labeled images. The Internet is, and was, full of unlabeled images. Labeled images were needed to “train” neural nets. Professor Li said, “Our vision was that Big Data would change the way machine learning works. Data drives learning.”

In 2009, Nvidia was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia graphics processing units (GPUs).” Around that time, Google Brain used Nvidia GPUs to build capable deep neural networks, and Andrew Ng determined that GPUs could increase the speed of deep-learning systems by about 100 times.

Starting from 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held.

In 2012, Alex Krizhevsky won the ILSVRC competition with AlexNet, a deep convolutional neural network. AlexNet was trained on the ILSVRC subset of ImageNet, roughly 1.2 million labeled images. The accompanying paper, titled ImageNet Classification with Deep Convolutional Neural Networks, is widely regarded as one of the most influential publications in the field.

In 2013, ZF Net won the competition. In the paper titled “Visualizing and Understanding Convolutional Neural Networks”, Zeiler and Fergus discussed why CNNs perform so well and how they can be improved.

### Region-based Convolutional Networks (R-CNN)

2014 to 2015: Three papers, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation (2013), Fast R-CNN (2015), and Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), progressively made the model faster and better suited to modern object detection tasks.

### Combining CNN and RNN

In 2015, Karpathy and Fei-Fei Li, in the article Deep Visual-Semantic Alignments for Generating Image Descriptions, combined CNNs and RNNs to generate natural language descriptions of images and their regions. Wang et al. proposed CNN-RNN: A Unified Framework for Multi-label Image Classification.

### Combining Deep Learning and Reinforcement Learning

In 2015, Google’s DeepMind had considerable success. The paper Human-level Control through Deep Reinforcement Learning describes a deep RL system that combines deep neural networks with reinforcement learning at scale for the first time, and is able to master a diverse range of Atari 2600 games to superhuman level with only the raw pixels and score as inputs. The authors used a form of reinforcement learning known as Q-learning to teach systems to play a set of 49 vintage video games, learning how to increase the game score as a numerical reward.

In 2016, Google DeepMind’s AlphaGo mastered the complex board game Go and beat the professional Go player Lee Sedol in a highly publicized match in Seoul. AlphaGo uses deep learning, Monte Carlo Tree Search, and convolutional neural nets. The article Mastering the Game of Go with Deep Neural Networks and Tree Search describes how AlphaGo combines CNN-based value and policy networks with Monte Carlo Tree Search.

## Present Status of Deep Learning

Hybrid learning models and quantum deep learning are at the forefront of current deep learning research and applications. Born-Again Neural Networks outperform their teachers significantly, on both computer vision and language modeling tasks.

However, our research on deep learning is mostly focused on Compassionate Artificial Intelligence, explanation-based deep learning, and the quantum approach to deep learning. The quantum approach to deep learning is of two types: one is a quantum extension of traditional deep learning methods, and the other consists of altogether new methods suited to quantum computing.

## Future Directions for Deep Learning

Automated Machine Learning, Competing Learning Models, Hybrid Learning Models, Explainable Artificial Intelligence, and Quantum Artificial Intelligence are the trends of future deep learning research. Incorporating emotional intelligence, spiritual intelligence, political intelligence, and bodily intelligence in AI systems is also part of future deep learning research.

Today, most machine learning discovery is still based on the traditional approach: you first put up a theory, run some practical tests, form a new theory based on the results of the experiment, and repeat the cycle until a breakthrough arrives. Historically, this way of doing research has worked fine, but the pace has to speed up. Deep learning will take different paths of development. Deep learning with quantum computing technology has tremendous potential to change and build a deeply compassionate human society. It will change the dynamics of society, from child care, old age care, and relationship management to military affairs, commerce, and maintaining the strategic balance of power.

## Summary:

We highlighted the past, present and future of deep learning methods and approaches. Deep learning research is continually evolving in a complex way. Deep learning applications are growing rapidly in many industries, including drug discovery, navigation, financial analysis, planning and forecasting, and remote sensing. We covered both deep reinforcement learning and common deep neural network learning techniques from both technical and historical perspectives.

## Source Books:

- Compassionate Artificial Intelligence by Dr. Amit Ray, 2018
- Compassionate SuperIntelligence by Dr. Amit Ray, 2018
- Quantum Computing Algorithms for Artificial Intelligence by Dr. Amit Ray (in press)

## References:

- F. Rosenblatt (1957), “The Perceptron: A Perceiving and Recognizing Automaton,” Report 85-60-1, Cornell Aeronautical Laboratory, Buffalo, New York.
- A. M. Turing (1950), “Computing Machinery and Intelligence,” Mind 59: 433-460.
- Learning representations by back-propagating errors. Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. Nature, 323: 533–536. 1986.
- Arulkumaran, A Brief Survey of Deep Reinforcement Learning, 2017
- Schmidhuber, J. (1990). Learning algorithms for networks with internal and external feedback. In Touretzky, D. S., Elman, J. L., Sejnowski, T. J., and Hinton, G. E., editors, Proc. of the 1990 Connectionist Models Summer School, pages 52–61. Morgan Kaufmann.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. *Gradient-Based Learning Applied to Document Recognition.* Proceedings of the IEEE, November 1998.
- Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. “Deep sparse rectifier neural networks.” International Conference on Artificial Intelligence and Statistics. 2011.
- Hinton, Geoffrey E., et al. “Improving neural networks by preventing co-adaptation of feature detectors.” arXiv preprint arXiv:1207.0580 (2012).
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.
- Schmidhuber, Deep Learning in Neural Networks: An Overview, 2014