Machine Learning to Fight Antimicrobial Resistance

Antimicrobial resistance (AMR) occurs when bacteria, viruses, fungi and parasites change over time and no longer respond to medicines. This article discusses seven key machine learning projects to fight antimicrobial resistance. Artificial Intelligence (AI) is also used to hunt for new resistance genes and to build a better understanding of how bacteria fight off drug treatments.

Antimicrobial resistance is one of the key causes of human suffering in modern hospitals. The objective of our Compassionate AI Lab research is to eliminate the pain and suffering of humanity with the use of emerging technologies. Overcoming the challenges of antimicrobial resistance is one of the key research areas of our Compassionate AI Lab.

Preventing microbes from developing resistance to drugs has become an important issue for treating illnesses across the world. Artificial Intelligence, machine learning, genomics and multi-omics data integration are fast-growing emerging technologies for countering antimicrobial resistance. Here, Dr. Amit Ray explains how these technologies can be used in seven key areas to counter antimicrobial resistance.

AI and Machine Learning for Antimicrobial Resistance

Researchers are using machine learning to identify patterns within high-volume genetic data sets. High-throughput sequencing technology has generated an increasing amount of microbial data. Machine learning techniques are used to achieve greater depth in the interpretation of genetic information, such as how microbial genes may affect drug response, immunity, and resilience.

The ability of machine learning algorithms to handle multi-dimensional big data has made them powerful tools. Moreover, the availability of cloud-based hardware with graphics processing units (GPUs) provides fast computation and compatibility with various algorithms. This article explains how AI and machine learning techniques are used with common sequencing technologies such as DNA-seq, ChIP-Seq, RNA-Seq, 16S metagenomics, and small RNA analyses to fight antimicrobial resistance.

Earlier, in Artificial Intelligence to Combat Antibiotic Resistant Bacteria, I discussed a common framework for multi-agent deep reinforcement learning models for predicting the behavior of bacteria and phages in multi-drug environments. Here, I discuss seven top machine learning projects for antimicrobial resistance. These seven projects overlap in nature and use many common tools.

Antibiotic and Antimicrobial Resistance Basics

Understanding the difference between antibiotic and antimicrobial resistance is important. Microbes are living organisms that multiply frequently and spread rapidly. They include bacteria, viruses, fungi and parasites. Antimicrobial resistance (AMR) occurs when microbes such as bacteria, viruses, fungi and parasites change in ways that render the medications used to cure the infections they cause ineffective. Antimicrobial resistance is the broader term for resistance in different types of microorganisms and encompasses resistance to antibacterial, antiviral, antiparasitic and antifungal drugs.

Antibiotic-resistant bacteria are bacteria that are not controlled or killed by antibiotics. They are able to survive and even multiply in the presence of an antibiotic. These bacteria currently kill an estimated 700,000 people globally each year, a death toll which could rise to 10 million a year by 2050 if we don’t act [1]. The main difficulty is that the bacteria are changing fast. They are changing faster than we can change the drugs in response.

Antimicrobial resistance (AMR or AR) is the ability of a microbe to resist the effects of medication that once could successfully treat it. Antibiotic resistance (AR or ABR) is a subset of AMR, as it applies only to bacteria becoming resistant to antibiotics to which they were once sensitive.

How Microbes Develop Resistance to Drugs

According to the World Health Organization, antimicrobial resistance occurs mainly due to inappropriate use of medicines, for example using antibiotics for viral infections such as cold or flu, or sharing antibiotics. Low-quality medicines, wrong prescriptions and poor infection prevention and control also encourage the development and spread of drug resistance. Lack of government commitment to address these issues, poor surveillance and a diminishing arsenal of tools to diagnose, treat and prevent also hinder the control of antimicrobial drug resistance.

Impact of Antimicrobial Resistance

According to the NIH: National Institute of Allergy and Infectious Diseases, antimicrobial resistance makes it harder to eliminate infections from the body as existing drugs become less effective. As a result, some infectious diseases are now more difficult to treat than they were just a few decades ago. As more microbes become resistant to antimicrobials, the protective value of these medicines is reduced. Overuse and misuse of antimicrobial medicines are among the factors that have contributed to the development of drug-resistant microbes. AMR can lead to the following key issues:

Many infections being harder to control and staying longer inside the body
Longer hospital stays, increasing the economic and social costs of infection
Higher risk of disease spreading
Greater chance of fatality due to infections

Machine Learning Techniques for Antimicrobial Resistance

Machine learning techniques can learn from data without requiring explicit, programmatic instruction. They find patterns in data without the constraints of formulas or even human theories. By learning directly from data, machine learning techniques can often achieve accuracies not possible with more conventional approaches. One exciting and promising approach now being applied in the antimicrobial field is deep learning, a variation of machine learning that uses neural networks to automatically extract novel features from input data. The availability of large amounts of data on various types of antibiotic-resistant microbes helps provide enough training data to build accurate prediction models of microbe behavior and gene expression. Primarily, there are four approaches to machine learning: supervised learning, unsupervised learning, reinforcement learning and deep learning.

Machine Learning Types

In supervised learning, all data is labeled and the algorithms learn to predict the output from the input data. Supervised learning algorithms include logistic regression, random forest classification, boosted decision tree classifiers, support vector classification with a radial basis function kernel, k-nearest neighbors classification and neural networks. Gradient boosting is another technique for supervised machine learning tasks. XGBoost, a gradient boosting implementation, is particularly popular because it has been the winning algorithm in several recent competitions.
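
As a rough illustration of supervised learning on labeled data, the sketch below fits a logistic regression by batch gradient descent using only the Python standard library. The gene presence/absence matrix and the labels are hypothetical toy values invented for this example, and the function names (`train_logistic`, `predict`) are illustrative, not taken from any particular library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (resistant) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy data: each row marks presence (1) or absence (0) of three
# genes in an isolate; label 1 = resistant, 0 = susceptible.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

In this toy set the first gene coincides with the resistant label, so the learned weight for that gene ends up positive. Real resistance data is far noisier and would need a held-out test set.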

In unsupervised learning, all data is unlabeled and the algorithms learn the inherent structure of the input data. The algorithm discovers relationships between different features on its own. Clustering and dimensionality reduction are the two main uses of unsupervised learning. Principal component analysis (PCA), singular value decomposition (SVD) and k-means clustering are the key algorithms for unsupervised learning models. Because microbial data is high-dimensional, feature dimensionality reduction is also an important part of data processing.
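
To make the clustering idea concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The two-dimensional points are made-up values standing in for reduced microbial features, and using the first k points as initial centroids is a simplification chosen only to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

# Hypothetical 2-D features, e.g. two principal components per sample.
points = [(0.9, 0.1), (5.0, 4.8), (1.1, 0.2), (4.9, 5.2), (1.0, 0.0), (5.1, 5.0)]
assignments, centroids = kmeans(points, k=2)
```

No labels are supplied anywhere; the two groups emerge only from the distances between the points.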

Reinforcement learning is a subset of machine learning that builds on Markov decision processes (MDPs) by embedding reward-based feedback into decision outcomes so that an optimal decision approach, termed the policy, can be identified. The key strength of reinforcement learning is that it allows the computer to learn its own approach to obtaining a given reward, rather than relying on human behavior as the gold standard.

Deep learning is a subset of machine learning algorithms that allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These models typically stack many hidden layers, often of different types.

Deep Learning Algorithms

Popular deep learning algorithms include: Multi-layer perceptron (MLP), Deep Convolutional Neural Networks (CNN), Deep Residual Networks, Capsule Networks, Recurrent Neural Networks, Long Short Term Memory (LSTM) Networks, Deep Autoencoders, Deep Neural SVM, Boltzmann Machines (BM) and Restricted Boltzmann Machines (RBM), Deep Belief Networks, and Recurrent Support Vector Machines.

Recently, deep convolutional neural networks (DCNNs) have shown strong performance in object recognition, classification and detection. The basic components of a CNN are stacks of different types of hidden layers (convolutional, activation, pooling, fully-connected, softmax, etc.) that are interconnected and whose weights are trained using the backpropagation algorithm [7]. The training phase of a CNN requires a large amount of labelled data to avoid over-fitting; however, once trained, CNNs are capable of producing accurate and generalizable models that achieve state-of-the-art performance in general pattern recognition tasks. Examples of deep CNNs include AlexNet, GoogLeNet, VGGNet, and ResNet.

Deep Reinforcement Learning (DRL)

The most well-known reinforcement learning algorithms are Temporal Difference (TD) learning, Actor-Critic, and Q-learning. In Q-learning, the update rule is used to create and update a Q-table of state-action values. Deep Reinforcement Learning (DRL) uses deep feedforward neural networks (FNN) and recurrent neural networks (RNN). In Deep Q-Learning (DQN), an artificial neural network takes the place of the Q-table, and DQN often uses large deep CNNs for better representation learning. Depending on the application, deep reinforcement learning techniques such as Q-learning, TD learning, Partially Observable MDP (POMDP) learning, Actor-Critic methods, Double DQN (DDQN), Neural Fitted Q Learning, Deep Recurrent Q Network (DRQN) and A3C are used in a layered structure.
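
The Q-table update mentioned above can be sketched with tabular Q-learning on a toy problem. Everything about the environment here is an assumption made for illustration: the four "bacterial load" states, the two dose actions, and the reward values are invented and do not model any real treatment.

```python
import random

N_STATES = 4      # hypothetical bacterial-load levels; state 0 means cleared
ACTIONS = (0, 1)  # 0 = low dose, 1 = high dose (illustrative labels only)

def step(state, action):
    """Toy deterministic environment: a high dose lowers the load by one level,
    a low dose leaves it unchanged. Each step costs 1; clearing pays 10."""
    next_state = state - 1 if action == 1 else state
    done = next_state == 0
    reward = 10.0 if done else -1.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = N_STATES - 1
        for _ in range(50):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
# Greedy action for each non-terminal state 1..3.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(1, N_STATES)]
```

In Deep Q-Learning the list-of-lists `q` would be replaced by a neural network that maps a state to one value per action.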

Seven Key Machine Learning Projects for Antimicrobial Resistance

The application of machine learning to combating antibiotic-resistant bacteria is a new and rapidly growing field. With the development of microbial sequencing in recent years, the microbiome has become an increasingly popular subject of study. We have identified seven key projects for antimicrobial resistance: machine learning for antibiotic resistance gene detection, for classification of microbial communities, for the molecular basis of bacterial resistance, for multi-drug resistance behavior of bacteria, for behavior of bacteria with phages, for MRSA strains in hospital-acquired infections, and for culture-free identification of bacteria.

In these models, generally, for the bacterial growth model, we assume that the total bacterial population is comprised of drug-susceptible growing cells and drug-insensitive resting cells. The antibacterial effect of the drug is included in the killing rate of the bacteria. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and gene expression profiles, we generated predictive models and identified biomarkers of susceptibility or resistance to commonly administered antimicrobial drugs.
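
One way to picture the growth model described above is a two-compartment simulation. The sketch below is only an assumed form: the switching rates between growing and resting cells, the Emax-type killing term, and all parameter values are hypothetical choices for illustration, not the equations used in our projects.

```python
def simulate(drug_conc, hours=24.0, dt=0.01,
             growth=1.0, kill_max=3.0, ec50=1.0,
             to_resting=0.05, to_growing=0.02,
             s0=1e6, r0=0.0):
    """Forward-Euler integration of an assumed two-compartment model.

    s: drug-susceptible growing cells, r: drug-insensitive resting cells.
    The drug acts only through the killing rate applied to growing cells.
    """
    kill = kill_max * drug_conc / (drug_conc + ec50)  # assumed Emax-type killing rate
    s, r = s0, r0
    for _ in range(int(hours / dt)):
        ds = (growth - kill - to_resting) * s + to_growing * r
        dr = to_resting * s - to_growing * r
        s += ds * dt
        r += dr * dt
    return s, r

untreated = simulate(drug_conc=0.0)
treated = simulate(drug_conc=4.0)
```

With these made-up parameters the treated growing population collapses while a pool of resting cells persists, which is the qualitative behavior the two-population assumption is meant to capture.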

The human microbiota has often been estimated at about 100 trillion microbial cells, compared with about 10 trillion human cells, although more recent estimates put the two numbers much closer together. These microbial symbionts contribute many traits to human immunity and biology. Compositional differences between microbial communities residing in various body sites are large, and comparable in size to the differences observed in microbial communities from disparate physical habitats.

In traditional “phenotypic” testing, bacteria are grown in the presence of different concentrations of various antibiotics. Bacteria that do not grow in the presence of a test antibiotic are called ‘susceptible’ and those that do grow are called ‘resistant.’ Today, however, the same information can be generated through newer technology called ‘whole genome sequencing’ (WGS).

With the advent of affordable whole-genome sequencing (WGS) technology, it is now possible to determine and evaluate the entire DNA sequence of a bacterium. By providing definitive genotype information, WGS offers the highest practical resolution for characterizing an individual microbe. Bacteria that have identical resistance patterns caused by different mechanisms can also be differentiated by WGS. Emerging tools such as metagenomics are focusing on applying sequencing to the sample itself, eliminating the need to isolate and sequence bacteria from the sample. Researchers have used WGS analysis extensively to understand the molecular basis and host-ecosystem relationships in infectious diseases and microbiology. This has allowed advances in our understanding of epidemiology, pathogen evolution and virulence determinants to better conduct disease outbreak investigation and assess disease transmission networks.

Against this background of WGS and metagenomics, the seven machine learning applications for antimicrobial resistance are summarized below:

1. Antibiotic Resistance Genes Detection

Whole-genome sequencing (WGS) of microorganisms has become an important tool for antibiotic resistance screening and thus provides rapid identification of antibiotic resistance mechanisms. Sequencing the entire genome is helpful in many antimicrobial applications, such as developing new antibiotics, diagnostic tests, managing presently available antibiotics, and clarifying the factors promoting the emergence and resistance of pathogenic bacteria. We have tested DCNNs, Restricted Boltzmann Machines (RBM) and Deep Belief Networks for antibiotic resistance gene detection, and the results are impressive.
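
Before any model can be trained on sequence data, the sequences have to be turned into numeric features. A common and simple choice, shown here as an assumption and not as the encoding used in our experiments, is to count overlapping k-mers. The two short sequences are invented for the example.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def feature_vector(sequence, vocabulary, k=3):
    """Map a sequence to counts over a fixed, ordered k-mer vocabulary."""
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]

# Hypothetical sequences differing by a single base.
sequences = ["ATGGCGTACG", "ATGGCCTACG"]
vocabulary = sorted(set().union(*(kmer_counts(s) for s in sequences)))
features = [feature_vector(s, vocabulary) for s in sequences]
```

Each row of `features` can then be fed to a classifier; a single-base change shows up as a different pattern of k-mer counts.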

2. Classification of Microbial Communities

Using supervised learning techniques, microbes can be grouped into various classes based on correlations. Four popular techniques for microbial community classification are convolutional neural networks (CNN), genetic programming (GP), random forests (RF), and logistic regression (LR). The random forest classifier is one of the top performers in microarray analysis. RFs are an extension of bagging, or bootstrap aggregating, in which the final predictions of the model are based on an ensemble of weak predictors trained on bootstrapped samples of the data.
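
The bagging idea behind random forests can be sketched in a few lines. This is plain bagging of one-feature decision stumps with majority voting, not a full random forest (which would also grow deeper trees and subsample features at each split). The binary feature matrix and labels are hypothetical.

```python
import random

def fit_stump(X, y):
    """Pick the single binary feature (and polarity) with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for polarity in (0, 1):
            # Predict `polarity` when feature j is present, the opposite otherwise.
            errors = sum((polarity if x[j] else 1 - polarity) != label
                         for x, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    return best[1], best[2]

def predict_stump(stump, x):
    j, polarity = stump
    return polarity if x[j] else 1 - polarity

def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Train each weak predictor on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble

def predict_bagged(ensemble, x):
    """Majority vote over the ensemble."""
    votes = sum(predict_stump(s, x) for s in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0

# Hypothetical presence/absence features for eight samples, two classes.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

ensemble = fit_bagged_stumps(X, y)
predictions = [predict_bagged(ensemble, x) for x in X]
```

The toy data is deliberately clean, so every stump agrees. On noisy data the bootstrap samples differ enough that individual stumps disagree, and the vote averages out some of their errors.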

3. Molecular Basis for Bacterial Resistance

Machine learning techniques are used to predict environmental and host phenotypes using microorganisms. One of the major drivers of resistance spreading between bacteria is transposons. Also called jumping DNA, these are genetic elements that can switch locations in the genome autonomously. When transferred between bacteria, transposons can carry antibiotic resistance genes within them. The use of reinforcement learning to understand these transfer mechanisms in real life is one of our important projects. We have noticed that Partially Observable MDP (POMDP) learning models fit our prototype testing data best.

4. Multi-drug Resistance Behavior of Bacteria

Using machine learning techniques to develop dosing strategies for multi-drug-resistant bacteria is one of the key research areas. Multi-drug-resistant bacteria require aggressive treatment with the limited arsenal of effective therapeutic antibiotics. Conventional antibiotic combinations have not completely addressed the challenge of treating infections caused by multi-drug-resistant bacteria, especially when incremental and ineffective antibiotic dosing strategies are employed that do not overcome individual mechanisms of resistance. In one application, a reinforcement learning algorithm predicted which patient characteristics and dosing decisions resulted in the lowest risk of failure to be discharged on the medication. We have noticed that Deep Recurrent Q Network (DRQN) learning algorithms are most suitable for handling multi-drug issues.

5. Behavior of Bacteria with Phages

A phage is a type of virus that infects bacteria. A phage will only kill a bacterium if it matches the specific strain. Bacteria behave differently when they are exposed to phages. Phage-bacteria interactions are unique in that they can range from predatory to parasitic. Different types of interactions lead to different evolutionary scenarios for the host population, and different genomic signatures. The use of phages against pathogenic bacteria can be modeled using two different approaches, one passive, the other active. Different bacteria–phage interactions occur depending on the health status and development stage of the human host. We are using various reinforcement learning algorithms to model the bacteria-phage interactions.

6. MRSA Strain Models for Hospital-Acquired Infections

Methicillin-resistant Staphylococcus aureus (MRSA) in particular has emerged as a widespread cause of both community-acquired and hospital-acquired infections. Recently, MRSA was classified by the World Health Organization (WHO) as one of twelve priority pathogens that threaten human health. MRSA biofilm production, the relationship of biofilm production to antibiotic resistance, and front-line techniques to defeat the biofilm-resistance system are the three key areas for this machine learning project.

Rapid and accurate strain typing of bacteria is vital. This would facilitate epidemiological investigation and infection control in near real time. Matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry is a reliable strain typing tool. The use of MALDI-TOF MS is currently expanding in clinical laboratories due to its rapid and accurate bacterial identification capabilities. MALDI-TOF spectral analysis has been widely used in clinical microbiology for the identification of bacteria, fungi, and mycobacteria. However, the subtle differences among the spectra are difficult to interpret correctly by visual examination. Moreover, the presence or absence of proteins and their expression levels are difficult to discriminate. Machine learning (ML) methods promise to resolve these issues.

Recently, we have experimented with deep convolutional neural networks (DCNNs) on the spectra data. The datasets were split into training (65%), validation (15%), and test (15%) sets. Two different DCNNs, AlexNet and GoogLeNet, were used to classify the spectra data. Both untrained networks and networks pre-trained on ImageNet were used, along with augmentation using multiple pre-processing techniques. Ensembles were built from the best-performing algorithms. The best-performing classifier, an ensemble of the AlexNet and GoogLeNet DCNNs, had an AUC of 0.96.
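
For readers unfamiliar with the AUC metric and score-averaging ensembles, the following sketch shows how both can be computed. The labels and the two sets of model scores are invented numbers, not outputs of the AlexNet or GoogLeNet experiments, and simple averaging is only one of several ways to build an ensemble.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count one half; this is the normalized Mann-Whitney U statistic.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def average_ensemble(*score_lists):
    """Average the per-sample scores of several models."""
    return [sum(col) / len(col) for col in zip(*score_lists)]

# Hypothetical scores from two classifiers on six test spectra.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
model_b = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]

ensemble = average_ensemble(model_a, model_b)
auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
auc_ensemble = roc_auc(labels, ensemble)
```

In this constructed example the two models make different mistakes, so their average ranks the samples better than either one alone. That is not guaranteed in general.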

7. Machine Learning for Culture-free Identification of Bacteria

To confidently select a narrow-spectrum rather than broad-spectrum antibiotic therapy, the physician requires the pathogen's antibiotic susceptibility profile. Currently, due to the relatively long time for diagnostic results to be finalized, physicians are often left with no alternative but to employ an empirically defined broad-spectrum antibiotic therapy to secure a patient's survival. For example, bloodstream infections generally involve extremely low (e.g., <10 colony-forming unit (CFU)/mL) bacterial concentrations, which require a labor-intensive process and as much as 72 hours to yield a diagnosis.

Emerging automated rapid microbiology methods, especially those employing miniaturized microfluidic devices (or lab-on-a-chip systems) and nanotechnologies, offer unique opportunities to combat the crisis of antibiotic resistance. Used along with these new technologies, machine learning based culture-free identification of bacteria can reduce the time to 2 hours. We have experimented with XGBoost algorithms.

Machine Learning Workflow and Key Steps

The five key steps for machine learning antimicrobial resistance projects are: gathering data from various sources; data cleaning and pre-processing; model building, training and testing; result evaluation and deployment; and monitoring and transforming results into actions.

Machine Learning Steps and Workflow

Key Antimicrobial Resistance Databases for Machine Learning

The Integrated Microbial Genomes (IMG) data warehouse is a leading comprehensive resource devoted specifically to microbes. CARD: The Comprehensive Antibiotic Resistance Database is a rigorously curated collection of characterized, peer-reviewed resistance determinants and associated antibiotics, organized by the Antibiotic Resistance Ontology (ARO) and AMR gene detection models.

GenBank is a free, public collection of all publicly available DNA sequences. It is served by an online platform. Among its features, GenBank supplies the protein translations corresponding to the DNA sequences and gives the user the possibility of downloading large sets of records at once.

The freely available PATRIC (Pathosystems Resource Integration Center) platform provides an interface for biologists to discover data and information and to conduct comprehensive comparative genomics and other analyses in a one-stop shop. Similarly, the PathogenPortal hub site, EuPathDB for eukaryotic pathogens, IRD (Influenza Research Database), and ViPR (Virus Pathogen Database and Analysis Resource) are rich in resources. The Microbial Genome Database (MBGD) and the MicrobesOnline resource contain information on thousands of bacterial, archaeal, and fungal genomes. MicrobesOnline also provides access to gene expression and fitness data.

Conclusion:

We have discussed seven machine learning projects to fight antimicrobial-resistant microbes, which are causing harm and suffering to humanity. The objective of these projects is to prevent microbes from developing resistance to drugs. Machine learning, genomics and multi-omics data integration are fast-growing emerging technologies for countering antimicrobial resistance. While there is great promise, developing large-scale, long-term strategies to counter antimicrobial resistance is still an uphill battle.

Deep learning and other machine learning techniques are increasingly showing superior performance in many areas of antimicrobial resistance, which involves large amounts of complex data. However, there are many obstacles, and a number of issues remain unsolved, in implementing large-scale integrated machine learning systems for antimicrobial resistance. We are eager to embrace machine learning methods as an established tool for antimicrobial resistance prevention and analysis, and we look forward with great anticipation to the new insights that will emerge from these applications.

Deep Learning Past Present and Future - A Systematic Review

Deep learning is making a big impact in many areas of human life by solving complex problems. Deep learning models share various properties with the learning dynamics of neurons in the human brain. Deep learning is an umbrella term that covers many areas of artificial intelligence. In our Compassionate AI Lab, deep learning is now used in medical health care, radiology analysis, old age care, new drug discovery to combat antibiotic-resistant bacteria, and many other projects. As the scope of AI is expanding from general intelligence to the areas of emotional intelligence, spiritual intelligence, political intelligence, bodily intelligence etc., the scope of deep learning is also expanding rapidly.

Past Present and Future of Deep Learning and Deep Reinforcement learning

Earlier, we discussed the limitations of deep learning algorithms. In this article, we highlight the past, present and future of deep learning methods and approaches. We cover deep reinforcement learning as well as deep supervised and unsupervised learning techniques, from both technical and historical perspectives.

What is Deep Learning?

The term deep learning refers to a collection of algorithms and techniques that can learn from data and the environment through multiple layers of representation and abstraction. These algorithms have recently shown impressive results across a variety of application areas. Deep learning is mainly a class of machine learning techniques based on artificial neural networks. Deep learning models are roughly inspired by information processing and communication patterns in the biological nervous system of the human brain. They implement neural networks with many layers.

Deep learning methods have revolutionized image classification and speech recognition due to their flexibility and high accuracy. More recently, deep learning algorithms have shown promise in fields as diverse as drug discovery, high-energy physics, computational chemistry, dermatology, and translation among written languages. Recently, advancement of computing power with GPUs and cloud TPUs, and the availability of very large training data sets, have enabled deep learning algorithms to surpass other machine learning algorithms for many problems. Moreover, deep learning shows increased flexibility over other machine learning approaches.

Deep Learning (DL) algorithms use many layers to process data. The first layer in a network is called the input layer, while the last is called the output layer. All the layers between the two are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function. In a deep learning architecture there are many hidden layers. Each layer essentially performs feature construction for the layers after it. The training process often allows layers deeper in the network to contribute to the refinement of earlier layers.
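
The layered structure described above can be sketched as a forward pass through a tiny fully connected network. The layer sizes, the weights, and the choice of ReLU for hidden layers with softmax at the output are all arbitrary assumptions made for illustration; no training is performed here.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    """One fully connected layer: weights[i] holds the input weights of unit i."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Apply the hidden layers with ReLU, then a softmax output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    weights, biases = output
    return softmax(dense(x, weights, biases))

# Arbitrary example weights: input of size 2, two hidden layers, 2 output classes.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, 0.0]),  # hidden 1: 2 -> 3
    ([[0.7, -0.5, 0.2], [0.3, 0.9, -0.4]], [0.0, 0.0]),         # hidden 2: 3 -> 2
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),                   # output:   2 -> 2
]
probs = forward([1.0, 2.0], layers)
```

Each hidden layer transforms the output of the previous one, so later layers work on features built by earlier layers. Training would adjust the weights by backpropagation.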

Classification of Deep Learning Methods and Approaches

The term “deep” generally refers to the number of hidden layers in the neural network architecture. Traditional shallow neural networks contain about 2-3 hidden layers, while deep learning networks can have hundreds of hidden layers or more. Shallow learning often reaches a plateau at a certain level of performance as more examples and training data are added. Deep learning, by contrast, can often keep improving in accuracy and performance as the amount of data grows.

Shallow learning models such as support vector machines (SVMs) and logistic regression depend on feature engineering. Feature engineering is important, but very time consuming and difficult to do. Deep learning techniques are one of the best solutions for dealing with high-dimensional data and extracting discriminative information from it. Deep learning algorithms are capable of automating feature extraction from the data.

There are several ways to classify deep learning techniques; the most commonly used categories are supervised learning, unsupervised learning and reinforcement learning. Deep learning algorithms can also be classified based on their frameworks and algorithms.

Deep Learning Algorithms Complete Review By Dr Amit Ray

Deep Reinforcement Learning

Reinforcement learning is another branch of machine learning, in which an agent learns from interacting with an environment. Reinforcement learning allows an agent to learn from trial and error. The learning agent receives a reward by acting in the environment, and its goal is to learn to select the actions that maximize the expected cumulative reward over time. One key concept of reinforcement learning is the Markov property, i.e., only the current state affects the next state, or in other words, the future is conditionally independent of the past, given the present state.
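
The "cumulative reward over time" is usually made precise as a discounted return, in which a reward received t steps in the future is weighted by gamma raised to the power t. The helper below is a small sketch of that calculation; the discount factor and reward sequence are example values.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * rewards[t], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three rewards of 1 with gamma = 0.5 give 1 + 0.5 + 0.25.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A discount factor below 1 makes near-term rewards count more than distant ones and keeps the sum finite for long episodes.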

There are two types of reinforcement learning approaches for solving problems. One is the trial-and-error approach and the other is the systematic model building and planning approach. The trial-and-error approach is known as the model-free approach and the planning approach is considered the model-based approach, but in practical applications they are not completely independent. In reinforcement learning an “agent” interacts with an “environment”. The agent’s “policy”, i.e. its choice of actions in response to the state of the environment, is updated to increase the reward it collects. The basic structure of reinforcement learning is shown below.

Deep Reinforcement Learning Algorithms Review by Dr Amit Ray

Deep Learning Concepts and Developments in Historical Perspective

The main concepts of deep learning are not new; they are about 60 years old. With the development of large data sets, huge computing power and new algorithms, the true power of these concepts is now being revealed. As time progressed, deep learning adapted to learn hierarchies of more and more abstract data representations. Here, we briefly walk through the history of deep learning.

In 1943, in the paper A logical calculus of the ideas immanent in nervous activity, Warren McCulloch, a neuroscientist, and his co-author Walter Pitts, a mathematician, proposed a model of artificial neural networks in which each neuron was postulated to be in a binary state, that is, either on or off. That was the beginning of the journey of deep learning.

In 1950, Alan Turing, in his seminal paper Computing Machinery and Intelligence, formulated the Turing Test, a test of a machine's ability to exhibit intelligent behavior equivalent to that of a human. He also raised the question "Can machines think?" His Turing test was a significant, characteristically provocative, and lasting contribution to the debate regarding artificial intelligence.

In 1957, psychologist Frank Rosenblatt, in the paper The perceptron: a perceiving and recognizing automaton, proposed the concept of the "perceptron", which can learn from a set of input data similar to how biological neurons learn from stimuli. The perceptron became the first model that could learn by updating the connection weights.

The basic building block of a perceptron is an element that accepts a number of inputs and computes a weighted sum of these inputs where, for each input, the fixed weight can only be +1 or -1. The sum is then compared with a threshold, and an output is produced that is either 0 or 1, depending on whether or not the sum exceeds the threshold.
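
The threshold element described above is small enough to write out directly. In this sketch the weights are restricted to +1 or -1 as in the description; the function name and the example thresholds are illustrative choices.

```python
def threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

pairs = [(a, b) for a in (0, 1) for b in (0, 1)]

# Weights (+1, +1): threshold 1.5 behaves like AND, threshold 0.5 like OR.
and_outputs = [threshold_unit(p, (1, 1), 1.5) for p in pairs]
or_outputs = [threshold_unit(p, (1, 1), 0.5) for p in pairs]

# A -1 weight acts as an inhibitory input: "a and not b".
a_not_b_outputs = [threshold_unit(p, (1, -1), 0.5) for p in pairs]
```

Changing only the threshold or the sign of a weight changes which logical function the unit computes, which is why such elements can be combined into larger networks.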

In 1959, neurophysiologists David Hubel and Torsten Wiesel discovered the hierarchical processing of information in the brain. They studied the cat’s visual system and reported their findings in the paper Receptive fields of single neurones in the cat's striate cortex. They were later awarded the Nobel Prize for this line of work. They observed two types of cells: simple cells and complex cells. Simple cells detect local features.

They observed the brain’s hierarchical way of processing information. At the first level, the brain focuses on recognizing specific simple patterns in the input image, such as vertical or horizontal elements, which are extracted by simple cells. At each subsequent level, complex cells extract more general features by aggregating the features extracted at the previous level, until objects in the input image are recognized at the end of the process. Hubel and Wiesel were thus originators of the key ideas leading to the development of deep hierarchical neural network processing.

In 1980, Kunihiko Fukushima proposed the Neocognitron in the paper Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. It is a deep convolutional network that acquires the ability to recognize visual patterns through learning. Fukushima, however, did not set the weights by supervised backpropagation. Fukushima’s contributions laid the groundwork for multilayered convolutional neural networks (CNNs). CNNs were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.

In a 1986 seminal paper entitled “Learning Representations by Back-propagating Errors,” Rumelhart, Hinton, and Williams described in greater detail the process of backpropagation. The backpropagation algorithm was originally introduced in the 1970s, but its importance wasn't fully appreciated until this famous article. To speed up backpropagation, they introduced the concept of momentum.
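
The momentum idea can be illustrated with a minimal sketch. Each weight change is the current gradient step plus a fraction of the previous change, so consistent gradient directions accumulate speed. The sketch below is an assumption-laden toy: the gradient is written by hand for a one-dimensional quadratic rather than computed by backpropagation through a network, and the function name and hyperparameter values are chosen only for illustration.

```python
from typing import Callable


def minimize_with_momentum(
    grad: Callable[[float], float],
    w0: float,
    learning_rate: float = 0.1,
    momentum: float = 0.9,
    steps: int = 500,
) -> float:
    """Gradient descent on a single weight with a classical momentum term."""
    w = w0
    velocity = 0.0  # the previous weight change
    for _ in range(steps):
        # New change = fraction of the previous change minus the gradient step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w


# Toy error surface E(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = minimize_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In a real network the same update would be applied to every weight, with the gradient supplied by backpropagation.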

In 1989, Yann LeCun and colleagues, in the paper Backpropagation applied to handwritten zip code recognition, combined convolutional neural networks with backpropagation.

In 1997, Hochreiter and Schmidhuber proposed long short-term memory (LSTM), a special kind of recurrent neural network (RNN) designed to avoid the vanishing gradient problem.

In 1998, Yann LeCun and colleagues developed LeNet-5, a pioneering 7-level convolutional neural network, described in the paper Gradient-based learning applied to document recognition. Deep convolutional neural networks subsequently became the mainstream method for tasks such as image classification, object detection, semantic segmentation, image retrieval, tracking, text detection, drug discovery, and stereo matching.

In 2006, Geoffrey Hinton co-authored the article Reducing the dimensionality of data with neural networks, which showed that high-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct the high-dimensional input vectors.

In the same year, Hinton co-authored another paper, “A Fast Learning Algorithm for Deep Belief Nets”, which describes an approach to training deep (many-layered) networks built from Restricted Boltzmann Machines (RBMs). The authors showed that neural networks with many layers can be trained well if the weights are initialized in a clever way instead of randomly. The idea was to train each layer as a Restricted Boltzmann Machine.

ImageNet and its Influence

In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet, a free database of more than 14 million labeled images. The Internet is, and was, full of unlabeled images, but labeled images were needed to “train” neural nets. Professor Li said, “Our vision was that Big Data would change the way machine learning works. Data drives learning.”

In 2009, Nvidia was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia graphics processing units (GPUs).” Around that time, Google Brain used Nvidia GPUs to build deep neural networks, and Andrew Ng determined that GPUs could increase the speed of deep-learning systems by about 100 times.

Since 2010, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held, initially as part of the Pascal Visual Object Classes (VOC) Challenge.

Between 2011 and 2012, Alex Krizhevsky won several international machine learning and deep learning competitions with his creation AlexNet, a convolutional neural network. AlexNet was trained on the ILSVRC subset of ImageNet, roughly 1.2 million images drawn from a database of about 15 million labeled images. The accompanying paper, ImageNet Classification with Deep Convolutional Neural Networks, is widely regarded as one of the most influential publications in the field.

In 2013, ZF Net won the competition. In the paper “Visualizing and Understanding Convolutional Neural Networks”, Zeiler and Fergus discussed why CNNs perform so well and how they can be improved.

Region-Based Convolutional Neural Networks (R-CNN)

2014 to 2015: Three papers, Rich feature hierarchies for accurate object detection and semantic segmentation (2013), Fast R-CNN (2015) and Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), made the model progressively faster and better suited to modern object detection tasks.

Combining CNN and RNN

In 2015, Karpathy and Fei-Fei Li, in the article Deep Visual-Semantic Alignments for Generating Image Descriptions, combined a CNN and an RNN to generate natural language descriptions of images and their regions. Wang et al. proposed CNN-RNN: A Unified Framework for Multi-label Image Classification.

Combining Deep Learning and Reinforcement Learning

In 2015, Google's DeepMind had considerable success. The paper Human-level control through deep reinforcement learning describes a deep RL system that combines deep neural networks with reinforcement learning at scale for the first time and masters a diverse range of Atari 2600 games at a superhuman level, using only the raw pixels and the score as inputs. The authors used a form of reinforcement learning known as Q-learning to teach the system to play a set of 49 vintage video games, with the game score serving as the numerical reward to be increased.
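
The core Q-learning update can be shown on a toy problem. The sketch below is not DeepMind's system: it stores Q-values in a small table instead of approximating them with a deep network, and the five-state corridor environment, the function names, and the hyperparameters are all hypothetical choices made for illustration.

```python
import random

N_STATES = 5            # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (0, 1)        # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state: int, action: int):
    """Toy environment: reaching the goal gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-goal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, the greedy policy moves right in every non-goal state, which is the shortest route to the reward. In the Atari setting the table is replaced by a convolutional network that maps raw pixels to Q-values.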

In 2016, Google DeepMind’s algorithm AlphaGo mastered the complex board game Go and beat the professional Go player Lee Sedol in a highly publicized match in Seoul. AlphaGo combines deep learning, Monte Carlo tree search, and convolutional neural networks. The article Mastering the game of Go with deep neural networks and tree search describes how AlphaGo combines CNNs and Monte Carlo simulation with value and policy networks.

Present Status of Deep Learning

Hybrid learning models and quantum deep learning are at the forefront of current deep learning research and applications. Born-Again Neural Networks, student networks trained to match a teacher network of identical architecture, have been reported to outperform their teachers significantly on both computer vision and language modeling tasks.

Our own research on deep learning, however, focuses mostly on Compassionate Artificial Intelligence, explanation-based deep learning, and the quantum approach to deep learning. The quantum approach is of two types: one is a quantum extension of traditional deep learning methods, and the other consists of altogether new methods suited to quantum computing.

Future Directions for Deep Learning

Automated Machine Learning, Competing Learning Models, Hybrid Learning Models, Explainable Artificial Intelligence, and Quantum Artificial Intelligence are the trends of future deep learning research. Incorporating emotional intelligence, spiritual intelligence, political intelligence and bodily intelligence in AI systems is also part of future deep learning research.

Incorporating general intelligence, bodily intelligence, emotional intelligence, spiritual intelligence, political intelligence and social intelligence in AI systems are part of the future deep learning research. -- Amit Ray

Today, most machine learning discovery still follows the traditional approach: you put forward a theory, run some practical tests, and form a new theory based on the experimental results, repeating the cycle until a breakthrough arrives. Historically, this way of doing research has worked fine, but the pace has to speed up. Deep learning will take different paths of development. Deep learning with quantum computing technology has tremendous potential to change society and build a deeply compassionate human society. It will change the dynamics of society, from child care, old age care and relationship management to military affairs, commerce, and maintaining the strategic balance of power.

Summary:

We highlighted the past, present and future of deep learning methods and approaches. Deep learning research is continually evolving in a complex way. Deep learning applications are growing rapidly in many industries, including drug discovery, navigation, financial analysis, planning and forecasting, and remote sensing. We covered both deep reinforcement learning and common deep neural network learning techniques from both technical and historical perspectives.

References:

F. Rosenblatt (1957), “The Perceptron: A Perceiving and Recognizing Automaton,” Report 85-460-1, Cornell Aeronautical Laboratory, Buffalo, New York.
A. M. Turing (1950), “Computing Machinery and Intelligence,” Mind 59: 433-460.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986), “Learning Representations by Back-propagating Errors,” Nature 323: 533-536.
K. Arulkumaran et al. (2017), “A Brief Survey of Deep Reinforcement Learning.”
J. Schmidhuber (1990), “Learning Algorithms for Networks with Internal and External Feedback,” in D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Proc. of the 1990 Connectionist Models Summer School, pages 52-61, Morgan Kaufmann.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998), “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, November 1998.
X. Glorot, A. Bordes, and Y. Bengio (2011), “Deep Sparse Rectifier Neural Networks,” International Conference on Artificial Intelligence and Statistics.
G. E. Hinton et al. (2012), “Improving Neural Networks by Preventing Co-adaptation of Feature Detectors,” arXiv preprint arXiv:1207.0580.
A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012), “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems.
J. Schmidhuber (2014), “Deep Learning in Neural Networks: An Overview.”

Dr. Amit Ray
