Where Does the Machine Learning and AI Megatrend Come From?

Nowadays, machine learning (ML), neural networks, and artificial intelligence (AI) are trending topics that seem to be the focus of discussion everywhere. In this article, we briefly summarize the development of machine learning over the last decade and explain why this trend will find applications in all economic sectors.

Trend increase of the term Machine Learning
Machine Learning - Where does the hype come from?


In the 1940s, Warren McCulloch and Walter Pitts laid the foundations of machine learning with their publication “A Logical Calculus of the Ideas Immanent in Nervous Activity” on the topics of neurons and nerve networks.

In 1957, Frank Rosenblatt developed the Perceptron algorithm, a simplified model of a biological neuron. Three years later, Bernard Widrow and Marcian Hoff developed ADALINE, an early artificial neural network in which, for the first time, the weights of the inputs could be learned by the network itself.

However, after the initial euphoria about machine learning, the publication of the book "Perceptrons" by Marvin Minsky and Seymour Papert in 1969 caused the topic to lose importance, and the field entered the so-called "AI winter". The book presents not only the strengths but also serious limitations of perceptrons, such as the XOR problem. The XOR problem was such a hurdle because a classical perceptron can only learn linearly separable functions, and XOR is not linearly separable: no single straight line can split its inputs into the two output classes.
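To make this limitation concrete, here is a minimal sketch in Python (with NumPy): the same perceptron learning rule that quickly masters the linearly separable AND function never reaches full accuracy on XOR, no matter how long it trains.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic Rosenblatt perceptron: a weight vector, a bias, and a step activation."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (target - pred) * xi   # nudge weights toward the target
            b += lr * (target - pred)
    return w, b

def accuracy(w, b, X, y):
    preds = (X @ w + b > 0).astype(int)
    return (preds == y).mean()

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable: the perceptron converges
y_xor = np.array([0, 1, 1, 0])   # not linearly separable: it cannot

w, b = train_perceptron(X, y_and)
print(accuracy(w, b, X, y_and))  # 1.0

w, b = train_perceptron(X, y_xor)
print(accuracy(w, b, X, y_xor))  # stays below 1.0 no matter how long we train
```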


New Revivals

David Rumelhart, Geoffrey Hinton, and Ronald Williams laid the foundation for deep learning with their backpropagation experiments in 1986: by applying backpropagation to multi-layer neural networks, they solved the XOR problem.
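A compact sketch of the idea (not the original 1986 experiments): a small two-layer network trained with backpropagation and gradient descent on the XOR data. The layer size, learning rate, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR data: not linearly separable, so a single perceptron fails on it.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer of 8 tanh units, one sigmoid output unit.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

losses = []
lr = 0.5
for _ in range(20000):
    # Forward pass through both layers.
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    losses.append(np.mean((out - y) ** 2))
    # Backpropagation: apply the chain rule layer by layer, output to input.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)   # MSE and sigmoid derivatives
    d_h = (d_out @ W2.T) * (1 - h ** 2)                # tanh derivative
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])  # the loss shrinks as the network learns XOR
```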

Another big step in machine learning was the use of deep learning. Deep learning refers to a class of machine learning algorithms that can solve nonlinear problems due to their high number of layers. Each layer processes the data passed on from the layer above, abstracting the data further, layer by layer.


Machine Learning Today 


The Influence of AlexNet on Machine Learning



From the AlexNet paper by Krizhevsky, Sutskever, and Hinton (2012); the source is linked below.

In the last decade, the topic has gained popularity again. In 2012 in particular, Geoffrey Hinton, Alex Krizhevsky, and Ilya Sutskever caused quite a stir with their convolutional neural network AlexNet.


Success in the Large Scale Visual Recognition Challenge

With AlexNet, they achieved an outstanding result using deep learning methods at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has been held annually since 2010. The aim of the challenge is to design the most accurate image recognition software possible using the free ImageNet database. In the first year, the best entry had an error rate of 28.2%; by the second year, the best error rate was still 25.7%, and even the second-best entry of 2012 had an error rate of 26.2%. The AlexNet team, in contrast, achieved an error rate of just 16.4%. This result quickly made a big impact in the professional world and rekindled both the hype around and the importance of machine learning.


Reasons for the Success of AlexNet

On the one hand, this result can be attributed to advances in machine learning algorithms themselves. For example, the use of the rectified linear unit (ReLU) as an activation function has greatly increased the efficiency and training speed of deep learning models. Among other things, ReLU mitigates the vanishing gradient problem, in which the gradients flowing through certain parts of a network become so small during training that those parts effectively stop learning; in the worst case, the network can no longer be trained at all.
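A back-of-the-envelope illustration of the effect: during backpropagation, each layer multiplies the gradient by its activation function's derivative. The sigmoid's derivative is at most 0.25, so across many layers the gradient shrinks geometrically, while the ReLU derivative is exactly 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Backpropagation multiplies one activation derivative per layer.
depth = 20
z = 0.0  # sigmoid'(z) is largest at z = 0, the best case for the sigmoid

sig_grad = (sigmoid(z) * (1 - sigmoid(z))) ** depth  # 0.25 ** 20
relu_grad = 1.0 ** depth                             # relu'(z) = 1 for z > 0

print(sig_grad)   # ~9.1e-13: the gradient signal has effectively vanished
print(relu_grad)  # 1.0: the gradient passes through unchanged
```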

Unlike previous competitors, Hinton's team used graphics cards instead of CPUs, thanks to the CUDA technology released by Nvidia in 2007, which allows graphics cards to be used for general-purpose computation. In a 2009 study, Rajat Raina, Anand Madhavan, and Andrew Ng showed that using graphics cards instead of CPUs could speed up neural network training by a factor of up to 15.


Development After AlexNet

After the success of AlexNet, the potential behind these methods was increasingly recognized, and even big companies like Google started to engage with machine learning. For example, because of their ability to solve non-linear problems, machine learning algorithms can be used to develop self-driving cars (e.g. Waymo). From this trend, various program libraries emerged, such as Google's TensorFlow, Keras, and Theano, the latter developed at the University of Montreal.


Why is it applicable today?

Machine learning methods have recently become widely applicable thanks to the tools above and to more affordable computing power. Relative to their computing power, graphics card prices have fallen in recent years, as the following table shows.


| Graphics Card              | GFLOPS | Price ($) | Release Year | GFLOPS/$ |
| -------------------------- | ------ | --------- | ------------ | -------- |
| Nvidia GeForce GTX 680     | 3,090  | 500       | 2012         | 6.2      |
| Nvidia GeForce GTX 780     | 3,977  | 499       | 2013         | 6.1      |
| Nvidia GeForce GTX 780 Ti  | 5,046  | 699       | 2013         | 7.2      |
| Nvidia GeForce GTX 980     | 4,612  | 549       | 2014         | 8.4      |
| Nvidia GeForce GTX 980 Ti  | 5,632  | 649       | 2015         | 8.7      |
| Nvidia GeForce GTX 1080    | 8,228  | 499       | 2017         | 16.5     |
| Nvidia GeForce GTX 1080 Ti | 10,609 | 699       | 2017         | 15.2     |
| Nvidia GeForce RTX 2080    | 8,920  | 699       | 2018         | 12.8     |
| Nvidia GeForce RTX 2080 Ti | 11,750 | 999       | 2018         | 11.8     |


Development of the most powerful graphics cards for machine learning applications

Google's Tensor Processing Units (TPUs), introduced in 2016, accelerate machine learning applications, and later generations from 2017 and 2018 also accelerate the training of neural networks. Also helpful in the application of neural networks is the ability to rely on GPU clusters, which allow fast training of large networks. Today it is not even necessary to perform the calculations on your own computer; instead, they can be run at very reasonable prices in the cloud (ImageNet Benchmark).



Areas of Applications 

Computer vision is one of the most important areas of application for machine learning algorithms. The term describes enabling a computer to gain a general understanding of images or videos in order to extract information from them. Another area of application is speech analysis and the evaluation of texts: in speech analysis, the computer learns to understand spoken language and, for example, convert it into written text; in text analysis, the computer is supposed to extract information from arbitrary text.

All of these areas result in exciting use cases such as the evaluation of satellite data, the enhancement of image searches, the analysis of public sentiment, or self-driving cars.


Do only International IT Companies Benefit from this Development?


Applicability of neural networks in practice, 1957-2012
The evolution of the past decade has made neural networks practically and widely applicable.


The affordable availability of computing power, open-source tools, and data from digital processes today allows almost all companies to use machine learning methods. Companies that benefit from this development often start with small projects that help them better understand the technology, the way they handle data, and the changes needed in their own processes.

Use cases where good results can be achieved quickly include:

  • Automatic evaluation of images or video recordings
  • Predicting key figures (demand, inventory levels, etc.) so that quicker and better decisions can be made
  • Knowledge extraction from documents and large text bodies
  • Automatic classification of frequently occurring business transactions (for example, in banking, insurance, or other audit cases) into automatically acceptable requests and those that still require manual post-processing.

Title Photo by Pixabay  from Pexels

What is Machine Learning?

Machine learning enables computers to learn from data without being explicitly programmed. The learned knowledge is a function that assigns a suitable output to an input, and an algorithm adjusts this function until it achieves the desired results. In recent years, a number of training methods have been established that only lead to good results when very large data sets are available. The phrase "data is the new oil" often refers to the fact that companies with richer data sets can train more powerful models.
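A minimal sketch of this idea: a one-parameter "model" whose weight is adjusted step by step via gradient descent until its outputs match the data. All numbers here are made up for illustration.

```python
import numpy as np

# Toy data: the "knowledge" to be learned is the mapping y = 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0    # the model: y_hat = w * x, with w initially wrong
lr = 0.01  # learning rate
for _ in range(500):
    y_hat = w * x
    grad = 2 * np.mean((y_hat - y) * x)  # gradient of the mean squared error
    w -= lr * grad                       # adjust the function toward the data

print(round(w, 3))  # close to 2.0: the mapping was learned from data alone
```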


Narrow AI vs. General AI

Our current machine learning algorithms are generally only applicable to very specific problems; this is so-called Narrow AI. A General AI, by contrast, would be an algorithm that learns an abstract model of the world: like a human being, it would be able to combine knowledge from different fields and transfer it to previously unknown problems. However, we are still a long way from such a General AI, and it is not even clear whether one is possible in principle. Nevertheless, Narrow AI is already able to solve problems, previously difficult or impossible for computers, more efficiently than humans.


Comparison of Narrow AI and General AI

Supervised and Unsupervised Learning

There are two basic approaches to training machine learning models: supervised and unsupervised learning. In supervised learning, both the input and the desired output are known at training time. From this information, the algorithm generates a model that describes the relationship between input and output. After training, this model can produce general results for new inputs, i.e. results that are not limited to the training data set. For example, supervised learning is used to recognize image content: during training, each image is provided together with a list of its contents, and the training should enable the model to output the correct objects for unseen images.
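As a minimal illustration of supervised learning (with hypothetical toy measurements rather than images), the nearest-neighbour classifier below copies the label of the most similar training example and thereby generalizes to inputs it has never seen.

```python
import numpy as np

# Labelled training data: (weight in kg, ear length in cm) -> species.
# The numbers are invented, chosen only to make the two classes separable.
X_train = np.array([[4.0, 6.5], [3.5, 6.0], [4.2, 7.0],        # cats
                    [25.0, 10.0], [30.0, 12.0], [22.0, 9.5]])  # dogs
y_train = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

def predict(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

# Inputs that were NOT in the training set still get sensible labels.
print(predict(np.array([3.8, 6.2])))    # cat
print(predict(np.array([27.0, 11.0])))  # dog
```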

In unsupervised learning, by contrast, the program does not receive any information about the desired output. It has to create a model that generates suitable outputs for given inputs on the basis of their similarities. It is difficult to judge the result of such a model qualitatively because there is no specification for the results. A typical problem for which unsupervised learning models achieve good results is the automatic recognition of clusters in quantitative data; among other things, a program can automatically detect outliers and new patterns in data.
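For instance, a simple unsupervised check can flag unusual values without ever being told what an outlier is. The sketch below uses the median absolute deviation rule; the data and the threshold of 5 are made-up illustrations.

```python
import numpy as np

# Daily order counts; no labels tell the program what "normal" looks like.
orders = np.array([102, 98, 101, 97, 103, 99, 100, 240, 101, 98])

# Flag values far from the bulk of the data, measured in robust units.
median = np.median(orders)
mad = np.median(np.abs(orders - median))  # median absolute deviation
scores = np.abs(orders - median) / mad
outliers = orders[scores > 5]

print(outliers)  # [240]: the unusual day is found without any supervision
```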

Problems Machine Learning Can Solve

As mentioned earlier, machine learning algorithms are currently being applied to very specific problems. Three abstract problem types can be solved:


The Classification Problem

The classification problem is the automated grouping of objects into classes. A classic example is a program that can recognize whether a picture shows a cat or a dog. Supervised learning models are often used for classification: on the basis of suitable examples, the model independently learns to assign objects to the correct class. Which properties it considers is either specified by hand, depending on the method (feature engineering), or learned automatically.


The Regression Problem

The regression problem is about estimating the future course of a function. A classic example is predicting the water level of a river: conclusions are drawn from previous years to predict the future situation. A supervised learning model is often used for this purpose, trained with historical data that allows conclusions about future data.
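A small sketch of the idea with invented water-level data: fit a least-squares line to historical measurements and extrapolate it one year ahead.

```python
import numpy as np

# Hypothetical yearly peak water levels (cm), showing a roughly linear trend.
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
levels = np.array([310.0, 318.0, 327.0, 334.0, 342.0, 351.0])

# Regression in its simplest form: fit a least-squares line to historical
# (input, output) pairs, then evaluate it on a future input.
slope, intercept = np.polyfit(years - 2015, levels, deg=1)
forecast_2021 = slope * (2021 - 2015) + intercept

print(round(forecast_2021, 1))  # the extrapolated level for 2021
```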


The Clustering Problem

The goal of the clustering problem is to design an algorithm that independently sorts given data into groups of similar objects; an unsupervised learning model is often used. Such a model needs no precise specification of how it should categorize, which ensures that groups in the data which the developer may not even perceive as such are not excluded from the outset. A classic example is target group analysis in marketing, such as personalized advertising: customers are divided into different groups in order to show each group specific advertising.
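A minimal k-means sketch on invented customer data: starting from two data points as centroids, the algorithm alternates between assigning customers to the nearest centroid and moving each centroid to the mean of its group.

```python
import numpy as np

# Hypothetical customer data: (age, yearly spend in hundreds of euros).
customers = np.array([[22, 5.0], [25, 6.0], [23, 5.5],      # one group
                      [58, 20.0], [61, 22.0], [60, 21.0]])  # another group

# Minimal k-means with k = 2, initialised on two actual data points.
centroids = customers[[0, 3]].astype(float)
for _ in range(10):
    # Assign every customer to the nearest centroid ...
    dists = np.linalg.norm(customers[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # ... then move each centroid to the mean of its group.
    centroids = np.array([customers[labels == k].mean(axis=0) for k in range(2)])

print(labels)  # the two obvious groups are recovered: [0 0 0 1 1 1]
```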


Machine learning algorithms by problem type


Challenges when using Machine Learning Methods

However, the use of machine learning methods poses a number of challenges despite the many advantages:


Data Quality

In practice, much of the work in machine learning projects consists of obtaining, understanding, and preparing the right data. This step, generally called "preprocessing", ensures that the data actually represents what the model is supposed to learn. Particularly in larger organizations, where several departments have to cooperate, collecting and aggregating data is not a trivial task. If the necessary data is not available in the desired scope or quality, the first project phase often consists of collecting and processing this data.

Errors and distortions in the data can lead to the trained model learning them as well; one speaks here of "biases", i.e. prejudices that emerge from the data.


Differences Between Machine Learning Methods in Performance and Explainability

The different machine learning methods differ in their performance and their explainability. In general, the higher the performance of a machine learning model, the lower its explainability, and vice versa.

Neural networks, for example, perform very well, but trying to understand how they arrive at a solution reveals little that resembles what we would call logical problem-solving: the models consist of a very large number of independent parameters and even more calculations. This is called the black-box problem. It becomes problematic whenever an explanation is important, e.g. in vital decisions such as in medicine, when a machine learning model suggests a treatment.

In contrast, there are methods such as linear regression or decision trees that are easy to interpret. In return, these models are not well suited to complex problems.


Machine Learning Algorithms by Performance and Explainability



Overfitting

Overfitting occurs when a machine learning model is no longer able to produce general results after training. This happens when a model is trained too long on a training set that is too small: the error rate on the training set keeps shrinking, but the actual performance does not improve, because instead of learning a general function, the model has effectively memorized the data. In the end, correct results are only achieved on the training set.
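The effect can be demonstrated with a polynomial fit on a small, synthetic, noisy sample from a linear relationship: a degree-7 polynomial has enough freedom to pass through all 8 training points, driving the training error to essentially zero, while its error on unseen data remains large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small noisy sample from an underlying linear relationship y = 2x.
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.2, size=8)
x_test = np.linspace(0.05, 0.95, 8)
y_test = 2 * x_test + rng.normal(0, 0.2, size=8)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

simple = np.polyfit(x_train, y_train, deg=1)    # matches the true structure
complex_ = np.polyfit(x_train, y_train, deg=7)  # enough freedom to memorize

# The degree-7 fit drives the training error to essentially zero ...
print(mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
# ... but on unseen data its error is far from zero: it memorized, not learned.
print(mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
```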

Photo by Franck V. from Unsplash