Artificial Intelligence is one of the most hyped topics of recent years. Naturally, a lot of superficial knowledge has found its way into the press and the minds of the general public. In this post, I want to uncover a few of those misconceptions.
It is not easy to navigate the jungle of buzzwords surrounding Artificial Intelligence (AI). Besides AI, two other candidates are commonly found in startup pitch decks and newspaper articles: Machine Learning (ML) and Deep Learning (DL). Even though they are often used interchangeably, the three are not synonyms.
AI solves a task usually requiring human intelligence.
ML solves a specific AI-task by learning from data, making it a strict subset of AI.
DL solves an ML-problem by using neural networks (NNs) as its algorithm. Once again, we have a strict subset relationship, as can be seen in the illustration.
When we use AI but not ML, we usually have rule-based methods in place that perform tasks like automated planning or search.
When you do ML but not DL, people often refer to it as traditional ML, where you mostly rely on well-researched statistical methods.
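To make the "AI but not ML" case concrete, here is a minimal sketch of a classic search method, breadth-first search. Nothing is learned from data: the behavior follows entirely from a hand-written rule. The `rooms` map and the function name `shortest_path` are made up for illustration.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest path as a list of nodes, or None."""
    queue = deque([[start]])   # paths still to be extended, shortest first
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal cannot be reached from start

# Hypothetical map: which rooms can be reached directly from which.
rooms = {
    "hall": ["kitchen", "office"],
    "kitchen": ["pantry"],
    "office": ["lab"],
    "pantry": [],
    "lab": ["vault"],
    "vault": [],
}

print(shortest_path(rooms, "hall", "vault"))
```

The program finds the route through the office and the lab without ever having seen an example, which is exactly what separates it from ML.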
Many people think that an AI is this magical thing that gets better and better automatically. In most of the ML systems used today, the system has been trained on historical data, i.e., it has been shown many past examples and has created a general understanding from them. When it is running in production, it uses this knowledge to make a judgment about new observations it has never seen before. And often, that is the end of it.
When you have the chance to check these judgments (e.g., predictions) against the (potentially future) reality, you get feedback about how well your model has done. Ideally, if you can collect a lot of this feedback, you can finally re-train your model to improve on its earlier mistakes. However, in most cases, this is not happening automatically and on the fly but requires an engineer and some additional tuning to be effective.
The closest we get to this myth is the concept of Reinforcement Learning, where an agent learns by interacting with an environment and observing the consequences of its actions.
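As a rough illustration of learning by interaction, the sketch below uses an epsilon-greedy agent on a two-armed bandit, one of the simplest Reinforcement Learning settings. The reward probabilities, the function name `run_bandit`, and all parameter values are assumptions chosen for the example, not taken from any real system.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # agent's current value estimate per action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore a random action
        else:
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])  # exploit
        # The environment answers with a reward of 1 or 0 (hypothetical probabilities).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8])
print(estimates, counts)
```

The agent is never told which action is better. It only sees the rewards that follow its own choices and gradually shifts toward the action that pays off more often.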
Often, the power of artificial NNs is justified with the statement that they work just like the human brain. Mentioning this during a conversation with someone who has a background in biology will probably make them roll their eyes.
Biological NNs with neurons as the core unit are incredibly complex systems we are just starting to understand. They exist in many different physical structures, fulfilling different tasks in the human body, e.g., acting as a sensor or motor. Neurons transmit information via electrical pulses and store their inner state in their electric potential. When a certain potential threshold in the cell body is surpassed, the neuron discharges via an action potential, transmitting information to adjacent neurons. This transmission can happen in a regular spiking, fast spiking, or bursting fashion.
In artificial NNs, a neuron compares the weighted sum of its inputs (the action potentials of adjacent neurons) to a threshold set by its bias term (the threshold potential). The difference is passed through a non-linearity or activation function, which determines how strongly the neuron fires (trying to mimic the action potential of a biological neuron). That is probably the farthest we get with that analogy. Besides missing some of the more complex functions, NNs do not come close to the performance of the human brain. Especially when it comes to efficiency, our brains are light-years ahead of anything human-made.
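To show how little there is to a single artificial neuron, here is a minimal sketch in plain Python. The input values, weights, and bias are arbitrary numbers picked for illustration. The bias plays the role of a negative threshold, so adding it to the weighted sum is the same as comparing the sum against that threshold.

```python
import math

def step(z):
    """Hard threshold: fire (1.0) only if the net input is positive."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Smooth activation commonly used instead of a hard threshold."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Arbitrary example values (assumptions for illustration only).
inputs = [1.0, 0.5]
weights = [0.8, -0.4]

fires = neuron(inputs, weights, bias=-0.3, activation=step)        # net input is about 0.3
stays_quiet = neuron(inputs, weights, bias=-1.0, activation=step)  # net input is about -0.4
smooth = neuron(inputs, weights, bias=-0.3)                        # sigmoid output in (0, 1)
print(fires, stays_quiet, smooth)
```

That is the whole unit: a handful of multiplications, an addition, and one function call, with none of the spiking dynamics of its biological namesake.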
There are a lot of highly regarded people who cannot stop warning everyone about the AI apocalypse, the Singularity, the arrival of Superintelligence or Artificial General Intelligence (AGI). To be fair, it is hard to correctly foresee what will happen once a machine is more intelligent than its creator for the first time. But there is still a long way to go until then.
From a feasibility point of view, AGI lies somewhere between colonizing Mars and teleporting (which is physically impossible, as far as we know today). For the former, we pretty much know what challenges we have to overcome to make it happen, e.g., build more efficient rockets and find a way to grow food sustainably. When it comes to AGI, there is no such plan. Nobody knows which additional challenges we have to overcome until we reach AGI, but we also know that it is not physically impossible. Anyway, we should not be afraid of being overthrown by the machines tomorrow.
People often say that you need to have access to large quantities of data for ML to be effective. It is true that more high-quality data will almost always give you better performance. And it is also true that training a deep NN from scratch (e.g., for computer vision) might require millions of images.
However, those millions of images do not all have to come from your data warehouse and be specific to your problem. Especially in computer vision, the concept of transfer learning, where you start with a network pre-trained on a large dataset to solve a similar problem, works very well. You can then re-train only the last parts of your network (in the image referred to as NanoNet) that are very specific to your use case. Instead of tens of thousands of images per event you want to detect, you only need a few hundred.
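The idea of re-training only the last part can be sketched in a few lines of plain Python. Everything here is a toy stand-in: `frozen_backbone` is a hypothetical fixed function playing the role of the pre-trained layers (in practice you would load a network trained on a large dataset), and the 16 data points stand in for your small, task-specific dataset. Only the final layer, a logistic regression head, is trained.

```python
import math

def frozen_backbone(point):
    """Stand-in for pre-trained layers: maps a raw input to fixed features.
    Hypothetical; its parameters are never updated during training."""
    x, y = point
    return [x * x, y * y]

def train_head(features, labels, epochs=5000, lr=0.5):
    """Train only the final layer (logistic regression) with gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, label in zip(features, labels):
            logit = sum(w * v for w, v in zip(weights, f)) + bias
            error = 1.0 / (1.0 + math.exp(-logit)) - label  # prediction minus target
            for i, v in enumerate(f):
                grad_w[i] += error * v / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(point, weights, bias):
    """Frozen backbone followed by the newly trained head."""
    f = frozen_backbone(point)
    return 1 if sum(w * v for w, v in zip(weights, f)) + bias > 0 else 0

# Tiny task-specific dataset: 8 points inside (label 0) and 8 outside (label 1) the unit circle.
angles = [k * math.pi / 4 for k in range(8)]
points = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in angles]
points += [(1.5 * math.cos(a), 1.5 * math.sin(a)) for a in angles]
labels = [0] * 8 + [1] * 8

features = [frozen_backbone(p) for p in points]  # computed once, backbone untouched
weights, bias = train_head(features, labels)
accuracy = sum(predict(p, weights, bias) == y for p, y in zip(points, labels)) / len(points)
print(accuracy)
```

The raw points are not linearly separable, but the features from the frozen part are, so a handful of examples is enough to fit the small head. That is the same reason a few hundred images can suffice on top of a pre-trained vision network.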
Coming back to Myth #1, we have already learned that the part of AI which is not ML is mainly based on rules. These are very much white box, since a human has designed them by hand. The statistical methods of traditional ML are very well understood, and humans can perfectly interpret some simple methods like decision trees. Some methods (e.g., ensemble methods) that aggregate thousands of single models (e.g., decision trees) can still be interpreted through their statistics.
The source of the black box myth is NNs. You can still try to gather an understanding of the intermediate representations of the network. However, their raw analysis is often not very insightful. You are able to obtain visualizations by altering an input image to fully activate a certain part of the network, but the exact inner workings leave room for interpretation. If you want to read more about the visualization of NNs, I can highly recommend this Google blog.
When you flip through the list of highly regarded publications from recent years, a vast majority will come from the research labs of the GAFAM tech giants (Google, Apple, Facebook, Amazon, and Microsoft). This leads people to conclude that GAFAM completely dominate the ML space. This is both true and not true.
First, we have to understand the dynamic of open source in the ML community. It has become common practice that researchers do not just publish a paper but also attach most of the code they used to get the results. This sharing culture allows others to profit from past research and start right at the state of the art with their experiments. GAFAM have followed that trend and have released a significant amount of their research. But more importantly, every single one of them has open sourced a version of their internal DL framework, which facilitates building new models by using a lot of pre-built functionality.
This whole development gave rise to a new guiding principle when thinking about ML: “It is not about the algorithms, it is about the data”. If the state-of-the-art algorithms/models are always published, whoever trains the networks with the most extensive dataset suitable to the specific task at hand wins. Two prominent startup examples from Germany are DeepL and EyeEm, which have both beaten Google in one of its core disciplines (translation and image recognition), merely because they have acquired a higher-quality dataset for their specific task.
I guess every one of us has fallen for a myth at some point and then faced the embarrassing moment of being challenged about it for the first time. With this post, I hope I can save you from some of those moments and give you enough insight that you can now go out yourself and challenge others who might still believe some of those myths. With a heated topic like AI, it is always a good idea to talk less fiction and more science for a change.
I have published a follow-up post: Mythbusters reloaded: 7 more myths about AI, taking on the next batch of common myths. I hope you enjoy the read.
Sebastian graduated top of his class with an MSc in electrical engineering from TU Munich and the CDTM. During his second MSc at Stanford he focused on management science and machine learning. He worked as a consultant at McKinsey before returning to engineering at Intel and deep tech startups.