Note: This article was first published in Towards Data Science on Medium.
In the column “Structuring Machine Learning Concepts”, I am trying to take concepts from the Machine Learning (ML) space and cast them into new, potentially unusual frameworks to provide novel perspectives. The content is meant for people in the data science community, as well as tech-savvy individuals who are interested in the field of ML.
Back in 2015 when I started picking up ML at Stanford, the concepts and definitions around it were fairly structured and easy to map out. With the rapid growth of Deep Learning in recent years, the variety of terms and concepts used has increased immensely. This can leave newcomers to the field, who wish to learn more about the subject, frustrated and confused.
The trigger for writing this installment of “Structuring Machine Learning Concepts” was the concept confusion that recent breakthroughs in Natural Language Processing (NLP) and Computer Vision have brought to the table. People are starting to realize that some techniques that were previously regarded as Unsupervised Learning should more aptly be named Self-Supervised Learning. Let’s expand on that.
When people talk about the different forms of Machine Learning, they usually refer to Supervised Learning (SL), Unsupervised Learning (UnSL), and Reinforcement Learning (RL) as the three learning styles. Sometimes, we add Semi-Supervised Learning (SemiSL) to the mix, combining elements of SL and UnSL. In 2018, a new breed of NLP algorithms started to gain popularity, leading the famous researcher Yann LeCun to champion the concept of Self-Supervised Learning (SelfSL) in 2019.
There are two things worth mentioning about these four learning styles:
As already hinted at, most of the UnSL being done in Computer Vision and NLP recently is better described as SelfSL. This new learning style is not supervised using a given ground-truth, but using information contained in the training data itself. However, there are still parts of the “old family” of UnSL algorithms, which are truly unsupervised, that use some metric of closeness or proximity between data points to decide what is a good fit (and guide our loss function).
Also, if you think about it, SemiSL should not be a part of these “pure” learning styles. First, it is a mix of two “pure” learning styles; second, its basic setup involves two different datasets, one labeled and one unlabeled. Therefore, we will save SemiSL for the next post of the “Structuring Machine Learning Concepts” series, where we will talk in more detail about processing unlabeled data.
I am proposing a simple 2x2 matrix, which maps SL, UnSL, SelfSL, and RL onto two axes, answering the following questions: does a ground-truth exist, and is the objective given explicitly or implicitly?
For SL and SelfSL, there is a ground-truth we are using to build our loss functions and metrics. For SL, it can be the “cat” label on an image, driving the “categorical cross-entropy loss” and the “accuracy”. For SelfSL, it can be the “hidden” word in a sentence (Mary [loves] her husband), where we use “negative log-likelihood” as a loss and measure “perplexity”.
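To make the role of the ground-truth a bit more tangible, here is a minimal Python sketch. It computes cross-entropy and accuracy for a toy image classifier, and negative log-likelihood and perplexity for a toy hidden-word predictor. All probabilities, labels, and helper names are made up for illustration and do not come from a real model.

```python
import math

def cross_entropy(probs, truth):
    """Negative log-probability the model assigned to the ground-truth."""
    return -math.log(probs[truth])

def mean(values):
    return sum(values) / len(values)

# SL: hypothetical classifier outputs for two images; labels were given by humans.
sl_outputs = [{"cat": 0.8, "dog": 0.2}, {"cat": 0.4, "dog": 0.6}]
sl_labels = ["cat", "cat"]

sl_loss = mean([cross_entropy(p, y) for p, y in zip(sl_outputs, sl_labels)])
sl_predictions = [max(p, key=p.get) for p in sl_outputs]
sl_accuracy = mean([pred == y for pred, y in zip(sl_predictions, sl_labels)])

# SelfSL: the ground-truth is the word we hid from the sentence ourselves.
selfsl_outputs = [{"loves": 0.5, "hates": 0.25, "calls": 0.25}]
hidden_words = ["loves"]

nll = mean([cross_entropy(p, w) for p, w in zip(selfsl_outputs, hidden_words)])
perplexity = math.exp(nll)  # exponentiated average negative log-likelihood

print(f"SL loss={sl_loss:.3f}, accuracy={sl_accuracy:.2f}")
print(f"SelfSL nll={nll:.3f}, perplexity={perplexity:.2f}")
```

In both cases the same helper does the work; the only difference is where the ground-truth comes from.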
For UnSL and RL, there is no such ground-truth. We have measures that describe “good fit” or “desired behavior”, but nothing similar to “accuracy”. In “k-means clustering”, a classic UnSL algorithm, we can measure the “average distance to the cluster mean”, and in RL, we are trying to maximize the “cumulative reward” we are receiving from the environment.
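The two measures mentioned above can be sketched in a few lines of Python. This is an illustration under assumptions, not a full k-means or RL implementation: the points, cluster assignments, and rewards are invented, every cluster is assumed to contain at least one point, and the optional `discount` parameter is my own addition (with the default of 1.0 it is a plain sum).

```python
import math

def average_distance_to_cluster_mean(points, assignments, k):
    """Average Euclidean distance of each point to the mean of its cluster."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, cluster in zip(points, assignments):
        counts[cluster] += 1
        for d in range(dims):
            sums[cluster][d] += point[d]
    # Assumes no cluster is empty, otherwise this divides by zero.
    means = [[s / counts[c] for s in sums[c]] for c in range(k)]
    total = sum(math.dist(p, means[c]) for p, c in zip(points, assignments))
    return total / len(points)

def cumulative_reward(rewards, discount=1.0):
    """Sum of rewards over an episode, optionally discounted per time step."""
    return sum(r * discount**t for t, r in enumerate(rewards))

# UnSL: four toy points, already assigned to two clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
assignments = [0, 0, 1, 1]
avg_dist = average_distance_to_cluster_mean(points, assignments, k=2)

# RL: rewards an agent collected during one hypothetical episode.
episode_rewards = [0.0, 1.0, 0.0, 5.0]
total_reward = cumulative_reward(episode_rewards)

print(avg_dist, total_reward)
```

Neither number is compared against a correct answer; each only says how good the fit or the behavior was.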
In SL and RL, we have an explicit choice of what we want to get out of the data or our agent. For SL, it is our choice to turn a “cats and dogs breeds classification” problem into a simple “cats and dogs classification” problem, by re-assigning the labels. When using RL for mastering multi-player computer games, we can choose to incentivize our agent to act as a team player by rewarding the actions taken for the benefit of the group or to act as an egoist by solely rewarding individual actions.
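Both choices can be written down explicitly, as in the following small Python sketch. The breed names, the `BREED_TO_SPECIES` mapping, and the `team_weight` knob are hypothetical and only serve to illustrate the idea; real reward design is usually more involved.

```python
# SL: we decide what the model should learn by re-assigning the labels.
BREED_TO_SPECIES = {
    "siamese": "cat",
    "persian": "cat",
    "beagle": "dog",
    "poodle": "dog",
}

def relabel(breed_labels):
    """Turn a breed classification dataset into a cat-vs-dog dataset."""
    return [BREED_TO_SPECIES[breed] for breed in breed_labels]

# RL: we decide which behavior to incentivize by choosing the reward.
def reward(individual_score, team_score, team_weight):
    """team_weight=0.0 rewards an egoist, team_weight=1.0 a team player."""
    return (1.0 - team_weight) * individual_score + team_weight * team_score

species_labels = relabel(["siamese", "beagle", "poodle"])
egoist_reward = reward(individual_score=3.0, team_score=10.0, team_weight=0.0)
team_reward = reward(individual_score=3.0, team_score=10.0, team_weight=1.0)

print(species_labels, egoist_reward, team_reward)
```

The data and the environment stay the same in both cases; only our explicit choice of labels or reward changes what is learned.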
However, we cannot extrinsically dictate the nature of an image or language in SelfSL. We can surely change some details, but the “reconstruction loss” will always compare two images, and for language models, we will always come up with learning tasks looking at the sentences themselves. With classical UnSL, we are implicitly stuck with finding data points that are close to each other, e.g., two users leaving behind similar behavior data on social media platforms.
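As a rough sketch of why the objective is implicit in SelfSL, the Python snippet below builds its training targets purely from the data itself. I am assuming mean squared error as the reconstruction loss and a `[MASK]` placeholder for hidden words; both are common choices, but they are my assumptions rather than the only options.

```python
def reconstruction_loss(original, reconstruction):
    """Mean squared error between an image and its reconstruction.

    Both arguments are flat lists of pixel values of equal length.
    """
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstruction)]
    return sum(squared_errors) / len(squared_errors)

def masked_examples(sentence, mask_token="[MASK]"):
    """Create (masked sentence, hidden word) pairs from the sentence alone."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

# The target of the reconstruction is the input image itself.
image = [0.0, 1.0]
loss = reconstruction_loss(image, [0.0, 0.5])

# The target of the language task is a word taken from the sentence itself.
pairs = masked_examples("Mary loves her husband")

print(loss)
print(pairs[1])
```

We can vary which word is hidden or how the loss is computed, but the target always comes from the data, never from an external label.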
In this post, we have redefined the “pure” learning styles in ML by separating UnSL and SelfSL and leaving SemiSL out of the equation. This brings us to the four concepts of SL, UnSL, SelfSL, and RL, which we can arrange in a simple framework (full disclosure: I did work in consulting for a while). The 2x2 matrix structures them according to whether a ground-truth exists and whether the objective is explicitly or implicitly given.
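For readers who prefer code over slides, the 2x2 matrix can be written as a simple lookup table in Python. The names `LEARNING_STYLES` and `learning_style` are just illustrative; the cell assignments follow the two axes described in this post.

```python
# Keys are (ground_truth_exists, objective_is_explicit).
LEARNING_STYLES = {
    (True, True): "SL",       # ground-truth exists, objective explicitly given
    (True, False): "SelfSL",  # ground-truth exists, objective implicitly given
    (False, True): "RL",      # no ground-truth, objective explicitly given
    (False, False): "UnSL",   # no ground-truth, objective implicitly given
}

def learning_style(ground_truth_exists, objective_is_explicit):
    """Look up the learning style for one cell of the 2x2 matrix."""
    return LEARNING_STYLES[(ground_truth_exists, objective_is_explicit)]

print(learning_style(ground_truth_exists=True, objective_is_explicit=False))
```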
Stay tuned for the next articles.
Sebastian graduated top of his class with an MSc in electrical engineering from TU Munich and the CDTM. During his second MSc at Stanford he focused on management science and machine learning. He worked as a consultant at McKinsey before returning to engineering at Intel and deep tech startups.