
What is Machine Learning || How does Machine Learning work?




Have you heard about this concept called "machine learning" and are trying to figure out exactly what it means? Or maybe you've checked out a few machine learning competitions but don't know how to get started? If so, I'm here to help. You will need at least minimal experience with the Python programming language, but I'll suggest some resources in the next article if you don't yet know Python. So with that, let's get started! In this article, I'll be covering the following topics:

What is machine learning?
What are the two main categories of machine learning?
What are some examples of machine learning?
How does machine learning "work"?

So, what exactly is machine learning?

There's no universal definition, but at a high level, I would define machine learning as the semi-automated extraction of knowledge from data. Let's break that down into three component parts: 

First, machine learning always starts with data, and your goal is to extract knowledge or insight from that data. You have a question you're trying to answer, and you hypothesize that your question might be answerable using the data. 

Second, machine learning involves some amount of automation. Rather than trying to gather your insights from the data manually, you are applying some process or algorithm to the data using a computer so that the computer can help to provide the insight. 

Third, machine learning is not a fully automated process. As any practitioner can tell you, machine learning requires you to make many smart decisions in order for the process to be successful. We'll cover many of those decisions throughout this series.

Next, let's talk about the two main categories of machine learning: supervised learning and unsupervised learning.



Supervised learning, also known as predictive modeling, is the process of making predictions using data. For example, if my data set is a series of email messages, my supervised learning task might be to predict whether each message is spam or non-spam (also known as "ham"). This is supervised learning because there is a specific outcome we are trying to predict, namely ham or spam.
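To make the spam example concrete, here is a minimal sketch using scikit-learn. The four emails and their labels are invented, and representing each email as word counts fed to a naive Bayes classifier is an assumption chosen for brevity, not a method the article prescribes. What makes this supervised is the `labels` list: every training example comes with the answer we want the model to predict.

```python
# A minimal sketch: supervised learning maps inputs to a known label.
# The emails and labels are made-up toy data, not a real dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",
    "claim your free money today",
    "meeting moved to 3pm",
    "can you review my report",
]
labels = ["spam", "spam", "ham", "ham"]  # the outcome we want to predict

vectorizer = CountVectorizer()        # turns each email into word counts
X = vectorizer.fit_transform(emails)  # learn the vocabulary and encode the emails

model = MultinomialNB()
model.fit(X, labels)                  # learn how words relate to the labels

new_emails = ["free prize inside", "report for the meeting"]
predictions = model.predict(vectorizer.transform(new_emails))
print(predictions.tolist())           # ['spam', 'ham']
```

Words the model never saw during training (such as "inside" or "the") are simply ignored by the vectorizer, so the predictions rest on the words that did appear in the labeled examples.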



Unsupervised learning is the process of extracting structure from data or learning how best to represent data. For example, if my data set contained the characteristics and purchasing behavior of shoppers at a grocery store, my unsupervised learning task might be to segment the shoppers into groups or "clusters" that exhibit similar behaviors. I might find that college students, parents with young children, and older adults have characteristic shopping behaviors that are similar within each group but dissimilar from the other two groups. This is an unsupervised learning task because there is no right or wrong answer about how many clusters can be found in the data, which people belong in which cluster, or even how to describe each cluster.

Let's do a quick quiz. Kaggle is a popular platform for machine learning competitions, and in its well-known Titanic competition the goal is to predict which passengers survived the tragic sinking of the Titanic. Is this supervised or unsupervised learning? It is supervised learning, because your goal is to predict a specific outcome (namely, survival) for each passenger. In this article, I'm going to focus primarily on supervised learning, though I may cover unsupervised learning in a later article.
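Returning to the grocery-store example, the clustering idea can be sketched with k-means, one common clustering algorithm (the article does not name a specific one). The two shopper features and all of the values below are invented for illustration. Notice that no labels are supplied, and that the number of clusters (three) is a choice we make, which echoes the point that the data itself does not dictate a "right" answer.

```python
# A minimal k-means sketch on made-up shopper data (no labels are given).
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per shopper: [visits per week, average basket in dollars]
shoppers = np.array([
    [5, 12], [6, 15], [5, 10],     # frequent visits, small baskets
    [1, 180], [2, 160], [1, 200],  # rare visits, large baskets
    [3, 60], [3, 70], [4, 65],     # somewhere in between
])

# n_clusters=3 is our assumption; k-means will always return exactly that many groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(shoppers)
print(cluster_ids)  # one cluster number per shopper; the numbering itself is arbitrary
```

The algorithm only reports which shoppers are similar; deciding whether a cluster represents "college students" or "parents with young children" is still up to a human. In practice you would usually also scale the features, since here the basket size dominates the distance calculation.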

We've talked about what supervised learning is, but we haven't yet talked about how it works. So, how does it actually work? At a very high level, here are the two main steps of supervised learning:

First, you train a machine learning model using your existing labeled data. Labeled data is data that has been tagged with the outcome; in the email example, the outcome is whether each message is ham or spam. This is called "model training" because the model is learning the relationship between the attributes of the data and the outcome. For an email, these attributes might include the message text, the number of embedded links, the length of the message, and so on.
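The attributes mentioned above can be computed with plain Python. This sketch assumes three illustrative features (link count, message length, and number of exclamation marks); the helper `extract_features` and the sample messages are hypothetical. Each labeled email becomes a set of numeric attributes paired with its outcome, which is the shape of data a model is trained on.

```python
import re

def extract_features(message: str) -> dict:
    """Hypothetical helper: turn one email into a few numeric attributes."""
    return {
        "num_links": len(re.findall(r"https?://\S+", message)),  # embedded links
        "length": len(message),                                   # message length in characters
        "num_exclamations": message.count("!"),                   # an extra, assumed feature
    }

# Made-up labeled examples: (message text, outcome)
labeled_emails = [
    ("Click http://example.com now!!! Free prize!", "spam"),
    ("Lunch tomorrow? See agenda at https://example.org/agenda", "ham"),
]

# Pair each message's attributes with its label, ready for model training.
training_rows = [(extract_features(text), label) for text, label in labeled_emails]
for features, label in training_rows:
    print(features, "->", label)
```

Choosing which attributes to compute is one of the "smart decisions" mentioned earlier; a model can only learn from the information these features expose.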

Second, you make predictions on new data for which you don't know the true outcome. In other words, when a new email message arrives, you want your trained model to accurately predict whether it is ham or spam without a human examining it. To summarize these two steps, the model learns from past examples, made up of inputs and outputs, and then applies what it has learned to future inputs in order to predict future outputs.

Because you are making predictions on unseen data (data that was not used to train the model), it is often said that the primary goal of supervised learning is to build models that generalize. In other words, you want to build machine learning models that accurately predict the labels of your future emails, rather than accurately predicting the labels of emails you have already received.
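One common way to approximate "unseen data" is to hold back part of the labeled data during training and score the model on it afterwards. The sketch below assumes scikit-learn and uses ten invented emails; the 30% test split, the word-count features, and the naive Bayes classifier are all illustrative choices, not recommendations from the article.

```python
# A minimal sketch of checking generalization with a held-out test set.
# The ten emails are invented toy data; a real spam filter needs far more.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",
    "claim your free money today",
    "free vacation click here",
    "you won a cash prize",
    "limited offer free gift",
    "meeting moved to 3pm",
    "can you review my report",
    "lunch with the team tomorrow",
    "notes from today's call",
    "please send the project update",
]
labels = ["spam"] * 5 + ["ham"] * 5

# Hold out 30% of the labeled emails; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.3, random_state=42, stratify=labels
)

# Word counts feed a naive Bayes classifier (one reasonable choice among many).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on emails it learned from
test_acc = model.score(X_test, y_test)     # accuracy on held-out "unseen" emails
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy: {test_acc:.2f}")

# A brand-new message arrives: predict its label without a human looking at it.
print(model.predict(["free cash prize inside"]).tolist())
```

The test accuracy, not the training accuracy, is the rough estimate of how the model will do on future emails. With only three held-out emails the number is very noisy; the point of the sketch is the workflow, not the score.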

This simplified description of machine learning might raise some questions in your mind, such as: How do I choose which attributes of my data to include in the model? How do I choose which model to use? How do I optimize this model for best performance? How do I ensure that I'm building a model that will generalize to unseen data? Can I estimate how well my model is likely to perform on unseen data? These are excellent questions, and they hint at the complexity of doing effective machine learning! All of these issues will be addressed later in this series. If you'd like a more in-depth introduction to machine learning, there are two resources that I recommend. The first resource is my favorite book on machine learning, "An Introduction to Statistical Learning" by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

The second resource uses some excellent examples to compare supervised and unsupervised learning, and also introduces another type of machine learning called reinforcement learning. In the next article in this series, I'll be covering the benefits and drawbacks of scikit-learn, as well as my recommended way to set up Python for machine learning.

In the meantime, I'd love to hear from you in the comments if you have a question about machine learning, or if you just have a cool example of machine learning that you'd like to share. Please leave a comment if you'd like to hear as soon as my next article comes out.

 

Thanks for reading, and I'll see you soon.


Written By Vishnu
