Comprehensive Machine Learning Tutorial: A Beginner’s Guide
Welcome to Machine Learning Fundamentals!
In this course, you will be exposed to the different concepts of Machine learning with a brief overview on each. We will be adding detailed courses covering each of these concepts in-depth.
Difference between Supervised and Unsupervised Learning
Take the example of face recognition.
In Supervised learning, the algorithm learns from many labeled examples what a face is, in terms of structure, color, shape, the position of the eyes, nose, and so on. After several iterations, the algorithm learns to recognize a face.
In Unsupervised learning, no desired output is provided. The algorithm instead categorizes the data on its own, learning to differentiate correctly between the face of a horse, a cow, or a human (clustering of data).
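As a rough sketch of the contrast (the toy data and function names below are invented for illustration, not from any library), a supervised learner maps labeled examples to predictions, while an unsupervised one only groups unlabeled points:

```python
# Supervised: learn from (input, label) pairs.
labeled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.3, "large")]

def supervised_predict(x):
    """1-nearest-neighbour: predict the label of the closest training example."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

# Unsupervised: no labels, only raw points.
unlabeled = [1.1, 0.9, 8.1, 7.9]

def unsupervised_groups(points, threshold=3.0):
    """Group points that lie close together (a crude form of clustering)."""
    groups = []
    for x in sorted(points):
        if groups and x - groups[-1][-1] <= threshold:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

print(supervised_predict(1.1))        # a label learned from examples
print(unsupervised_groups(unlabeled)) # clusters discovered without labels
```

The supervised function can answer "what is this?" because it saw labeled examples; the unsupervised one can only say "these belong together".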
Machine Learning Techniques
Now that you have a fair understanding of Supervised & Unsupervised learning and of Features & Labels, let's focus on the different techniques used in Machine Learning.
Decision tree learning is commonly used in data mining. A decision tree is a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is a way to display an algorithm.
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.
Decision Tree - Types
There are 2 types of Decision Trees:
- Classification Tree - The predicted outcome is the class to which the data belongs. This corresponds to tree models where the target variable can take a finite set of values.
- Regression Tree - The predicted outcome can be considered a real number. This corresponds to tree models where the target variable can take continuous values.
(We will discuss these in detail in a separate course.)
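To make the classification case concrete, here is a minimal sketch of the simplest possible classification tree, a depth-1 "decision stump" that learns a single split threshold. The data and function names are invented for illustration; real decision-tree learners split recursively and handle many features.

```python
def fit_stump(xs, ys):
    """Learn the single threshold that best separates two classes in 1-D."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        def majority(labels):
            return max(set(labels), key=labels.count) if labels else None
        # count how many training points this split classifies correctly
        correct = sum(1 for x, y in zip(xs, ys)
                      if y == (majority(left) if x <= t else majority(right)))
        if best is None or correct > best[0]:
            best = (correct, t, majority(left), majority(right))
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
ys = [0, 0, 0, 1, 1, 1]
predict = fit_stump(xs, ys)
print(predict(1.2), predict(6.8))  # 0 1
```

A full classification tree repeats this threshold search recursively inside each branch; a regression tree would instead predict the mean of the target values in each branch.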
Decision Tree - Pros & Cons
Pros:
- Simple to understand and interpret.
- Can analyze both numerical and categorical data.
Cons:
- Small variations in the data might generate a completely different tree.
Naïve Bayes
Naive Bayes, a supervised learning method, is a family of algorithms based on a common principle: all Naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable.
Naïve Bayes Usage
- Naive Bayes comes in handy since it can be trained quickly.
- You can use it when you have limited resources in terms of CPU and Memory.
- It is usually used for Real-time predictions, Multi-class Predictions, Text classification, Spam filtering, and Sentiment Analysis.
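The spam-filtering use case above can be sketched in a few lines of plain Python. This is a hypothetical bag-of-words classifier with Laplace smoothing; the tiny corpus and all names are invented for illustration.

```python
import math
from collections import defaultdict

train = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

class_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    scores = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # log P(c) + sum of log P(word | c); words assumed independent given c
        score = math.log(class_counts[c] / len(train))
        for w in text.split():
            # add-one (Laplace) smoothing so unseen words don't zero the score
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("win a prize"))   # "spam"
print(predict("team meeting"))  # "ham"
```

Training is a single counting pass over the data, which is exactly why Naive Bayes is cheap on CPU and memory.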
Gradient Descent
- Gradient descent is an optimization algorithm. It is normally used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.
- In (batch) Gradient descent, the algorithm has to run through ALL the samples in the given training set to perform a single parameter update in a particular iteration.
- Hence, if the number of training samples is large, or in fact very large, gradient descent can be time-consuming: every time you update the parameter values, you run through the complete training set.
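The full-sweep update can be sketched for a one-parameter model y = w * x fit by mean squared error (toy data invented for illustration):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # true relationship: y = 2x

w = 0.0
learning_rate = 0.01
for _ in range(1000):
    # Each iteration sweeps ALL samples to compute one gradient:
    # d/dw of (1/n) * sum (w*x - y)^2
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 3))  # converges close to 2.0
```

Note the inner `sum` visits every training sample per update; with millions of samples that single line is the bottleneck, which motivates stochastic and mini-batch variants.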
Linear Regression
- Linear regression (considered a step up from correlation) predicts the value of a dependent variable based on the value of an independent variable.
- Simple linear regression has only 1 independent variable, whereas multiple linear regression has more than one.
- It is very sensitive to outliers. Outliers can severely affect the regression line and, eventually, the forecasted values. Hence it is good practice to keep a check on outliers.
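Simple linear regression has a closed-form least-squares solution, sketched here on toy data (all numbers invented for illustration):

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

Because the slope is a ratio of sums over all points, a single extreme outlier shifts both sums and drags the whole line, which is the sensitivity mentioned above.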
Logistic Regression
Logistic regression, or the logit model, is used to model dichotomous outcome variables. It is used with data where there is a binary (success/failure) outcome variable, and also when the outcome takes the form of a binomial proportion.
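A minimal sketch of fitting a logistic model to a 1-D binary outcome by gradient descent on the log-loss (the dataset and all names are invented for illustration):

```python
import math

xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]  # binary success/failure outcome

def sigmoid(z):
    """Squash a real number into (0, 1): the predicted probability of success."""
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
lr = 0.5
for _ in range(5000):
    # gradients of the average log-loss with respect to w and b
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

# small x should give probability below 0.5, large x above 0.5
print(sigmoid(w * 0.5 + b), sigmoid(w * 4.0 + b))
```

Unlike linear regression, the output is a probability between 0 and 1, so thresholding it at 0.5 yields the binary prediction.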
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm. It is used for classification or regression type of problems.
- SVM is all about identifying the right hyperplane. To decide the right hyperplane, we maximize the distance between the nearest data points (of either class) and the hyperplane.
- SVM works well with a clear margin of separation and in high-dimensional spaces.
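The margin-maximizing idea can be sketched as a linear SVM trained with sub-gradient descent on the hinge loss (a simplified, Pegasos-style update; the 2-D dataset and constants are invented for illustration):

```python
xs = [(1.0, 1.0), (1.5, 0.5), (4.0, 4.0), (4.5, 3.5)]
ys = [-1, -1, 1, 1]  # SVM convention: labels are +1 / -1

w = [0.0, 0.0]
b = 0.0
lr, reg = 0.01, 0.01
for _ in range(2000):
    for (x1, x2), y in zip(xs, ys):
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        if margin < 1:  # point inside the margin: hinge loss is active
            w[0] += lr * (y * x1 - reg * w[0])
            w[1] += lr * (y * x2 - reg * w[1])
            b += lr * y
        else:           # safely outside the margin: only shrink w (regularize)
            w[0] -= lr * reg * w[0]
            w[1] -= lr * reg * w[1]

def predict(x1, x2):
    """Which side of the learned hyperplane w.x + b = 0 is the point on?"""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print(predict(1.2, 0.8), predict(4.2, 3.8))  # -1 1
```

The `margin < 1` test is what pushes the hyperplane away from the nearest points of either class; the regularization term keeps the weights, and hence the margin width, in check.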
Kernel Methods
- Kernel methods provide ways to manipulate data as though it were projected into a higher-dimensional space, while operating on it in its original space.
- The number of operations required is not necessarily proportional to the number of features.
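Both points can be demonstrated with the polynomial kernel k(x, z) = (x . z)^2: it equals an inner product in a higher-dimensional feature space, yet is computed entirely in the original space. The function names below are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    # works in the original 2-D space: cost proportional to 2, not 3
    return dot(x, z) ** 2

def explicit_features(x):
    # the 3-D feature map that the kernel implicitly corresponds to
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 4.0]
print(poly_kernel(x, z))                                # 121.0
print(dot(explicit_features(x), explicit_features(z)))  # same value, up to rounding
```

For a degree-d polynomial kernel on n features, the explicit space has on the order of n^d dimensions, while the kernel evaluation stays O(n); that is the saving the second bullet refers to.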
Neural Networks
A neural network is a powerful computational data model that captures and represents complex input/output relationships.
This model is motivated by the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain.
Neural networks try to resemble the human brain in the following two ways:
- They acquire knowledge through learning.
- They store knowledge within inter-neuron connection strengths known as synaptic weights.
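Both ideas show up in the simplest possible network, a single neuron (perceptron): it acquires knowledge by iterating over examples, and that knowledge is stored entirely in its weights. The toy task (the OR gate) and names are chosen for illustration.

```python
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR gate

weights = [0.0, 0.0]
bias = 0.0
lr = 0.1
for _ in range(20):
    for (x1, x2), target in examples:
        out = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = target - out
        # learning rule: nudge the synaptic weights toward the correct answer
        weights[0] += lr * error * x1
        weights[1] += lr * error * x2
        bias += lr * error

predictions = [1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
               for (x1, x2), _ in examples]
print(predictions)  # [0, 1, 1, 1]
```

After training, nothing but the numbers in `weights` and `bias` encodes what was learned; deep networks are the same idea scaled up to millions of such weights arranged in layers.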
Neural Networks - Application
Neural Networks have a broad spectrum of data-intensive applications, such as:
- Process Modeling and Control
- Machine Diagnostics
- Target Recognition
- Medical Diagnosis
- Credit Rating
- Financial Forecasting and so on.
Clustering
Clustering is an unsupervised learning model that deals with finding structure (clusters) in a collection of unlabeled data.
The idea is to partition the examples into clusters, or classes. Each class predicts feature values for the examples in the class.
There are 2 main types of clustering:
- K-Means Clustering
- Hierarchical Clustering
We will discuss the clustering types in the next set of cards.
K-means Clustering
In K-means clustering, you partition a group of data points into a small number of clusters.
The algorithm tries to maximize the similarity within each group while keeping the groups as far apart as possible.
When to Stop Iterating in K-means
Iterate until stable (i.e., no object moves to another group):
- Determine the centroid coordinate.
- Determine the distance of each object to the centroids.
- Group the object based on minimum distance (find the closest centroid).
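The three steps above can be sketched directly in plain Python on 1-D points (the data and the initial centroid guesses are invented for illustration):

```python
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [1.0, 9.0]  # initial guesses, k = 2

assignment = None
while True:
    # steps 2-3: measure distance to each centroid, join the closest one
    new_assignment = [min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
                      for p in points]
    if new_assignment == assignment:  # stable: no object moved group
        break
    assignment = new_assignment
    # step 1: recompute each centroid as the mean of its assigned points
    for i in range(len(centroids)):
        members = [p for p, a in zip(points, assignment) if a == i]
        if members:
            centroids[i] = sum(members) / len(members)

print(centroids)  # roughly [1.5, 8.5]
```

Here the two clusters stabilize almost immediately; real data typically needs more iterations, and the result depends on the initial centroid choice.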
Hierarchical Clustering
Hierarchical clustering builds a hierarchy of clusters.
We do not partition the data into a particular set of clusters in a single step. Instead, there is a series of partitions, ranging from a single cluster containing all the objects to n clusters, each containing a single object.
Hierarchical clustering can be implemented in 2 ways:
- Top-Down (Divisive) Approach - All data points start in a single cluster, and the clusters recursively perform splits until each data point is assigned a separate cluster.
- Bottom-Up (Agglomerative) Approach - Each data point starts as its own cluster, and clusters are recursively merged until they form a single large cluster.
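The bottom-up approach can be sketched on 1-D points using single-linkage distance (the minimum distance between any two members); the data and names here are illustrative:

```python
points = [1.0, 1.2, 5.0, 5.1, 9.0]
clusters = [[p] for p in points]  # every point starts as its own cluster
merge_history = []

def single_link_distance(a, b):
    """Distance between two clusters: their closest pair of members."""
    return min(abs(x - y) for x in a for y in b)

while len(clusters) > 1:
    # find the closest pair of clusters and merge them
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link_distance(clusters[ij[0]],
                                                   clusters[ij[1]]))
    merged = clusters[i] + clusters[j]
    merge_history.append(merged)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
    clusters.append(merged)

print(merge_history[0])  # the first merge joins the two closest points
```

The full `merge_history` is the hierarchy: reading it from the start down to any level gives one of the series of partitions described above, from n singleton clusters to one all-encompassing cluster.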