Machine Learning - Data Science

Machine Learning Notes

Introduction

Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

We want to learn a target function f that maps input variables X to output variable Y, with an error e:

Y = f(X) + e

Linear, Nonlinear

Different algorithms make different assumptions about the shape and structure of f, hence the need to test several methods. Any algorithm is either:

Linear: simplifies the mapping to a known linear combination form and learns its coefficients.
Nonlinear: is free to learn any functional form from the training data, while maintaining some ability to generalize.

Linear algorithms are usually simpler, faster and require less data, while nonlinear algorithms are more flexible and more powerful, and often achieve better predictive performance.
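
As a rough illustration of this contrast, the sketch below fits a straight line (a linear model) and a k-nearest-neighbour average (one possible nonlinear model) to noisy samples of a sine curve. The sine target, the noise level, k = 5 and all function names are assumptions chosen for the example, not part of the notes above.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a linear model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(xs, ys, query, k=5):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    return sum(ys[i] for i in nearest) / k


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data: Y = sin(X) + e, with Gaussian noise e (assumed for illustration).
rng = random.Random(0)
train_x = [i / 10 for i in range(100)]
train_y = [math.sin(x) + rng.gauss(0, 0.1) for x in train_x]

# Evaluate on points that were not used for training, against the noise-free target.
test_x = [x + 0.05 for x in train_x]
test_y = [math.sin(x) for x in test_x]

slope, intercept = fit_line(train_x, train_y)
linear_mse = mse([slope * x + intercept for x in test_x], test_y)
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"linear MSE: {linear_mse:.3f}, kNN MSE: {knn_mse:.3f}")
```

On a curved target like this one the line cannot follow the shape, so the nonlinear model should reach a lower error. On data that really is linear, the simpler model would be expected to do at least as well with far less data.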

Supervised, Unsupervised

Supervised learning methods learn to predict Y from X given that the data is labeled.
Unsupervised learning methods learn to find the inherent structure of the unlabeled data.

Bias-Variance trade-off

In supervised learning, the expected prediction error can be decomposed into the bias, the variance and an irreducible part.

Bias refers to simplifying assumptions made to learn the target function easily.
Variance refers to sensitivity of the model to changes in the training data.

The goal of parameterization is to strike a trade-off that achieves both low bias (the underlying pattern is not oversimplified) and low variance (the model is not sensitive to the specificities of the training data).
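
For squared-error loss, the decomposition described in this section is usually written as follows. The notation is an assumption added here: \hat{f} is the model learned from a random training set, the expectation is taken over training sets and the noise, and \sigma^2 is the variance of the error term e.

```latex
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Only the first two terms depend on the choice of model. The last term is the noise in Y = f(X) + e and cannot be reduced by any algorithm.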

Underfitting, Overfitting

In statistics, fit refers to how well the target function is approximated.

Underfitting refers to a model that fails to capture the pattern in the training data, so it performs poorly on both the training data and new data.
Overfitting refers to learning the detail and noise of the training data, which leads to poor generalization. It can be limited by using resampling (such as k-fold cross-validation) and by holding out a validation dataset.
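
One common resampling scheme is k-fold cross-validation. The sketch below only builds the train / validation index splits; the function name `k_fold_splits` and its parameters are hypothetical, and fitting and scoring a model on each split is left out.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so folds do not follow data order
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_splits(n_samples=10, k=5))
for train, validation in splits:
    print("train:", sorted(train), "validation:", sorted(validation))
```

Each sample is used for validation exactly once, so the average validation score over the k folds gives an estimate of generalization that does not rely on a single lucky or unlucky split.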

List of Common Machine Learning Algorithms

Here is a list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms
- GBM
- XGBoost
- LightGBM
- CatBoost

Three types of Machine Learning Algorithms

1. Supervised Learning

How it works: These algorithms use a target / outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data.

Examples of Supervised Learning:

Regression
Decision Tree
Random Forest
KNN
Logistic Regression

2. Unsupervised Learning

How it works: In these algorithms, we do not have any target or outcome variable to predict / estimate. They are used for clustering a population into different groups, which is widely used for segmenting customers into groups for specific interventions.

Examples of Unsupervised Learning:

Apriori algorithm
K-means
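
As an illustration of clustering without labels, here is a minimal K-means sketch in plain Python. The two synthetic groups of 2-D points, the farthest-point initialization and all function names are assumptions made for the example. Library implementations typically use other initializations, such as k-means++.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iters=100):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters


# Two synthetic groups, e.g. two customer segments described by two features.
rng = random.Random(0)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
group_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
points = group_a + group_b

centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(f"centroid ({centroid[0]:.2f}, {centroid[1]:.2f}) with {len(cluster)} points")
```

No labels are passed to `k_means`; the grouping comes only from distances between points.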

3. Reinforcement Learning

How it works: Using this type of algorithm, the machine is trained to make specific decisions. The machine is exposed to an environment where it trains itself continually using trial and error. It learns from past experience and tries to capture the best possible knowledge to make accurate business decisions.

Example of Reinforcement Learning:

Markov Decision Process
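
A Markov Decision Process is the usual way to formalize such an environment. To make the trial-and-error idea concrete, the sketch below runs tabular Q-learning (one of several possible algorithms) on a hypothetical five-cell corridor where only reaching the last cell is rewarded. The environment, the constants and the function names are all assumptions made for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (terminal state)
ACTIONS = (-1, +1)  # action index 0 = step left, 1 = step right
GAMMA = 0.9         # discount factor
ALPHA = 0.5         # learning rate


def step(state, action_index):
    """Move one cell (walls clamp the position); reward 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, seed=0):
    """Learn action values from experience gathered by acting at random."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)  # trial and error: pick a random action
            next_state, reward = step(state, action)
            # Move the estimate towards reward + discounted best value of the next state.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The machine is never told that moving right is correct. It only observes rewards, and the learned values end up favouring the action that leads to the goal.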

2. Logistic Regression

Despite its name, it is a classification algorithm, not a regression algorithm. It is used to estimate the probability of discrete outcomes (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).
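
A minimal sketch of the idea with a single independent variable, trained by plain gradient descent on the log loss. The data set (hours studied versus pass / fail) is made up for illustration, and the learning rate, number of epochs and function names are assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(weight * x + bias) by gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Made-up data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
weight, bias = fit_logistic(hours, passed)


def predict_proba(x):
    """Estimated probability that the outcome is 1."""
    return sigmoid(weight * x + bias)


def predict(x):
    """Discrete prediction: 1 if the estimated probability is at least 0.5, else 0."""
    return 1 if predict_proba(x) >= 0.5 else 0


print(predict(1.0), predict(4.5))
```

The model outputs a probability, and the discrete class comes from thresholding it, here at 0.5.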
