# Machine Learning

## Introduction

Machine learning is drawing attention around the globe for its potential. Its applications can be found in every sector: discrete and process manufacturing, the service industry, and so on. Consider a brief example. When we visit an e-commerce site and search for a particular product, we may notice a recommendation section at the bottom or right of the page. It shows similar products (based on category, price and specifications). How does this recommendation section work? It is driven by machine learning algorithms, which collect data from the user and return relevant information based on that input. Some popular applications are facial recognition, virtual personal assistants (like Alexa and Cortana), self-driving cars, search engine recommendations, online customer support chat, etc.

Arthur Samuel coined the term “Machine Learning” and defined it as the “field of study that gives computers the ability to learn without being explicitly programmed”.

## Types of Machine Learning

**Supervised Learning**– Here, the model learns from labelled data: each sample is paired with its correct label. Because the relationship between inputs and outputs is known for the sample data, we can use it to predict outcomes for new inputs. It is usually divided into classification (predicting a discrete label) and regression (predicting a continuous value).

**Classification**– This method assigns data to different classes. The output is discrete, and the classes need not imply any order. For example, a credit card company could decide whether to provide credit based on a credit score band: high, average or low. In machine learning, classification models such as decision trees, logistic regression, etc. are used.

**Regression**– It is a statistical technique widely used in machine learning. It helps to predict the value of future outcomes by using past data. This method helps in forecasting and time-series analysis. Types of regression analysis include linear regression, multiple regression and polynomial regression; logistic regression, despite its name, is mainly used for classification.
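
The classification idea above can be sketched in a few lines of Python. This is only an illustration, not a production model: the credit scores and labels are made up, and the nearest-centroid rule used here is a simple stand-in chosen for brevity, not one of the models named in the text.

```python
from statistics import mean

# Hypothetical labelled training data: (credit score, label).
training = [
    (820, "high"), (790, "high"), (760, "high"),
    (690, "average"), (660, "average"), (640, "average"),
    (560, "low"), (520, "low"), (480, "low"),
]

def fit_centroids(samples):
    """Training step: compute the mean score of each label."""
    by_label = {}
    for score, label in samples:
        by_label.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in by_label.items()}

def predict(centroids, score):
    """Assign the label whose mean score is closest to the new score."""
    return min(centroids, key=lambda label: abs(centroids[label] - score))

centroids = fit_centroids(training)
print(predict(centroids, 705))  # closest to the "average" group
print(predict(centroids, 500))  # closest to the "low" group
```

The labels in `training` are what make this supervised: the rule is learned from examples whose correct answer is already known.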

**Unsupervised Learning**– Here, the data is neither labelled nor sorted. It mainly focuses on discovering patterns in the data, and the output depends on the algorithm used. Supervised learning is often easier to apply to real-life problems, because without labels the results of unsupervised learning are harder to validate and can be less accurate. It is further classified into clustering and association.

**Clustering**– Clustering is a technique that groups data points with similar traits. Here, data points are treated as members of groups rather than as individuals. For example, a credit card company may want to divide its customers according to their credit spending. Clustering methods include hierarchical and partition-based methods, among others.
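
As a rough sketch of clustering, the snippet below groups hypothetical monthly spending figures with a one-dimensional k-means, which is one common partition-based method picked here for illustration. The data, the starting centers and the function name `k_means_1d` are all assumptions.

```python
from statistics import mean

def k_means_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers (simple k-means)."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly credit card spending per customer.
monthly_spend = [120, 150, 130, 900, 950, 1000, 4000, 4200]
centers, groups = k_means_1d(monthly_spend, centers=[100, 1000, 5000])
print(groups)  # three spending segments: low, medium, high
```

No labels are supplied anywhere: the segments emerge only from how close the values are to one another.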

**Association**– To find the relationship between two or more data variables, we use measures of association. They help answer questions like “What is the association between the climate and the sales of a garment?” or “How strong is the relationship between a company’s investment and its sales?” Association analysis finds the connections among elements of the data and the sequence that led up to the correlated result.
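
The text does not name a specific measure of association. One widely used pair in unsupervised learning is the support and confidence of an association rule, sketched below on made-up shopping baskets. The item names and the helper functions are hypothetical.

```python
# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"raincoat", "umbrella"},
    {"raincoat", "umbrella", "boots"},
    {"raincoat", "boots"},
    {"umbrella"},
    {"t-shirt", "sunglasses"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Among baskets with the antecedent, the share that also has the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"raincoat", "umbrella"}))       # 2 of the 5 baskets
print(confidence({"raincoat"}, {"umbrella"}))  # 2 of the 3 raincoat baskets
```

A high confidence for "raincoat → umbrella" would suggest the two items tend to be bought together, which is the kind of connection association analysis looks for.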

## Statistical methods used in Machine Learning

**Sampling**– It is the process of taking a small set of observations (a sample) from a large population. It is a common tool in any type of data analysis. Some sampling methods are random sampling, stratified sampling and cluster sampling.

**Hypothesis testing**– It is a method of analyzing sample data to draw inferences or make decisions about the population. Here, it is important to understand the sampling method, since the conclusions rest on how the sample was selected from the population. For example: population – a bag of rice; sample – a handful of rice.

**Correlation Analysis**– It is used to study the closeness of the relationship between two or more variables, i.e. the degree to which the variables are associated with each other. Suppose a manufacturing firm wants to know the relation between:

- Demand & supply of commodities.
- Production volume & the efficiency of machinery equipment.
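
Two of the ideas above, simple random sampling and correlation, can be tried out with the Python standard library. The sketch below draws a random sample from a made-up population and computes the Pearson correlation coefficient for the second example in the list. All numbers are hypothetical, and `pearson` is a helper written for this illustration.

```python
import random
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Simple random sampling: pick 50 of 1,000 hypothetical record IDs.
rng = random.Random(42)  # fixed seed so the draw is repeatable
population = list(range(1000))
sample = rng.sample(population, k=50)  # drawn without replacement

# Hypothetical data: machine efficiency (%) and daily production volume.
efficiency = [60, 65, 70, 75, 80, 85]
volume = [410, 450, 480, 530, 560, 600]
r = pearson(efficiency, volume)
print(len(sample), round(r, 3))
```

A coefficient near +1 indicates that the two variables rise together, near -1 that one falls as the other rises, and near 0 that there is little linear relationship.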

**Regression Analysis**– It is a statistical technique used in many forms of data analytics. It helps to predict the value of future outcomes by using past data. This method helps in forecasting and time-series analysis. The different types of regression analysis are:

- Linear Regression
- Multiple Regression
- Logistic Regression
- Poisson Regression…
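
To make "predicting future outcomes from past data" concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The monthly sales figures are invented and deliberately lie on a straight line so the result is easy to check by hand; `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical past data: sales recorded for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, sales)
forecast = intercept + slope * 6  # predict month 6 from the fitted line
print(slope, intercept, forecast)  # → 2.0 10.0 22.0
```

Real data will not fall exactly on a line, so the fitted line then gives the best straight-line approximation in the least-squares sense rather than an exact match.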

**Graphical Analysis**– Here, the data are presented in the form of graphs or diagrams. Data presented through diagrams and graphs look more convincing and appealing, and give a meaningful overview of the data. Some of the popular graphical tools are:

- Histogram
- Bar chart
- Pareto chart
- Scatter plot…
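
As a small illustration of the first tool in the list, the sketch below builds a text-only histogram with the Python standard library. A plotting library would normally be used instead. The measurements, the bin width and the function name `text_histogram` are assumptions made for this example.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per bin and return one line of '#' marks per bin."""
    # Map each value to the lower edge of its bin, then count per edge.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * counts[start]}")
    return lines

# Hypothetical measurements, e.g. machine efficiency in percent.
readings = [52, 55, 61, 64, 67, 68, 73, 75, 79, 88]
lines = text_histogram(readings, bin_width=10)
print("\n".join(lines))
```

Even in this crude form, the longest bar shows at a glance which range most readings fall into.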

## What role does Statistics play in Machine Learning?

Statistics is the base of machine learning algorithms, since machine learning is a strong combination of modern computer science and statistics. Various statistical methods form the foundation of machine learning algorithms, such as random numbers, probability distributions, sampling, hypothesis testing, correlation, regression and so forth. Thus statistics provides various techniques to analyze big data. To apply these techniques, one should have a basic knowledge of them: “What do these methods mean?”, “Where and when can we apply them?” and so forth. Hence statistics plays a pivotal role in machine learning.