Anmol Sharma

2 years ago

- Introduction
- What are Machine Learning Algorithms?
- Machine Learning Algorithms
- Conclusion

If you are curious about how machines learn from data and make predictions, machine learning algorithms are the place to start. Algorithms and data are the core of machine learning. Preparing data is the first step in building a machine learning model. Once the data is prepared, you need an algorithm that can learn from it. In this article, we will discuss the most popular machine learning algorithms. So, let’s take a deep dive and understand these algorithms.

Machine learning algorithms are programs that consume data and find patterns in it in order to predict outputs for similar, previously unseen data. They can work on both labelled and unlabelled data. The choice of algorithm depends largely on the type of problem and the size of the data.

The types of machine learning algorithms are listed below:

- Supervised Learning Algorithms
- Unsupervised Learning Algorithms
- Reinforcement Learning Algorithms

We will learn about two of these types: supervised learning algorithms and unsupervised learning algorithms. Below are some of the top machine learning algorithms of these two types.

- Linear Regression
- Logistic Regression
- SVM
- KNN
- Naive Bayes
- K-means Clustering
- Apriori
- Random Forest

Linear Regression is used for regression problems. It models the relationship between the dependent variable and the independent variables with the help of a best-fitting line. It predicts a real number for given input variables.

It uses the following equation:

y = W1*x1 + W2*x2 + ... + Wn*xn + b

Here, x1, x2,..., xn are the input variables, W1, W2,..., Wn are the assigned weights and b is the bias.
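As a rough illustration of the weights-and-bias idea, the sketch below fits a line to a single input variable using ordinary least squares and then predicts a score. The data values and the helper names `fit_line` and `predict` are made up for this example; they are not taken from the article's graph.

```python
# Hypothetical data: hours studied and percentage scored (illustrative values only).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [35.0, 45.0, 55.0, 65.0, 75.0]


def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Weight = covariance of x and y divided by the variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    # The bias makes the line pass through the point of means.
    b = mean_y - w * mean_x
    return w, b


def predict(x, w, b):
    """Apply the fitted line to a new input."""
    return w * x + b


w, b = fit_line(hours, scores)
print("weight:", w, "bias:", b)
print("predicted score for 6 hours:", predict(6.0, w, b))
```

With more than one input variable the same idea applies, but the weights are usually found with matrix methods or gradient descent rather than this closed-form shortcut.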

The graph below shows the relationship between the hours studied and the percentage scored.

Advantages:

- Performs well when the relation between the dependent and independent variables is linear.
- Low space and computational complexity.

Disadvantages:

- Oversimplifies many real-world problems.
- Fits non-linear data poorly.

Logistic Regression is used for classification problems. It predicts the probability of a data point belonging to a particular class, for example classifying an email as spam or not spam. It uses the same equation as linear regression but applies a sigmoid function to the output of the equation. The sigmoid function limits the output to the range between 0 and 1.

Take a look at the graph below to understand how logistic regression differs from linear regression.
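The step from a linear output to a probability can be sketched in a few lines. The weights, bias and the 0.5 decision threshold below are hypothetical values chosen for illustration, not trained parameters.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Linear combination of the inputs, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def classify(features, weights, bias, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical weights and bias, for illustration only.
weights = [1.0, 0.5]
bias = -1.0

print(predict_proba([3.0, 1.0], weights, bias), classify([3.0, 1.0], weights, bias))
print(predict_proba([0.0, 0.0], weights, bias), classify([0.0, 0.0], weights, bias))
```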

Advantages:

- Performs well on linear and simple datasets.
- Quick to train and fast at classifying unknown records.

Disadvantages:

- Doesn’t perform well on non-linear data.
- Requires a large dataset for stable results.

SVM (Support Vector Machine) is mostly used for classification problems, although it can also handle regression. It generates hyperplanes to separate data points into different categories. The hyperplane that separates the classes with the widest margin is selected and used for classification.
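In standard SVM formulations, the hyperplane that is kept is the one with the largest margin, meaning the largest distance to the closest data point. The sketch below does not train an SVM; it only compares two hand-picked candidate hyperplanes on made-up 2-D points to show what the margin measures. All names and values are assumptions for illustration.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin(points, labels, w, b):
    """Smallest label-signed distance; positive only if every point is on its own side."""
    return min(y * signed_distance(p, w, b) for p, y in zip(points, labels))


# Hypothetical, linearly separable data with labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two hand-picked candidate hyperplanes given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -5.5),  # the line x + y = 5.5
    "B": ((1.0, 0.0), -2.5),  # the line x = 2.5
}

best = max(candidates, key=lambda name: margin(points, labels, *candidates[name]))
for name, (w, b) in candidates.items():
    print(name, "margin:", round(margin(points, labels, w, b), 3))
print("widest margin:", best)
```

A real SVM finds the widest-margin hyperplane by solving an optimization problem over all possible hyperplanes instead of scoring a fixed list.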

Take a look at the picture below for a better understanding.

Advantages:

- High accuracy, especially on high-dimensional data.
- Can handle non-linear data.

Disadvantages:

- Slow to train, especially on large datasets.
- Sensitive to noise.

KNN (K-Nearest Neighbours) is a supervised learning algorithm that is used for both regression and classification problems. It assumes that similar things exist close to each other. It searches the entire dataset for the K nearest neighbours of the unidentified data point. The unidentified data point is then assigned to the class that is most common among those neighbours. K is usually chosen to be an odd number so that a two-class vote cannot end in a tie.

The picture below shows the working of KNN.

In the above image K = 5; 2 of the neighbours belong to Class A and 3 belong to Class B. So, the unidentified point will be classified as a Class B element.
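The neighbour search and majority vote can be sketched as follows. The coordinates are invented so that, as in the example above, the 5 nearest neighbours contain 2 points of class A and 3 of class B; the function name `knn_classify` is also just an illustrative choice.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data.
train = [
    ((2.5, 2.5), "A"), ((3.5, 2.0), "A"), ((0.0, 0.0), "A"), ((0.0, 1.0), "A"),
    ((3.2, 3.1), "B"), ((2.8, 3.5), "B"), ((3.6, 3.4), "B"), ((6.0, 6.0), "B"),
]

print(knn_classify(train, (3.0, 3.0), k=5))
```

Because every prediction sorts the whole training set, this direct version gets slower as the dataset grows, which is why practical implementations rely on spatial index structures.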

Advantages:

- No separate training phase; new data can be added at any time.
- Very simple, powerful and intuitive.

Disadvantages:

- Choosing the value of K is tricky.
- Memory intensive, since the whole dataset must be stored and searched for every prediction.

Naive Bayes is also a supervised machine learning algorithm. It uses Bayes’ theorem of probability to predict the class of unknown data points. It assumes that every feature is independent of the other features given the class, even if the features actually depend on each other. It has three common types: Gaussian NB, Multinomial NB and Bernoulli NB. This algorithm is often used for large datasets.
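To make the independence assumption concrete, here is a minimal sketch of a Bernoulli-style Naive Bayes classifier on binary features. The tiny spam/ham dataset, the feature meanings and the function names are all hypothetical, and Laplace smoothing is added as a common convention to avoid zero probabilities.

```python
def train_bernoulli_nb(samples, labels):
    """Estimate class priors and per-feature probabilities from 0/1 features."""
    priors, likelihoods = {}, {}
    n_features = len(samples[0])
    for c in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == c]
        priors[c] = len(rows) / len(samples)
        # Laplace smoothing: add 1 to the count and 2 to the total.
        likelihoods[c] = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
    return priors, likelihoods


def predict_nb(x, priors, likelihoods):
    """Return normalized class probabilities for one sample."""
    scores = {}
    for c in priors:
        score = priors[c]
        # "Naive" step: multiply per-feature probabilities as if independent.
        for xj, p in zip(x, likelihoods[c]):
            score *= p if xj == 1 else (1 - p)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical features: (contains "offer", contains "meeting").
samples = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (1, 1)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

priors, likelihoods = train_bernoulli_nb(samples, labels)
posterior = predict_nb((1, 0), priors, likelihoods)
print(posterior)
```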

Take a look at the picture below for a better understanding.

Advantages:

- Fast to train and classify.
- Not sensitive to irrelevant features.

Disadvantages:

- Assumes independence of features, which rarely holds in real data.

K-means Clustering is an unsupervised learning algorithm that forms clusters of similar data points, such that the data points in a cluster are similar to each other and dissimilar from the data points of other clusters.

The aim of this method is to minimize the distance between the data points and the centroid of their cluster.

The picture below shows how K-means works.
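The usual way to pursue this goal is to alternate two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below shows that loop on made-up 2-D points with hand-picked starting centroids; real implementations typically choose the starting centroids randomly or with a smarter seeding scheme.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two obvious groups, and K = 2 starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```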

Advantages:

- Low complexity.
- Efficient and easy to implement.

Disadvantages:

- The value of K must be chosen in advance.
- Sensitive to noise and outliers in the data.

Apriori is an unsupervised learning algorithm that uses association rules to find how often one event occurs together with another. The basic idea behind it is that all subsets of a frequent itemset must be frequent. It is commonly used for Market Basket Analysis.

Take a look at the picture below.

In the above picture, Customer1 and Customer2 have bought bread and milk together, so there is a chance that Customer3 will buy milk too. To estimate how likely Customer3 is to buy milk, we use Apriori.
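The bread-and-milk idea can be sketched with a small frequent-itemset search. The transactions below are hypothetical, loosely modelled on the customers in the example, and `min_support` is an assumed threshold. The pruning step uses the rule that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset bought at least min_support times."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]  # candidate itemsets of size 1
    frequent = {}
    k = 1
    while current:
        # One scan of the data per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Build size-k candidates from the frequent sets of size k - 1.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
]

frequent = apriori(transactions, min_support=2)
both = frozenset({"bread", "milk"})
# Confidence of the rule "bread -> milk": share of bread baskets that also hold milk.
confidence = frequent[both] / frequent[frozenset({"bread"})]
print(frequent)
print("confidence(bread -> milk):", confidence)
```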

Advantages:

- Can handle large data.
- Easy to implement.

Disadvantages:

- Requires many database scans.
- Slow processing.

Random Forest is an ensemble method, which combines weak learners to form a strong learner. This algorithm trains many decision trees, each on a random sample of data from the dataset, and combines their outputs into a stronger prediction. For classification problems, the output is decided by majority voting among the trees. For regression problems, it uses the mean of the outputs of the trees.
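The sampling-and-voting idea can be sketched in a heavily simplified form. The code below is not a full Random Forest: each "tree" is only a one-split decision stump on a single feature, and there is no random feature selection. The data, the seed and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a one-split 'tree' on 1-D data by picking the split with the fewest errors."""
    best = None
    for split, _ in sample:
        for low_label, high_label in ((0, 1), (1, 0)):
            errors = sum(
                1 for x, y in sample
                if (low_label if x <= split else high_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, split, low_label, high_label)
    _, split, low_label, high_label = best
    return lambda x: low_label if x <= split else high_label


def train_forest(data, n_trees, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]
        forest.append(train_stump(bootstrap))
    return forest


def predict_class(forest, x):
    """Classification: majority vote over the outputs of all trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D data as (feature value, class label) pairs.
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
forest = train_forest(data, n_trees=25)
print(predict_class(forest, 0.5), predict_class(forest, 9.5))
```

For a regression problem, the vote in `predict_class` would be replaced by the mean of the numeric outputs of the trees.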

The picture below describes the working of Random Forest.

Advantages:

- High accuracy.
- Can efficiently handle large datasets.

Disadvantages:

- May overfit the data.
- Slower to train and predict than a single model.

We tried to cover most of the important machine learning algorithms. We learned the types of ML algorithms, how these algorithms work, the basic idea behind them and their advantages and disadvantages. Other common machine learning algorithms include decision trees, DBSCAN and XGBoost. For a better understanding, we encourage you to build projects using these algorithms.

We hope you gained the understanding you were looking for. Do reach out to us with queries on our AI-dedicated discussion forum and get your query resolved within 30 minutes.

Keep Learning. Keep Growing.