
Explanation and examples of 10 classic algorithms for Python machine learning

To showcase 10 classic machine learning algorithms in their simplest form, I will write a small code sample for each one. The algorithms are linear regression, logistic regression, K-nearest neighbors (KNN), support vector machines (SVM), decision trees, random forests, naive Bayes, K-means clustering, principal component analysis (PCA), and gradient boosting. I will implement them using common machine learning libraries such as scikit-learn, numpy, and pandas.

Let's get started.

1. Linear Regression

Linear regression is usually used to estimate real-valued quantities (house prices, number of calls, total sales, etc.) from continuous variables. We establish the relationship between the independent and dependent variables by fitting the best straight line. This best-fit line is called the regression line and is expressed by the linear equation Y = a*X + b.

The best way to understand linear regression is to look back on childhood. Suppose a fifth-grader is asked to arrange the students in the class from lightest to heaviest without asking anyone's weight. What do you think the child will do? He or she will likely size people up by height and build and arrange them using these visible parameters. This is linear regression in real life: the child has discovered that height and build are related to weight, in a way that looks very much like the equation above.

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Create a linear regression model and fit the data
model = LinearRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)

# Draw the results
plt.scatter(X, y, color='blue')
plt.plot(X, y_pred, color='red')
plt.title("Linear Regression Example")
plt.xlabel("X")
plt.ylabel("y")
plt.show()
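
To connect the fitted model back to the equation Y = a*X + b, a minimal follow-up sketch (reusing the model fitted above) is to print the learned slope and intercept through scikit-learn's coef_ and intercept_ attributes:

# Inspect the fitted line Y = a*X + b
a = model.coef_[0]       # slope learned from the data
b = model.intercept_     # intercept learned from the data
print("Slope a:", a)
print("Intercept b:", b)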

2. Logistic Regression

Don't be fooled by its name! This is a classification algorithm, not a regression algorithm. The algorithm estimates discrete values (say, binary values 0 or 1, yes or no, true or false) from a given set of independent variables. Simply put, it estimates the probability of an event occurring by fitting the data to a logistic (logit) function, which is why it is called logistic regression. Because it estimates a probability, its output is always between 0 and 1 (as expected).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a logistic regression model and fit the data
model = LogisticRegression()
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

3. K-Nearest Neighbors (KNN)

This algorithm can be used for both classification and regression problems, but within the industry it is more commonly used for classification. K-nearest neighbors is a simple algorithm: it stores all available cases and classifies a new case by a majority vote of its K nearest neighbors, measured by a distance function. The new case is assigned to the class most common among those K neighbors.

These distance functions can be Euclidean, Manhattan, Minkowski, or Hamming distance. The first three are used for continuous variables, and the fourth (Hamming distance) is used for categorical variables. If K = 1, the new case is simply assigned to the class of its single nearest neighbor. Choosing a good value of K is sometimes the hardest part of KNN modeling, as shown in the sketch after the example below.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a KNN model and fit the data
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

4. Support Vector Machine (SVM)

This is a classification method. In this algorithm, we plot each data item as a point in N-dimensional space (where N is the number of features you have), with the value of each feature being the value of a particular coordinate. The algorithm then finds the hyperplane that best separates the classes.

import numpy as np
from sklearn.svm import SVC

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create an SVM model and fit the data
model = SVC()
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

5. Decision Tree

This is one of my favorite and most frequently used algorithms. It is a supervised learning algorithm mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets based on the most significant attributes or independent variables, making the groups as distinct from one another as possible.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a decision tree model and fit the data
model = DecisionTreeClassifier()
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

6. Random Forest

Random forest is the term for an ensemble of decision trees. In the random forest algorithm, we have a collection of decision trees (hence the "forest"). To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest then chooses the class with the most votes (over all the trees in the forest).

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a random forest model and fit the data
model = RandomForestClassifier(n_estimators=10)
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

7. Naive Bayes

The naive Bayes classifier is derived from Bayes' theorem under the assumption that the predictor variables are independent of one another. In simpler terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, a naive Bayes classifier treats all of them as independently contributing to the probability that the fruit is an apple.

Naive Bayes models are easy to build and are very useful for large datasets. Despite their simplicity, they often outperform far more sophisticated classification methods.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a naive Bayes model and fit the data
model = GaussianNB()
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

8. K-Means Clustering

K-means is an unsupervised learning algorithm that solves clustering problems. Its procedure follows a simple way to classify a dataset into a certain number of clusters (assume K clusters). Data points inside a cluster are homogeneous, while points in different clusters are heterogeneous.

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])

# Create a K-Means model and fit the data
model = KMeans(n_clusters=2)
model.fit(X)

# Predict cluster assignments
y_pred = model.predict(X)

# Draw the results
plt.scatter(X, np.zeros_like(X), c=y_pred, cmap='viridis')
plt.title("K-Means Clustering Example")
plt.xlabel("X")
plt.show()
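
As a small hedged follow-up, the learned centroids that define each cluster can be read off the fitted model's cluster_centers_ attribute:

# Coordinates of the learned cluster centroids and per-sample labels
print("Cluster centers:", model.cluster_centers_)
print("Cluster labels:", model.labels_)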

9. Principal Component Analysis (PCA)

Principal component analysis (PCA) is a commonly used dimensionality-reduction technique. It aims to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components, while retaining as much of the information in the original dataset as possible.

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Generate sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# Create a PCA model and fit the data
pca = PCA(n_components=2)
X_r = pca.fit_transform(X)

# Draw the results
plt.scatter(X_r[:, 0], X_r[:, 1])
plt.title("PCA Example")
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
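
To quantify how much information each component retains, a brief optional sketch (reusing the fitted pca object) is to print the explained variance ratio; for this toy data, which lies on a straight line, the first component should capture essentially all of the variance:

# Fraction of the total variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)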

10. Gradient Boosting

Gradient boosting is an ensemble learning method that iteratively trains multiple weak learners (usually decision trees) and combines them into a strong learner. Gradient boosting models can also help us understand the relative importance of each feature in the data through their feature-importance scores.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1])

# Create a gradient boosting model and fit the data
model = GradientBoostingClassifier(n_estimators=10)
model.fit(X, y)

# Predict
y_pred = model.predict(X)
print("Predictions:", y_pred)

Summary

This concludes this article on explanations and examples of 10 classic Python machine learning algorithms. For more related content on these algorithms, please search my previous articles or continue browsing the related articles below. I hope you will keep supporting this site!