The Bayesian classifier is reduced to a linear classifier by constraining the classes to have equal covariances.
Comparing the performance of generative and discriminative models in challenging classification tasks.
In this lab class: We will compare the "generative modeling" and "discriminative modeling" approaches to linear classification. For the generative approach, we will revisit the Bayesian classification code we used in the previous exercise, but we will restrict the system to equal covariance matrices, i.e., one covariance matrix representing all classes rather than each class having its own. In this case, the system becomes a linear classifier. We will compare this with the discriminative approach, in which a perceptron learning algorithm learns the linear classifier's parameters directly.
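To see why tying the covariance matrices together makes the Bayesian classifier linear, compare the log-densities of the two classes. With a shared covariance matrix $\Sigma$ the quadratic term $x^{\mathsf T}\Sigma^{-1}x$ is identical for both classes and cancels, so (this is the standard textbook result, written here with $\mu_1, \mu_2$ for the two class means):

$$
\log\frac{p(x \mid \text{class 1})}{p(x \mid \text{class 2})}
= (\mu_1 - \mu_2)^{\mathsf T}\Sigma^{-1}x
\;-\; \tfrac{1}{2}\left(\mu_1^{\mathsf T}\Sigma^{-1}\mu_1 - \mu_2^{\mathsf T}\Sigma^{-1}\mu_2\right),
$$

which is linear in $x$: the decision boundary, where this quantity (plus any log-prior term) equals zero, is a hyperplane. With two different covariance matrices the quadratic terms do not cancel and the boundary is curved.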
In this notebook, we will use another dataset from the UCI Machine Learning Repository: the abalone data. An abalone is a sea snail. The age of a specimen can be determined by cutting the shell through the cone and counting the rings under a microscope (much like counting tree rings), but this is a time-consuming and expensive process. The task here is to predict the number of rings from external measurements of the animal's weight and size alone. For the dataset we are using, the true number of rings is known (i.e., the rings were counted after the snails were measured). The counts range from 1 to 29 rings, so this is usually treated as a 29-class classification problem. To simplify things a bit, I have regrouped the data into two classes of roughly equal size: young (fewer than 10 rings) and old (10 or more rings). I have also kept only the female samples. That leaves 7 measurements (all highly correlated) with which to predict the class label.
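For concreteness, the preprocessing described above could be reproduced along the following lines. This is only a sketch: the raw-file name, the column names, and the choice of which class receives label 1 are my assumptions here, not details given in the original.

```python
import numpy as np
import pandas as pd

# Column names as listed in the UCI abalone documentation.
cols = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
        "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]
raw = pd.read_csv("abalone.data", header=None, names=cols)  # assumed file name

# Keep only the female samples, as described above.
females = raw[raw["Sex"] == "F"]

# Two classes of roughly equal size: 1 = young (< 10 rings), 2 = old (>= 10).
# Which class gets which label is an assumption here.
labels = np.where(females["Rings"] < 10, 1, 2)

# Class label in column 0, the seven physical measurements in columns 1..7.
X = np.column_stack([labels, females[cols[1:8]].to_numpy()])
```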
Generative modeling: Bayesian classification with equal-covariance multivariate normal distributions.
There are more samples here (1306, compared with 178 in the previous exercise), so we do not have to worry about being short of test data; we simply split the data into equally sized training and test sets, as before.
Evaluate the performance of the Bayesian classifier by modifying the code you wrote last time, which used a multivariate normal distribution with a full covariance matrix for each class. When adapting the code, note that the main difference is that there are only two classes in this notebook, not three. (If you wish, you can try wrapping the code in a function and see whether it can be made to work for any number of classes; one possible sketch follows the code below.)
What performance does your classifier achieve? Scores for this task tend to fall between 60% and 70%, so don't worry if the performance seems much worse than in the previous task. If the performance is below 60%, however, you should check the code for possible bugs.
```python
import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt
%matplotlib inline

# Load the preprocessed abalone data; the file name was lost from the original,
# so only the "data/" directory is shown here.
X = np.loadtxt(open("data/", "r"))

# Split the two classes (the label is in column 0), then take alternate rows
# for the test and training sets.
abalone1 = X[X[:, 0] == 1, :]
abalone2 = X[X[:, 0] == 2, :]
abalone1_test = abalone1[0::2, :]
abalone1_train = abalone1[1::2, :]
abalone2_test = abalone2[0::2, :]
abalone2_train = abalone2[1::2, :]
abalone_test = np.vstack((abalone1_test, abalone2_test))
abalone_test.shape

# Fit one multivariate normal per class, each with its own full covariance.
mean1 = np.mean(abalone1_train[:, 1:], axis=0)
mean2 = np.mean(abalone2_train[:, 1:], axis=0)
cov1 = np.cov(abalone1_train[:, 1:], rowvar=0)
cov2 = np.cov(abalone2_train[:, 1:], rowvar=0)
dist1 = multivariate_normal(mean=mean1, cov=cov1)
dist2 = multivariate_normal(mean=mean2, cov=cov2)

# Classify each test sample by whichever class gives the larger likelihood.
p1 = dist1.pdf(abalone_test[:, 1:])
p2 = dist2.pdf(abalone_test[:, 1:])
p = np.vstack((p1, p2))
index = np.argmax(p, axis=0) + 1
plt.plot(index, "k.", ms=10)

# Percentage of test samples whose predicted label matches column 0.
correct = abalone_test[:, 0] == index
percent_correct = np.sum(correct) * 100.0 / abalone_test.shape[0]
print(percent_correct)
```
(The rowvar argument passed to np.cov above is described in the NumPy documentation as follows. We pass rowvar=0, i.e. False, because in our arrays each row is one abalone and each column is one measurement.)

rowvar : bool, optional
If rowvar is True (the default), then each row represents a variable, with observations in the columns. Otherwise the relationship is transposed: each column represents a variable, while the rows contain observations.
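As suggested earlier, the whole fit-and-classify procedure can be wrapped in a function so that the same code works for any number of classes. The sketch below is one possible way to do it under the conventions used in this notebook (labels 1..K in column 0, features in the remaining columns); the function name and its exact interface are my own choices, not part of the original code.

```python
import numpy as np
from scipy.stats import multivariate_normal


def bayes_classify(train, test, num_classes, shared_cov=False):
    """Fit one Gaussian per class on `train` and return % correct on `test`.

    Column 0 holds the class label (1..num_classes); the rest are features.
    """
    feats, labels = train[:, 1:], train[:, 0]
    means = [np.mean(feats[labels == k + 1], axis=0) for k in range(num_classes)]

    if shared_cov:
        # One pooled within-class covariance: centre each class, then pool.
        centred = np.vstack([feats[labels == k + 1] - means[k]
                             for k in range(num_classes)])
        covs = [np.cov(centred, rowvar=0)] * num_classes
    else:
        # A separate full covariance matrix for each class.
        covs = [np.cov(feats[labels == k + 1], rowvar=0)
                for k in range(num_classes)]

    dists = [multivariate_normal(mean=m, cov=c) for m, c in zip(means, covs)]
    p = np.vstack([d.pdf(test[:, 1:]) for d in dists])
    predicted = np.argmax(p, axis=0) + 1
    return 100.0 * np.mean(test[:, 0] == predicted)
```

With the arrays defined above, `bayes_classify(np.vstack((abalone1_train, abalone2_train)), abalone_test, 2)` should reproduce the full-covariance result, and passing `shared_cov=True` gives the equal-covariance (linear) version described next.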
Using a shared (equal) covariance matrix.
If you have correctly followed the same steps as in the previous notebook, you will have estimated a separate covariance matrix for each class. These matrices will not be equal, so your system will not be a linear classifier (i.e., it will have non-planar decision boundaries). To reduce it to a linear system, we need to make sure that there is only one covariance matrix. You could imagine doing this in several different ways:
First, you could imagine simply estimating a single covariance matrix from the complete training set before it is divided into classes. That would produce a matrix of the right size, but it is not the right thing to do: we want the matrix to represent the spread within each class, and a covariance estimated from the full, unseparated training set would also capture the spread between the classes.
Second, you could imagine averaging the two class covariance matrices. This is closer to the right thing, but it ignores the fact that the classes may not have equal numbers of examples.
The best way is to first move the centres of the two classes to the same point and then treat them as a single class. To move the class centres to the same point, simply subtract the class mean vector from each data sample of that class.
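Written out, centring each class and then estimating one covariance from all of the centred training data is (up to the usual $n-1$ bias corrections in the sample estimates) the same as taking the sample-size-weighted average of the two per-class covariance matrices, which is exactly what the simple averaging idea above was missing:

$$
\Sigma_{\text{shared}} \;\approx\; \frac{n_1\,\Sigma_1 + n_2\,\Sigma_2}{n_1 + n_2},
$$

where $n_1$ and $n_2$ are the numbers of training examples in the two classes and $\Sigma_1$, $\Sigma_2$ are their individual covariance estimates.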
```python
def centre_data(data):
    # Subtract the mean of each column so the class is centred on the origin.
    nsamples = data.shape[0]
    data_mean = np.mean(data, axis=0)
    data_centred = data - data_mean
    return data_centred

# Centre each class separately, pool the centred data, and estimate a single
# shared (within-class) covariance matrix from it.
abalone1_centred = centre_data(abalone1_train)
abalone2_centred = centre_data(abalone2_train)
abalone_centred = np.vstack((abalone1_centred, abalone2_centred))
cov_global = np.cov(abalone_centred[:, 1:], rowvar=0)

# Same classifier as before, but both classes now share one covariance matrix.
dist1 = multivariate_normal(mean=mean1, cov=cov_global)
dist2 = multivariate_normal(mean=mean2, cov=cov_global)
p1 = dist1.pdf(abalone_test[:, 1:])
p2 = dist2.pdf(abalone_test[:, 1:])
p = np.vstack((p1, p2))
index = np.argmax(p, axis=0) + 1
plt.plot(index, "k.", ms=10)
correct = abalone_test[:, 0] == index
percent_correct = np.sum(correct) * 100.0 / abalone_test.shape[0]
print(percent_correct)
```
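The introduction also promised a comparison with the discriminative approach, in which a perceptron learns the linear boundary directly from the data rather than via class-conditional Gaussians. The lab's own perceptron code is not shown in this excerpt; the sketch below applies the standard perceptron update rule to the same arrays, with the number of passes and the ±1 recoding of the labels being my own choices.

```python
import numpy as np

# Training data with a constant bias feature prepended; labels recoded as +/-1
# (class 1 -> +1, class 2 -> -1), an assumption for this sketch.
abalone_train = np.vstack((abalone1_train, abalone2_train))
X_train = np.hstack([np.ones((abalone_train.shape[0], 1)), abalone_train[:, 1:]])
t_train = np.where(abalone_train[:, 0] == 1, 1.0, -1.0)

w = np.zeros(X_train.shape[1])
for epoch in range(100):                    # fixed number of passes (assumed)
    for x, t in zip(X_train, t_train):
        if t * np.dot(w, x) <= 0:           # misclassified: apply the update
            w = w + t * x

# Evaluate on the test set: positive activation -> class 1, otherwise class 2.
X_test = np.hstack([np.ones((abalone_test.shape[0], 1)), abalone_test[:, 1:]])
pred = np.where(np.dot(X_test, w) > 0, 1, 2)
print(np.mean(pred == abalone_test[:, 0]) * 100.0)
```

Since the two classes overlap heavily (recall the 60–70% scores expected above), the perceptron will not converge to a perfect separator, so a fixed number of passes is used; comparing its test accuracy with the equal-covariance Bayesian classifier above is the point of the exercise.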