SoFunction
Updated on 2025-04-10

How to use Fisher discriminant analysis in R

Recently, while writing my own implementation of Fisher's discriminant, I needed to compare its results against existing software to check that my code was correct. I chose R, which is free and easy to install. This article records how to run Fisher's discriminant in R.

1. Discriminant analysis and Fisher's discriminant

Loosely speaking, discriminant analysis is a multivariate statistical method that classifies samples based on the values of several known variables. It generally has two stages: learning (training) and discrimination. In the learning stage, a batch of already classified samples is used, together with the values of their variables, to derive a discrimination rule. In the discrimination stage, that rule is applied to classify other samples.

Fisher's discriminant, also known as Linear Discriminant Analysis (LDA), is one kind of discriminant analysis and dates back to 1936. Its core idea is to project multidimensional data (several variables) onto a single dimension (one variable) using a linear operation, and then to classify each sample by comparing its projected value with a given threshold.

The learning (training) stage of Fisher's discriminant looks for a suitable projection, so that, for the already classified samples, samples of the same class lie as close together as possible after projection while samples of different classes lie as far apart as possible. The result of the learning stage is a set of coefficients that form a discriminant of the shape

y = a1 * x1 + a2 * x2 + a3 * x3 + ... + an * xn

where a1, a2, ..., an are the coefficients and x1, x2, ..., xn are the variable values, together with a threshold. In the discrimination stage, y is computed from this discriminant and the sample is classified according to the threshold.
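To make the two stages concrete, here is a minimal sketch of a two-class Fisher discriminant written out by hand. The data are synthetic and every name in it (X0, X1, a, threshold, classify) is made up for illustration. Placing the threshold midway between the projected class means is an assumption that treats both classes as equally likely; it is not necessarily the cut point a statistics package would choose.

```r
# Hand-written two-class Fisher discriminant (illustrative sketch only).
set.seed(1)
n <- 50

# Hypothetical synthetic training data: two classes, two variables x1 and x2
X0 <- cbind(x1 = rnorm(n, mean = 0), x2 = rnorm(n, mean = 0))
X1 <- cbind(x1 = rnorm(n, mean = 3), x2 = rnorm(n, mean = 3))

# Learning stage: class means and pooled within-class scatter matrix
m0 <- colMeans(X0)
m1 <- colMeans(X1)
Sw <- crossprod(sweep(X0, 2, m0)) + crossprod(sweep(X1, 2, m1))

# Coefficients a1, a2: projection direction solving Sw %*% a = m1 - m0
a <- solve(Sw, m1 - m0)

# Threshold: midpoint of the two projected class means (assumes equal priors)
threshold <- sum(a * (m0 + m1)) / 2

# Discrimination stage: y = a1*x1 + a2*x2, then compare y with the threshold
classify <- function(X) as.integer(X %*% a > threshold)

pred     <- classify(rbind(X0, X1))
truth    <- rep(c(0L, 1L), each = n)
accuracy <- mean(pred == truth)
print(accuracy)
```

The coefficients found this way define the same projection direction that LDA software reports, up to a scaling factor, so only the ratio between coefficients is comparable across implementations.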

2. Using Fisher's discriminant in R

Running Fisher's discriminant in R is very simple, but it took me a long time of searching to work out how.

First, R does not call it Fisher, and searching for "Fisher" mostly leads you astray. In R it is called LDA (Linear Discriminant Analysis).

Second, it lives in a package called MASS. On Ubuntu 13.10, install R with:

sudo apt-get install r-base

With this installation the MASS package is available by default. Load it with the following statement:

> library(MASS)

Third, once the MASS package is loaded, you can call the lda function:

> params <- lda(y~x1+x2+x3, data=d)

Here the first argument is a formula describing the discriminant (the class variable on the left, the predictor variables on the right), and the second argument is the sample data used for training. Printing the object returned by lda shows the coefficients that make up the discriminant.
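As a rough illustration of what the fitted object holds, the sketch below trains lda on synthetic data and looks at its main components. The data frame d, its columns y, x1, x2, x3 and the class means are invented for this example; prior, means and scaling are components of the object that MASS returns from lda.

```r
library(MASS)

# Hypothetical synthetic training data with a two-level class variable y
set.seed(42)
n <- 100
g <- rep(c(0, 1), each = n)
d <- data.frame(
  y  = factor(g),
  x1 = rnorm(2 * n, mean = 2 * g),   # shifted in class 1
  x2 = rnorm(2 * n, mean = -1 * g),  # shifted the other way in class 1
  x3 = rnorm(2 * n)                  # carries no class information
)

params <- lda(y ~ x1 + x2 + x3, data = d)

print(params)     # priors, group means and the LD1 coefficients
params$prior      # class proportions observed in the training data
params$means      # per-class mean of each variable
params$scaling    # matrix of discriminant coefficients, one column per LD
```

With two classes there is a single discriminant, so params$scaling has one column named LD1 and one row per predictor.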

Finally, use the predict function to classify unclassified samples:

> predict(params, newdata)

Here the first argument is the result of the lda call from the previous step, and the second argument is the sample data to classify. This completes the whole Fisher discrimination process.
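The value returned by predict is a list rather than a bare vector of labels. The following sketch, again on invented data (d, newdata and all column values are assumptions), shows the three components it contains for an lda fit.

```r
library(MASS)

# Hypothetical synthetic training data
set.seed(42)
n <- 100
g <- rep(c(0, 1), each = n)
d <- data.frame(
  y  = factor(g),
  x1 = rnorm(2 * n, mean = 2 * g),
  x2 = rnorm(2 * n, mean = -1 * g),
  x3 = rnorm(2 * n)
)
params <- lda(y ~ x1 + x2 + x3, data = d)

# Hypothetical unclassified samples: same predictor columns, no y column
newdata <- data.frame(x1 = c(0.1, 2.3), x2 = c(0.2, -1.1), x3 = c(0.0, 0.5))

ret <- predict(params, newdata)
ret$class       # predicted class label for each row of newdata
ret$posterior   # posterior probability of each class, one row per sample
ret$x           # projected value (LD1 score) of each sample
```

In most cases ret$class is the part you need; the posterior probabilities are useful for seeing how confident each assignment is.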

3. Example

3.1 Data

Prepare two CSV files: one holding the classified data used for training, and one holding the unclassified data to be classified. The training file has six columns. Its first row contains the column names Band1, Band2, Band3, Band4, Band5 and Class, which stand for the five variables and the category. The file to be classified contains the columns Band1, Band2, Band3, Band4 and Band5, again with the column names in the first row. The fields of both CSV files are separated by commas.
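If you have no such files at hand, the layout described above can be mocked up as follows. This is only a sketch: the file names, the temporary directory and all numeric values are invented and have nothing to do with the data used in this article.

```r
# Hypothetical file names in a temporary directory
train_file <- file.path(tempdir(), "train_example.csv")
new_file   <- file.path(tempdir(), "new_example.csv")

# Invented band values for two classes (0 and 1)
set.seed(10)
n <- 20
g <- rep(c(0, 1), each = n)
bands <- data.frame(
  Band1 = rnorm(2 * n, mean = 320, sd = 5),
  Band2 = rnorm(2 * n, mean = -1 * g),
  Band3 = rnorm(2 * n, mean = -0.3 * g),
  Band4 = rnorm(2 * n, mean = 34 * g, sd = 5),
  Band5 = rnorm(2 * n, mean = 36 * g, sd = 5)
)

# Training file: five bands plus Class; file to classify: bands only
write.csv(cbind(bands, Class = g), train_file, row.names = FALSE)
write.csv(bands[1:5, ], new_file, row.names = FALSE)

# read.csv treats the first row as column names and splits fields on commas
d  <- read.csv(train_file)
d2 <- read.csv(new_file)
str(d)
```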

3.2 Operation steps

1. Read

> d <- read.csv("~/data/")
> d2 <- read.csv("~/data/")

2. Training

> params <- lda(Class ~ Band1+Band2+Band3+Band4+Band5, data=d)

Training results:

> params
Call:
lda(Class ~ Band1 + Band2 + Band3 + Band4 + Band5, data = data)

Prior probabilities of groups:
        0         1
0.4220068 0.5779932

Group means:
     Band1      Band2      Band3    Band4    Band5
0 318.3189  0.0000000  0.0000000  0.00000  0.00000
1 322.1881 -0.7703634 -0.2642972 33.92608 36.39715

Coefficients of linear discriminants:
              LD1
Band1  0.02173212
Band2 -0.08647688
Band3 -0.01199366
Band4  0.10619769
Band5  0.10560976
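When comparing these coefficients with a hand-written implementation, it helps to know how the LD1 coefficients become a projected value. The sketch below uses invented data with three bands (d, X and center are hypothetical names). It assumes, as MASS does in its predict method for lda, that the variables are centered on the prior-weighted mean of the group means before being multiplied by the coefficients.

```r
library(MASS)

# Hypothetical synthetic training data with three bands
set.seed(7)
n <- 80
g <- rep(c(0, 1), each = n)
d <- data.frame(
  Class = factor(g),
  Band1 = rnorm(2 * n, mean = 320 + 4 * g, sd = 5),
  Band2 = rnorm(2 * n, mean = -1 * g),
  Band3 = rnorm(2 * n, mean = 30 * g, sd = 8)
)
params <- lda(Class ~ Band1 + Band2 + Band3, data = d)

# Projection by hand: center the variables, then apply the LD1 coefficients
X      <- as.matrix(d[, c("Band1", "Band2", "Band3")])
center <- colSums(params$prior * params$means)  # prior-weighted overall mean
scores <- scale(X, center = center, scale = FALSE) %*% params$scaling

# The same scores as computed by predict
ret <- predict(params, d)
max_diff <- max(abs(scores - ret$x))
print(max_diff)   # should be zero up to floating-point error
```

Because of this centering and the scaling that lda applies to its coefficients, the raw coefficients of another implementation may differ by a constant factor and offset while still producing the same classification.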

3. Classify

> ret <- predict(params, d2)

Write out the result:

> write.csv(d2, file="~/data/")
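One possible way to get the predicted classes into the output file is to attach them to the data frame before writing it. The sketch below is an assumption about how that could be done, not necessarily what the original steps did. The data, the two-band layout and the output file name are invented.

```r
library(MASS)

# Hypothetical training data and unclassified samples
set.seed(3)
n <- 60
g <- rep(c(0, 1), each = n)
d <- data.frame(
  Class = factor(g),
  Band1 = rnorm(2 * n, mean = 3 * g),
  Band2 = rnorm(2 * n, mean = -2 * g)
)
d2 <- data.frame(Band1 = c(-0.2, 3.1, 2.8), Band2 = c(0.3, -2.2, -1.5))

params <- lda(Class ~ Band1 + Band2, data = d)
ret    <- predict(params, d2)

# Attach the predicted class as a new column, then write everything out
d2$Class <- ret$class
out_file <- file.path(tempdir(), "classified_example.csv")  # hypothetical name
write.csv(d2, file = out_file, row.names = FALSE)
```

Setting row.names = FALSE keeps R from writing an extra unnamed column of row numbers, so the output has the same layout as the training file.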
