SoFunction
Updated on 2025-03-01

Converting numeric-valued factors to numeric type in R

I had always assumed that anything that looks like a number, whatever its type, could be converted to the corresponding numeric value with the as.numeric() function. For example:

With x <- "123", x is of type character, and as.numeric(x) is the numeric value 123.

But the factor type is different.

Take a <- factor(c(100,200,300,301,302,400,10)). Its values print as 100 200 300 301 302 400 10. However,

as.numeric(a) does not return 100 200 300 301 302 400 10, but 2 3 4 5 6 7 1.

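The rule for converting a factor to numeric type is as follows:

If the factor has n distinct values (its levels), each element is converted to an integer from 1 to n: the smallest value becomes 1, the second smallest becomes 2, and so on. More precisely, each element becomes its position in levels(a), which for a factor built from a numeric vector is the sorted order of the distinct values.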
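The rule can be checked directly in an R session. The following is a minimal sketch using the same vector as above; the variable name codes is only illustrative.

```r
# Build the factor from the article's example vector.
a <- factor(c(100, 200, 300, 301, 302, 400, 10))

# The levels are the sorted distinct values, stored as character strings.
print(levels(a))

# as.numeric() on a factor returns each element's position within levels(a),
# not the number the element displays as.
codes <- as.numeric(a)
print(codes)  # 2 3 4 5 6 7 1
```

Here 10 is the smallest value, so it gets code 1 even though it appears last in the vector.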

So how do you recover the original numeric values from a factor?

     mean(as.numeric(as.character(factorname)))
     mean(as.numeric(levels(factorname)[factorname]))

Both lines above recover the numeric values that the factor displays (and here take their mean). The idea is to convert to character type first and then to numeric type; the second line does this by looking each element up in the character vector levels(factorname).
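As a quick check, the two conversions can be compared side by side. This is a sketch only: factorname is filled with the example vector from earlier in the article, and via_character and via_levels are illustrative names.

```r
# Example factor; any factor whose levels look like numbers would do.
factorname <- factor(c(100, 200, 300, 301, 302, 400, 10))

# Route 1: factor -> character -> numeric.
via_character <- as.numeric(as.character(factorname))

# Route 2: index the character levels by the factor, then convert.
via_levels <- as.numeric(levels(factorname)[factorname])

print(via_character)                         # the displayed values, as numbers
print(identical(via_character, via_levels))  # both routes agree
print(mean(via_character))                   # mean of the real values, not of the codes
```

The help page for factor also mentions the variant as.numeric(levels(f))[f], which converts only the levels rather than every element and is described there as slightly more efficient.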

Supplementary: fixing the meaningless results that R's as.numeric() function returns when converting decimals

This part focuses on the meaningless results obtained when converting a factor into numeric values.

Suppose there is a data frame aaa whose value column is stored as a factor:

x     |   y     |   value
------------------------------------------
a1    |   b2    |   0.510665432157769
a2    |   b3    |   0.887655678543227
..    |   ..    |   ...

Run as.numeric(aaa[1,3]) and the result returned is, surprisingly, 123. Of course, this is only an example to illustrate the problem; the value you actually get will not necessarily be 123.

The R help for as.numeric contains the following warning:

Warning

If x is a factor, as.numeric will return the underlying numeric (integer) representation, which is often meaningless as it may not correspond to the factor levels, see the ‘Warning’ section in factor (and the 2nd example below).

Notice the word "meaningless". It means that calling as.numeric() on a factor usually returns a meaningless result: the factor's internal integer code. To save memory and improve speed, a factor is implemented underneath as a vector of integers (int in C), and the lookup table that maps those integers to the factor's values (its levels) is stored in memory alongside it.
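The internal layout can be inspected from R itself. The sketch below uses a small made-up factor f (not data from this article) to show the integer codes and the table of levels separately.

```r
# A small illustrative factor with a repeated value.
f <- factor(c("0.51", "0.88", "0.51"))

# The storage type is integer, even though the values look like decimals.
print(typeof(f))

# unclass() strips the factor class and shows the raw integer codes
# together with the "levels" attribute that acts as the lookup table.
print(unclass(f))

# The lookup table on its own: one character string per distinct value.
print(levels(f))
```

Each distinct string is stored once in the levels, and the elements themselves are just small integers pointing into that table.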

So how do we solve it?

Nest the two conversions:

as.numeric(as.character(aaa[1,3]))

This returns the value that should be converted normally, rather than a meaningless result such as 123.

But there is a catch with this method. When the number has many digits after the decimal point, the printed output is rounded, because R prints 7 significant digits by default. The stored value itself is not changed.

To see more digits, use the digits argument of the print() function: print(as.numeric(as.character(aaa[1,3])), digits = 16) prints the complete value without rounding.

In theory, format(xx, digits = 16) should also preserve the full length, but I did not try it in this case.
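The whole supplementary case can be reproduced with a small stand-in data frame. This is a sketch under assumptions: only the two rows shown in the table are used, and the value column is created as a factor explicitly, since data.frame() no longer converts strings to factors by default from R 4.0.0 onward. The names raw and val are illustrative.

```r
# Hypothetical stand-in for aaa, with value stored as a factor.
aaa <- data.frame(
  x = c("a1", "a2"),
  y = c("b2", "b3"),
  value = factor(c("0.510665432157769", "0.887655678543227"))
)

# Direct conversion returns the integer code of the level, not the decimal.
raw <- as.numeric(aaa[1, 3])
print(raw)

# Converting through character recovers the decimal value.
val <- as.numeric(as.character(aaa[1, 3]))
print(val)               # default printing shows only 7 significant digits
print(val, digits = 16)  # request more significant digits when printing

# format() returns the value as a string with the requested precision.
print(format(val, digits = 16))
```

With only two levels the meaningless result here is a small integer rather than 123, but the cause is the same.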

The above is my personal experience, and I hope it serves as a useful reference. If there are any mistakes or anything I have not fully considered, corrections are welcome.