I haven't updated this blog in half a year; for the past six months I have been writing analysis reports without touching R.

This post records a problem that tormented me last year: R loops over large inputs (millions of iterations) that ran painfully slowly. I tried multithreading at the time and failed. Yesterday I tried again, finally got it running, and the whole process went quite smoothly.
Step 1
First check how many cores your machine has. As a rule of thumb, n cores should run n threads; more threads are not automatically better. Plot speedup against thread count and you get roughly an inverted-U curve whose peak is expected to sit at the machine's core count.
detectCores() reports the number of cores currently available. Mine is 4, so in Step 2 I use a 4-worker cluster.

```r
library(parallel)
cores <- detectCores()
```
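The inverted-U claim can be checked empirically. A minimal sketch, where `slow_task` is a hypothetical stand-in for your real per-element work:

```r
library(parallel)

# hypothetical stand-in for a real per-element computation
slow_task <- function(x) { Sys.sleep(0.01); x^2 }

# time one run of n tasks with k workers
time_with_cores <- function(k, n = 200) {
  cl <- makeCluster(k)
  elapsed <- system.time(parLapply(cl, seq_len(n), slow_task))["elapsed"]
  stopCluster(cl)
  elapsed
}

# compare elapsed times for 1, 2, 4 and 8 workers
# sapply(c(1, 2, 4, 8), time_with_cores)
```

On a 4-core machine the elapsed time should bottom out around k = 4 and climb again once scheduling overhead dominates.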
Step 2

Multithreaded computation
```r
setwd("C:\\Users\\siyuanmao\\Documents\\imdada\\0-Channel delivery and new coupon linkage model\\Measurement")
options(scipen = 3)  # suppress scientific notation
channel_ad_ios_data <- seq(0, 50000, 5000)
channel_ad_android_data <- seq(0, 100000, 10000)

library(parallel)

func <- function(n) {  # n = 1
  result_data <- read.csv("Coupon Scheme.csv", stringsAsFactors = FALSE)
  total_coupon_solution_data <- read.csv("Result Table Framework.csv", stringsAsFactors = FALSE)
  coupon_solution_data <- subset(result_data, solution == paste('plan', n, sep = ""))
  for (i in 1:11) {  # i = 3
    coupon_solution_data$channel_ad_cost[3] <- 5000 * (i - 1)
    for (j in 1:11) {  # j = 5
      coupon_solution_data$channel_ad_cost[4] <- 10000 * (j - 1)
      solution_mark <- paste('plan', n, i, j, sep = "-")
      coupon_solution_data$solution <- solution_mark
      total_coupon_solution_data <- rbind(total_coupon_solution_data, coupon_solution_data)
    }
  }
  print(solution_mark)
  return(total_coupon_solution_data)
}
# func(10)

system.time({
  x <- 1:7776
  cl <- makeCluster(4)                # initialise a four-worker cluster
  results <- parLapply(cl, x, func)   # parallel version of lapply
  df <- do.call('rbind', results)     # combine the results
  stopCluster(cl)                     # shut down the cluster
})
```
Without multithreading, I estimated the run would take more than 12 hours, with the computer fans roaring. I read that the loop would be faster in Python, so I rewrote it there (I hadn't used Python in so long that I could barely write a range(); it took me over a day to get it right, and it was still slow). So I switched to multithreading in R, and the result came out after about 25 minutes.
Supplement: multithreading in R
parallel package
Package installation
```r
install.packages("parallel")
library(parallel)
```
Common functions in the package
detectCores(): reports the number of currently available cores
clusterExport(): exports variables from the current environment to the workers
makeCluster(): allocates cores and starts a cluster
stopCluster(): shuts down the cluster
parLapply(): parallel version of the lapply() function
R is, after all, a vectorised language. For operations over a vector, the apply function family is already relatively efficient compared with for loops, for two main reasons:

The loop itself is implemented in C.
Unnecessary copies of data structures are avoided.
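A minimal sketch of the three styles, from slowest to fastest:

```r
n <- 1e6
x <- seq_len(n)

# explicit for loop: R-level iteration with manual bookkeeping
out_loop <- numeric(n)
for (i in x) out_loop[i] <- x[i]^2

# apply family: the iteration runs in C code
out_apply <- sapply(x, function(v) v^2)

# fully vectorised: a single C-level operation, fastest of all
out_vec <- x^2

identical(out_loop, out_vec)  # TRUE
```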
But often, when you want to go faster still, the apply family alone is not enough, and that is when multiple threads come in. The parallel package in R helps implement multithreading.
A simple parLapply walkthrough
Check the current number of cores

```r
detectCores()
# [1] 8
```
Starting and shutting down a cluster
```r
cl <- makeCluster(4)  # initialise a four-worker cluster
### parallel tasks go here ###
stopCluster(cl)       # shut down the cluster
```
Multithreaded computation with parLapply
```r
# define a squaring function
square <- function(x) {
  return(x^2)
}

# compute squares in parallel
num <- c(1:3)
cl <- makeCluster(4)                    # initialise a four-worker cluster
results <- parLapply(cl, num, square)   # apply square via parLapply
final <- do.call('c', results)          # combine the results
stopCluster(cl)                         # shut down the cluster
# > final
# [1] 1 4 9
```
A question worth asking: for a computation this small, is running on 4 workers actually faster than running on one?

Not necessarily. Parallelism brings extra overhead such as scheduling and data transfer, so for tiny workloads it can even be slower; the real payoff of parallel execution comes with large data volumes.
Comparing time overhead

The two versions side by side
```r
# squaring function, padded with redundant work to lengthen execution time
square <- function(x) {
  y <- 2 * x
  if (y < 300) { z <- y } else { z <- x }
  return(x^2)
}
num <- c(1:10000000)
```

```r
# parallel computation
print(system.time({
  cl <- makeCluster(4)                    # initialise a four-worker cluster
  results <- parLapply(cl, num, square)   # apply square via parLapply
  final <- do.call('c', results)          # combine the results
  stopCluster(cl)                         # shut down the cluster
}))
#  user  system elapsed
#  7.89    0.27   19.01
```

```r
# ordinary computation
print(system.time({
  results <- lapply(num, square)
  final <- do.call('c', results)   # combine the results
}))
#  user  system elapsed
# 29.74    0.00   29.79
```
Clearly, once the data volume is large enough, parallel execution wins, though overhead keeps it short of a perfect speedup proportional to the core count. Opening ever more workers is not free, either: each worker gets its own copy of the data, so memory can overrun quickly. When memory problems appear, check whether the code is reasonable, whether the R build is 64-bit (which can address far more memory than 32-bit), and whether the number of workers is sensible.
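One memory-saving option on Linux/macOS (not Windows) is a FORK cluster: the workers are forked from the parent process and share its memory copy-on-write, so large read-only objects are not duplicated per worker. A minimal sketch:

```r
library(parallel)

if (.Platform$OS.type == "unix") {
  big <- rnorm(1e6)                    # large read-only input
  cl <- makeCluster(2, type = "FORK")
  # workers see `big` without any clusterExport() call
  res <- parLapply(cl, 1:4, function(i) mean(big) + i)
  stopCluster(cl)
}
```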
Importing variables from the enclosing environment
R has an interesting, layered notion of environments, one nested inside another; we will not go into it in depth here.

Much like using a global variable inside a C function, when R runs a parallel computation whose function refers to a variable from the global (enclosing) environment, that variable must be explicitly exported to the workers, or an error is raised.
```r
# define a power function that reads a global variable
base <- 2
square <- function(x) {
  return(x^base)
}
num <- c(1:1000000)
```

```r
# parallel computation without exporting `base`: fails
cl <- makeCluster(4)                    # initialise a four-worker cluster
results <- parLapply(cl, num, square)
final <- do.call('c', results)
stopCluster(cl)
# Error in checkForRemoteErrors(val) :
#   4 nodes produced errors; first error: object 'base' not found
```

```r
# parallel computation with `base` exported first: succeeds
cl <- makeCluster(4)
clusterExport(cl, "base", envir = environment())
results <- parLapply(cl, num, square)
final <- do.call('c', results)
stopCluster(cl)
# > final
# [1] 1 4 9 16 25 ...
```
foreach package
Besides the parallel package, the foreach package offers parallel for loops. foreach() is used much like parLapply(); the two behave similarly and run into similar pitfalls.
Package installation
```r
install.packages("foreach")
library(foreach)
```
Use of foreach
```r
# define a squaring function
square <- function(x) {
  return(x^2)
}
```
Non-parallel use:

The .combine parameter is the function used to merge the results; it can be c, rbind, '+', and so on.

```r
results <- foreach(x = c(1:3), .combine = 'c') %do% square(x)
# > results
# [1] 1 4 9
```
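The other combiners behave analogously; a quick sketch:

```r
library(foreach)

# 'c' flattens the results into a vector
foreach(x = 1:3, .combine = 'c') %do% x^2        # 1 4 9

# 'rbind' stacks each result as a row of a matrix
foreach(x = 1:3, .combine = 'rbind') %do% c(x, x^2)

# '+' folds the results together with addition
foreach(x = 1:3, .combine = '+') %do% x^2        # 1 + 4 + 9 = 14
```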
Parallel use:

Note that parallel execution needs the companion package doParallel (library(doParallel)), and %do% must become %dopar%. Also, unlike with the parallel package alone, you must additionally call registerDoParallel(cl) to register the workers.

```r
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
results <- foreach(x = c(1:100000), .combine = 'c') %dopar% square(x)
stopCluster(cl)
```
Importing variables from the enclosing environment

With the parallel package, clusterExport() must be called before the parallel run to export global variables. foreach needs the declaration too; the difference is that it is written directly in foreach()'s .export parameter.

```r
# define a power function that reads a global variable
base <- 2
square <- function(x) {
  return(x^base)
}
cl <- makeCluster(4)
registerDoParallel(cl)
results <- foreach(x = c(1:100000), .combine = 'c', .export = 'base') %dopar% square(x)
stopCluster(cl)
```
The above is my personal experience; I hope it can serve as a useful reference. If there are mistakes or things I have not fully considered, corrections are very welcome.