Python optimization tips: using ctypes to speed up execution

First, let me share a small pitfall I ran into when calling a C library from Python with ctypes.

The function involved is a C function that returns the address of a string allocated with malloc. It had been working normally: I had used it for a while without noticing anything wrong.

During this round of testing, however, the program hit a segmentation fault and exited.

After troubleshooting, I found that the cause was the function's return value: by default, ctypes assumes that a foreign function returns a C int. On a 64-bit system a pointer is usually 64 bits wide while an int is 32 bits, so the returned address can be truncated, and using the truncated address crashes the program.

You need to set the return type explicitly before calling the function, for example (where func is the foreign function object):

func.restype = c_char_p
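Here is a minimal Python 3 sketch of the fix. The author's own function isn't shown, so libc's strdup(), which also returns a malloc'd string, stands in for it; the code assumes Linux or macOS, where ctypes.util.find_library("c") locates the C library. One design note: with restype = c_char_p, ctypes copies the result into a bytes object and discards the pointer, so a malloc'd buffer can no longer be freed. Declaring c_void_p instead keeps the address so it can be handed to free().

```python
import ctypes
import ctypes.util

# Assumption: Linux or macOS, where find_library("c") finds the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strdup(const char *s); returns malloc'd memory.
# Without an explicit restype, ctypes would treat the result as a C int.
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p   # keep the raw address so we can free it

# void free(void *ptr);
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def dup_string(data):
    """Call strdup(), copy the C string into Python bytes, then free it."""
    ptr = libc.strdup(data)
    if not ptr:
        raise MemoryError("strdup failed")
    try:
        return ctypes.string_at(ptr)    # reads bytes up to the NUL terminator
    finally:
        libc.free(ptr)


print(dup_string(b"hello"))  # b'hello'
```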

Now let's look at how to use ctypes in more detail.

The ctypes library lets Python call functions written in C. This bridge to C is useful in many situations, for example when a small performance-critical piece of code is worth moving into C. With it you can load the C runtime library on Windows (msvcrt.dll) and libc.so.6 on Linux, and of course you can also load shared libraries that you compile yourself.
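A Python 3 sketch of loading the C runtime on either platform and calling one of its functions. The Windows branch assumes the classic msvcrt.dll; on Linux and macOS, ctypes.util.find_library("c") locates the C library. Declaring argtypes and restype up front lets ctypes check the arguments and interpret the return value correctly.

```python
import ctypes
import ctypes.util
import sys


def load_libc():
    """Return a handle to the C runtime library (assumed names per platform)."""
    if sys.platform == "win32":
        return ctypes.cdll.msvcrt                      # classic Windows C runtime
    return ctypes.CDLL(ctypes.util.find_library("c"))  # e.g. libc.so.6 on Linux


libc = load_libc()

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"ctypes"))  # 6

# A library you built yourself is loaded by path in the same way, e.g.
# mylib = ctypes.CDLL("./mylib.so")   # hypothetical file name
```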

Let's start with a simple example. We use Python to find all primes below 1,000,000, repeat this 10 times, and measure the running time. (The code in this article is written for Python 2.)

import math
from timeit import timeit


def check_prime(x):
  values = xrange(2, int(math.sqrt(x)) + 1)
  for i in values:
    if x % i == 0:
      return False
  return True


def get_prime(n):
  return [x for x in xrange(2, n) if check_prime(x)]

print timeit(stmt='get_prime(1000000)', setup='from __main__ import get_prime',
       number=10)

Output

42.8259568214
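For readers on Python 3, here is a sketch of the same pure-Python benchmark. It assumes Python 3.8+ for math.isqrt and uses a smaller n so it finishes quickly; the timing it prints is not comparable to the number above.

```python
import math
from timeit import timeit


def check_prime(x):
    # Trial division by every i in [2, floor(sqrt(x))].
    for i in range(2, math.isqrt(x) + 1):
        if x % i == 0:
            return False
    return True


def get_prime(n):
    # All primes p with 2 <= p < n.
    return [x for x in range(2, n) if check_prime(x)]


if __name__ == "__main__":
    # Smaller n than in the article; raise it to reproduce the original setup.
    print(timeit("get_prime(100000)", globals=globals(), number=1))
```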

Below is the check_prime function written in C. We compile it into a shared library (dynamic link library) and then load it from Python.

#include <math.h>

/* Returns 1 if a is prime, 0 otherwise (trial division up to sqrt(a)). */
int check_prime(int a)
{
  int c;
  for ( c = 2 ; c <= sqrt(a) ; c++ ) {
    if ( a%c == 0 )
      return 0;
  }
  return 1;
}

Save the code as prime.c (any file name works) and use the following command to build the .so (shared object) file:

gcc -shared -fPIC -o prime.so prime.c -lm
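If you prefer to run the build from a script, a Python 3 sketch like the following compiles the same C source with gcc into a temporary directory and loads the result. It assumes gcc is installed and on PATH (Linux or macOS); PRIME_SRC, build_and_load and load_check_prime are hypothetical names. For an int-to-int function the ctypes defaults already fit, but declaring argtypes and restype explicitly avoids the return-type pitfall described at the start.

```python
import ctypes
import os
import subprocess
import tempfile

# The check_prime function from above, embedded as a string.
PRIME_SRC = r"""
#include <math.h>

int check_prime(int a)
{
    int c;
    for (c = 2; c <= sqrt(a); c++) {
        if (a % c == 0)
            return 0;
    }
    return 1;
}
"""


def build_and_load(source, name="prime"):
    """Compile C source into a shared library in a temp dir and load it.

    Assumes gcc is available on PATH; the file names are illustrative.
    """
    build_dir = tempfile.mkdtemp()
    c_path = os.path.join(build_dir, name + ".c")
    so_path = os.path.join(build_dir, name + ".so")
    with open(c_path, "w") as f:
        f.write(source)
    # -lm links the math library that provides sqrt().
    subprocess.run(["gcc", "-shared", "-fPIC", "-o", so_path, c_path, "-lm"],
                   check=True)
    return ctypes.CDLL(so_path)


def load_check_prime():
    lib = build_and_load(PRIME_SRC)
    check_prime = lib.check_prime
    check_prime.argtypes = [ctypes.c_int]   # int check_prime(int a)
    check_prime.restype = ctypes.c_int
    return check_prime
```

Usage: `check_prime = load_check_prime()` followed by `[n for n in range(2, 30) if check_prime(n)]` should list the primes below 30.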

import ctypes
import math
from timeit import timeit
check_prime_in_c = ctypes.CDLL('./prime.so').check_prime  # returns int, ctypes' default restype


def check_prime_in_py(x):
  values = xrange(2, int(math.sqrt(x)) + 1)
  for i in values:
    if x % i == 0:
      return False
  return True


def get_prime_in_c(n):
  return [x for x in xrange(2, n) if check_prime_in_c(x)]


def get_prime_in_py(n):
  return [x for x in xrange(2, n) if check_prime_in_py(x)]


py_time = timeit(stmt='get_prime_in_py(1000000)', setup='from __main__ import get_prime_in_py',
         number=10)
c_time = timeit(stmt='get_prime_in_c(1000000)', setup='from __main__ import get_prime_in_c',
        number=10)
print "Python version: {} seconds".format(py_time)

print "C version: {} seconds".format(c_time)

Output

Python version: 43.4539749622 seconds
C version: 8.56250786781 seconds

The performance gap is obvious: the C version is about five times faster here. There are many other ways to test whether a number is prime; trial division is simply an easy one to benchmark.

Let's look at a more complex example: quick sort.
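As one example of a different approach (not something the original benchmark measured), a Sieve of Eratosthenes finds all primes below n without testing each number separately, which changes the algorithmic cost rather than the language. A minimal Python 3.8+ sketch:

```python
import math


def primes_below(n):
    """Sieve of Eratosthenes: all primes p with 2 <= p < n."""
    if n < 3:
        return []
    is_prime = bytearray([1]) * n          # is_prime[i] == 1 means "maybe prime"
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n - 1) + 1):
        if is_prime[i]:
            # Cross out i*i, i*i + i, ... (smaller multiples were already crossed out).
            is_prime[i * i::i] = bytes(len(range(i * i, n, i)))
    return [i for i, flag in enumerate(is_prime) if flag]
```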

/* Iterative quick sort: an explicit stack of ranges replaces recursion. */
typedef struct _Range {
  int start, end;
} Range;

Range new_Range(int s, int e) {
  Range r;
  r.start = s;
  r.end = e;
  return r;
}

void swap(int *x, int *y) {
  int t = *x;
  *x = *y;
  *y = t;
}

void quick_sort(int arr[], const int len) {
  if (len <= 0)
    return;
  Range r[len];               /* stack of pending ranges (C99 variable-length array) */
  int p = 0;
  r[p++] = new_Range(0, len - 1);
  while (p) {
    Range range = r[--p];     /* pop the next range */
    if (range.start >= range.end)
      continue;
    int mid = arr[range.end]; /* the last element is the pivot */
    int left = range.start, right = range.end - 1;
    while (left < right) {
      while (arr[left] < mid && left < right)
        left++;
      while (arr[right] >= mid && left < right)
        right--;
      swap(&arr[left], &arr[right]);
    }
    if (arr[left] >= arr[range.end])
      swap(&arr[left], &arr[range.end]);  /* move the pivot into place */
    else
      left++;
    r[p++] = new_Range(range.start, left - 1);
    r[p++] = new_Range(left + 1, range.end);
  }
}

Compile it the same way, saving the code as quick_sort.c:

gcc -shared -fPIC -o quick_sort.so quick_sort.c
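A Python 3 port of the same explicit-stack quick sort can be handy for checking the partitioning logic on small inputs before debugging it in C. This is a sketch that mirrors the C code above (last element as pivot, a list used as the range stack); it is not meant to be fast.

```python
def quick_sort(arr):
    """Sort arr in place, mirroring the C version's explicit range stack."""
    stack = [(0, len(arr) - 1)]
    while stack:
        start, end = stack.pop()
        if start >= end:
            continue
        mid = arr[end]                      # pivot value (last element)
        left, right = start, end - 1
        while left < right:
            while arr[left] < mid and left < right:
                left += 1
            while arr[right] >= mid and left < right:
                right -= 1
            arr[left], arr[right] = arr[right], arr[left]
        if arr[left] >= arr[end]:
            arr[left], arr[end] = arr[end], arr[left]   # pivot into place
        else:
            left += 1
        stack.append((start, left - 1))
        stack.append((left + 1, end))


data = [5, 1, 4, 2, 3]
quick_sort(data)
print(data)  # [1, 2, 3, 4, 5]
```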

One awkward aspect of ctypes is that the types used by native C code do not always map clearly onto Python types. For example, what should the C array correspond to here: a Python list, or an array from the array module? Either way, we have to convert explicitly. The script below copies each list into a ctypes array of c_int before sorting:

import ctypes
import time
import random

quick_sort = ctypes.CDLL('./quick_sort.so').quick_sort
nums = []
for _ in range(100):
  r = [random.randint(1, 100000000) for x in xrange(100000)]
  arr = (ctypes.c_int * len(r))(*r)  # copy the list into a C int array
  nums.append((arr, len(r)))

init = time.clock()
for i in range(100):
  quick_sort(nums[i][0], nums[i][1])
print "%s" % (time.clock() - init)

Output

1.874907

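For comparison, here is the same benchmark using the built-in sort method of Python lists. Note that the ctypes timing above does not include the cost of converting each list into a C array, because the arrays are built before the timer starts.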

import time
import random

nums = []
for _ in range(100):
  nums.append([random.randint(1, 100000000) for x in xrange(100000)])

init = time.clock()
for i in range(100):
  nums[i].sort()
print "%s" % (time.clock() - init)

Output

2.501257
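Coming back to the question of how Python sequences map onto C arrays: the benchmark above copies each list into a fresh c_int array. A Python 3 sketch of the main options, assuming the array typecode 'i' has the same size as a C int (true on mainstream platforms): copying a list, wrapping an array.array's memory with from_buffer (no copy), and reading results back with list().

```python
import array
import ctypes

values = [5, 3, 9, 1]

# 1) Copy a Python list into a new C int array, element by element.
c_arr = (ctypes.c_int * len(values))(*values)

# 2) Wrap an array.array('i') without copying: from_buffer needs a writable
#    buffer at least as large as the ctypes array type.
buf = array.array("i", values)
view = (ctypes.c_int * len(buf)).from_buffer(buf)
view[0] = 42            # writes straight into buf's memory

# 3) Convert C data back into a Python list once the C code has run.
back = list(c_arr)

print(back)    # [5, 3, 9, 1]
print(buf[0])  # 42
```

With the zero-copy view, a call such as `quick_sort(view, len(buf))` against the library built above would sort `buf` in place. While `view` exists, `buf` cannot be resized (appending raises BufferError), because its memory is exported.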

For structures, define a class that derives from ctypes.Structure and lists the fields and their types in _fields_:

import ctypes

class Point(ctypes.Structure):
  _fields_ = [('x', ctypes.c_double),
              ('y', ctypes.c_double)]
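To see a Structure actually cross the C boundary, here is a Python 3 sketch using libc's div(), which returns a div_t struct by value. It assumes Linux or macOS, and it assumes the field order quot, rem as in glibc and macOS (the C standard allows either order), so check your platform's headers before relying on it.

```python
import ctypes
import ctypes.util


class DivT(ctypes.Structure):
    # Mirrors C's div_t; field order quot, rem is an assumption (see above).
    _fields_ = [("quot", ctypes.c_int),
                ("rem", ctypes.c_int)]


# Assumption: Linux or macOS, where find_library("c") finds the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# div_t div(int numer, int denom);  the struct is returned by value.
libc.div.argtypes = [ctypes.c_int, ctypes.c_int]
libc.div.restype = DivT

r = libc.div(17, 5)
print(r.quot, r.rem)  # 3 2
```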

Besides loading C extensions we wrote ourselves, we can also load library files that ship with the system, such as glibc, the implementation of the C standard library on Linux:

import time
import random
from ctypes import cdll

libc = cdll.LoadLibrary('libc.so.6')  # Linux
# libc = cdll.msvcrt                  # Windows

init = time.clock()
randoms = [random.randint(1, 100) for x in xrange(1000000)]
print "Python version: %s seconds" % (time.clock() - init)
init = time.clock()
randoms = [(libc.rand() % 100) for x in xrange(1000000)]
print "C version : %s seconds" % (time.clock() - init)

Output

Python version: 0.850172 seconds
C version : 0.27645 seconds
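System functions that take callbacks can be used as well. The following Python 3 sketch, following the callback pattern from the ctypes documentation, passes a Python comparison function to libc's qsort() through CFUNCTYPE. It assumes Linux or macOS. Because every comparison calls back into Python, this is far slower than a sort whose comparisons run entirely in C, such as the quick_sort library above; it shows the mechanism, not a speedup.

```python
import ctypes
import ctypes.util

# Assumption: Linux or macOS, where find_library("c") finds the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C comparator type: int (*)(const void *, const void *), declared here with
# int pointers so the callback can read the values directly.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

# void qsort(void *base, size_t nmemb, size_t size, comparator)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_cmp(a, b):
    """Return <0, 0 or >0 like a C comparator (avoids subtraction overflow)."""
    return (a[0] > b[0]) - (a[0] < b[0])


# Keep a reference to the callback object for as long as C may call it.
cmp_func = CMPFUNC(py_cmp)


def c_sort(values):
    arr = (ctypes.c_int * len(values))(*values)   # list -> C int array
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), cmp_func)
    return list(arr)                              # C int array -> list


print(c_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```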

These are the basics of ctypes, and they should be enough for most everyday development.

For more details, please refer to the ctypes section of the official Python documentation: /library/