
C++ Extension Implementation in PyTorch

Today we're going to talk about C++ extensions with PyTorch.

Before we get started, we need to understand how PyTorch customizes modules: most commonly, we inherit from nn.Module in Python and assemble our own modules out of PyTorch's existing operators. This is easy to do, but it may not be the most computationally efficient approach, and if the functionality we want is complex enough, PyTorch's existing functions may not be able to meet our needs. In such cases, extending PyTorch modules with C, C++, or CUDA is the best choice.

Since most of the deep learning systems on the market (TensorFlow, PyTorch, etc.) are built on C/C++ backends, these systems generally provide C/C++ extension interfaces. PyTorch is built on Torch, whose underlying code is written in C, so PyTorch is inherently C-friendly. With the release of PyTorch 1.0, the developers began merging the backend with Caffe2 and are gradually refactoring around ATen, the C++ tensor library that PyTorch currently uses. Overall, C++ is the future. As for CUDA, it's the tool that almost all deep learning systems are built with, so CUDA extension interfaces are standard.

This article walks through the steps for writing a C++ extension using a simple example, without going deep into the specific implementation details.

C, C++, CUDA extensions for PyTorch

For PyTorch's C extensions, see the official tutorial or this blog post. The approach is not difficult: with the help of the <TH/> and <THC/> interfaces provided by the original Torch, the extension is then built using PyTorch's torch.utils.ffi module. Note that as PyTorch versions are upgraded, this practice may no longer work in newer releases.

This article focuses on extension methods for C++ (with the possible addition of CUDA in the future).

C++ extensions

First, an introduction to the basic process. Extending C++/CUDA in PyTorch is divided into several main steps:

  1. Install the pybind11 module (via pip or conda, etc.), which handles the binding between Python and C++;
  2. Write the custom layer in C++, including the forward propagation (forward) and backward propagation (backward);
  3. Write a setup.py script that uses Python's setuptools to compile and load the C++ code;
  4. Compile and install, then call the C++ extension interface from Python.

Next, we'll demonstrate these steps with a simple example (z = 2x + y).

First step

Installing pybind11 is relatively simple (for example, pip install pybind11), so we skip it here. Let's start by writing the C++-related files:

header file (the file name, lost in the original, is assumed here to be test.h)

#include <torch/extension.h>
#include <vector>

// Forward propagation
torch::Tensor Test_forward_cpu(const torch::Tensor& inputA,
              const torch::Tensor& inputB);
// Backward propagation
std::vector<torch::Tensor> Test_backward_cpu(const torch::Tensor& gradOutput);

Note that the <torch/extension.h> header referenced here is crucial; it consists of three important parts:

  • pybind11 for C++ and python interaction;
  • ATen, which contains important functions and classes such as Tensor;
  • A few helper header files to enable interaction between ATen and pybind11.

The source file (assumed name: test.cpp) is shown below:

#include ""

// Forward propagation, summing two Tensors. We'll focus on the C++ extension of the process here, and won't delve into the implementation.
torch::Tensor Test_forward_cpu(const torch::Tensor& x,
              const torch::Tensor& y) {
  AT_ASSERTM(() == (), "x must be the same size as y");
  torch::Tensor z = torch::zeros(());
  z = 2 * x + y;
  return z;
}

// Reverse propagation
// In this example, the derivative of z with respect to x is 2 and the derivative of z with respect to y is 1.
// As for why the interface (parameters, return value) of this backward function is designed this way, I'll talk about that later.
std::vector<torch::Tensor> Test_backward_cpu(const torch::Tensor& gradOutput) {
  torch::Tensor gradOutputX = 2 * gradOutput * torch::ones(());
  torch::Tensor gradOutputY = gradOutput * torch::ones(());
  return {gradOutputX, gradOutputY};
}

// pybind11 binding
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
 ("forward", &Test_forward_cpu, "TEST forward");
 ("backward", &Test_backward_cpu, "TEST backward");
}

Second step

Create a new setup.py configuration file for compiling and installing, and arrange the file directory as follows (file names assumed, matching the header and source above):

└── csrc
  ├── cpu
  │  ├── test.cpp
  │  └── test.h
  └── setup.py

The following is the content of setup.py:

from setuptools import setup
import os
import glob
from torch.utils.cpp_extension import BuildExtension, CppExtension

# Directory containing the header files
include_dirs = os.path.dirname(os.path.abspath(__file__))
# Source file list
source_cpu = glob.glob(os.path.join(include_dirs, 'cpu', '*.cpp'))

setup(
  name='test_cpp', # module name, needed when calling from python
  version="0.1",
  ext_modules=[
    CppExtension('test_cpp', sources=source_cpu, include_dirs=[include_dirs]),
  ],
  cmdclass={
    'build_ext': BuildExtension
  }
)

Note that this C++ extension is named test_cpp, meaning that the C++ functions can be called from Python through the test_cpp module.
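As an aside, besides the setup.py route, PyTorch also provides torch.utils.cpp_extension.load, which compiles and loads an extension just-in-time without a separate install step. A minimal sketch, assuming it is run from the csrc directory with the file layout above:

from torch.utils.cpp_extension import load

# JIT-compile and load the extension; the resulting module behaves like test_cpp
test_cpp = load(name='test_cpp',
                sources=['cpu/test.cpp'],
                extra_include_paths=['.'],
                verbose=True)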

Third step

In the directory containing setup.py (csrc here), execute the following command to compile and install the C++ code:

python setup.py install

After that, you will see a stream of compilation output, and the C++ module will be installed into Python's site-packages.
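At this point you can already do a quick sanity check by calling the compiled functions directly, before any autograd wrapping; a minimal sketch:

import torch
import test_cpp  # the module installed above

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
# Calls Test_forward_cpu through the pybind11 binding
print(test_cpp.forward(x, y))  # expected: tensor([ 6.,  9., 12.])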

Once the above steps are complete, you can call the C++ code from Python. In PyTorch, it is customary to first wrap the C++ forward and backward propagation in a Function op (the following code is assumed to live in test.py at the project root):

from torch.autograd import Function

import test_cpp

class TestFunction(Function):

  @staticmethod
  def forward(ctx, x, y):
    return test_cpp.forward(x, y)

  @staticmethod
  def backward(ctx, gradOutput):
    gradX, gradY = test_cpp.backward(gradOutput)
    return gradX, gradY

In this way, we are embedding C++-extended functions within PyTorch's own framework.

Looking at the code of this Function class, we find something quite interesting:

class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin)):
 
  ...

  @staticmethod
  def forward(ctx, *args, **kwargs):
    r"""Performs the operation.

    This function is to be overridden by all subclasses.

    It must accept a context ctx as the first argument, followed by any
    number of arguments (tensors or other types).

    The context can be used to store tensors that can be then retrieved
    during the backward pass.
    """
    raise NotImplementedError

  @staticmethod
  def backward(ctx, *grad_outputs):
    r"""Defines a formula for differentiating the operation.

    This function is to be overridden by all subclasses.

    It must accept a context :attr:`ctx` as the first argument, followed by
    as many outputs did :func:`forward` return, and it should return as many
    tensors, as there were inputs to :func:`forward`. Each argument is the
    gradient w.r.t the given output, and each returned value should be the
    gradient w.r.t. the corresponding input.

    The context can be used to retrieve tensors saved during the forward
    pass. It also has an attribute :attr:`ctx.needs_input_grad` as a tuple
    of booleans representing whether each input needs gradient. E.g.,
    :func:`backward` will have ``ctx.needs_input_grad[0] = True`` if the
    first input to :func:`forward` needs gradient computated w.r.t. the
    output.
    """
    raise NotImplementedError

It is important to note the rules for implementing backward here. Its interface contains two parameters: ctx is an auxiliary context variable, and grad_outputs is the list of gradients flowing back from the following layers of the network. The number of entries in this gradient list equals the number of values returned by forward; this follows the chain rule, which multiplies each upstream gradient by the local gradient of the current layer and accumulates the results. Meanwhile, backward must return the gradient of each input parameter of forward: if forward takes n parameters, backward needs to return n gradients, one for each. So, in the example above, our backward function receives one parameter as input (forward returns only one variable) and returns two gradients (forward receives two input variables).
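Our example happens not to need ctx, because the gradients of z = 2x + y are constants. For ops whose gradients depend on the inputs, ctx is how forward hands tensors to backward. A minimal pure-Python sketch (hypothetical, not part of our extension) using z = x * y:

import torch
from torch.autograd import Function

class MulFunction(Function):
  # z = x * y: the gradients depend on the inputs, so forward must save them

  @staticmethod
  def forward(ctx, x, y):
    ctx.save_for_backward(x, y)  # stash inputs for the backward pass
    return x * y

  @staticmethod
  def backward(ctx, gradOutput):
    x, y = ctx.saved_tensors     # retrieve what forward saved
    # one gradient per forward input, skipped where not required
    gradX = gradOutput * y if ctx.needs_input_grad[0] else None
    gradY = gradOutput * x if ctx.needs_input_grad[1] else None
    return gradX, gradY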

After defining Function, you can use this custom op in Module:

import torch
import torch.nn as nn

class Test(nn.Module):

  def __init__(self):
    super(Test, self).__init__()

  def forward(self, inputA, inputB):
    return TestFunction.apply(inputA, inputB)

Now, our file directory becomes (again with the assumed file names):

├── csrc
│  ├── cpu
│  │  ├── test.cpp
│  │  └── test.h
│  └── setup.py
└── test.py

After that, we can call it as a normal PyTorch module.

Testing

Below, we test forward and backward propagation:

import torch
from torch.autograd import Variable

from test import Test

x = Variable(torch.Tensor([1,2,3]), requires_grad=True)
y = Variable(torch.Tensor([4,5,6]), requires_grad=True)
test = Test()
z = test(x, y)
z.sum().backward()
print('x: ', x)
print('y: ', y)
print('z: ', z)
print('x.grad: ', x.grad)
print('y.grad: ', y.grad)

The output is as follows:

x:  tensor([1., 2., 3.], requires_grad=True)
y:  tensor([4., 5., 6.], requires_grad=True)
z:  tensor([ 6.,  9., 12.], grad_fn=<TestFunctionBackward>)
x.grad:  tensor([2., 2., 2.])
y.grad:  tensor([1., 1., 1.])

As we can see, the forward propagation satisfies z = 2x + y, and the backpropagation results match expectations.
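As an extra check, torch.autograd.gradcheck can compare our analytic backward against numerically estimated gradients. A minimal sketch, assuming TestFunction lives in test.py as above; note that gradcheck wants double-precision inputs, so this only works if the C++ op handles double tensors:

import torch
from torch.autograd import gradcheck

from test import TestFunction  # assumed file name, as above

x = torch.randn(3, dtype=torch.double, requires_grad=True)
y = torch.randn(3, dtype=torch.double, requires_grad=True)
# Compares the analytic gradients from backward with numerical estimates
print(gradcheck(TestFunction.apply, (x, y)))  # prints True on success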

CUDA extensions

Although code written in C++ can run directly on the GPU, its performance may still fall short of code written directly in CUDA; after all, ATen doesn't know how to optimize the specific computations of your algorithm. However, since I still know very little about CUDA, I'll skip this step for now and fill it in later.

References

Custom C extensions for PyTorch
Custom C++ and CUDA extensions
PyTorch extension advanced (1): PyTorch combined with C and the CUDA language
PyTorch extension advanced (2): PyTorch combined with C++ and CUDA extensions
