Export: other / legacy

n2d2 "mnist24_16c4s2_24c5s2_150_10.ini" -export CPP_OpenCL

Export types:

  • C: C export using OpenMP;

  • C_HLS: C export tailored for HLS with Vivado HLS;

  • CPP_OpenCL: C++ export using OpenCL;

  • CPP_Cuda: C++ export using CUDA;

  • CPP_cuDNN: C++ export using cuDNN;

  • SC_Spike: SystemC spike export.
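When exports are scripted, the export type and the -nbbits rules from the option table can be checked before n2d2 is ever invoked. The following is a minimal sketch that assumes only the command form shown above (`n2d2 <ini file> -export <type>`); `export_command` and `check_nbbits` are hypothetical helpers, not part of N2D2, and reading "arbitrary" widths for C_HLS as "any positive integer" is an assumption.

```python
import shlex

# Export types listed above; CPP_TensorRT is added because the -nbbits
# option description refers to it.
EXPORT_TYPES = {"C", "C_HLS", "CPP_OpenCL", "CPP_Cuda", "CPP_cuDNN",
                "SC_Spike", "CPP_TensorRT"}


def check_nbbits(export_type, nbbits):
    """Raise ValueError if nbbits breaks the documented -nbbits rules."""
    if export_type == "CPP_TensorRT":
        ok = nbbits == -32  # the precision itself is chosen at runtime
    elif export_type == "C_HLS":
        # Assumption: "arbitrary" means any positive integer width
        # (e.g. 6), plus the two floating-point settings.
        ok = nbbits > 0 or nbbits in (-32, -64)
    else:
        ok = nbbits in (8, 16, 32, 64, -32, -64)
    if not ok:
        raise ValueError(f"-nbbits {nbbits} is not valid for {export_type}")


def export_command(ini_file, export_type, nbbits=None):
    """Build the argument list for an n2d2 export run (hypothetical helper)."""
    if export_type not in EXPORT_TYPES:
        raise ValueError(f"unknown export type: {export_type}")
    argv = ["n2d2", ini_file, "-export", export_type]
    if nbbits is not None:
        check_nbbits(export_type, nbbits)
        argv += ["-nbbits", str(nbbits)]
    return argv


# prints: n2d2 mnist24_16c4s2_24c5s2_150_10.ini -export CPP_OpenCL
print(shlex.join(export_command("mnist24_16c4s2_24c5s2_150_10.ini",
                                "CPP_OpenCL")))
```

On a machine where n2d2 is installed, the returned list can be passed directly to `subprocess.run(argv, check=True)`.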

Other program options related to the exports:

Option [default value]
    Description

-nbbits [8]
    Number of bits for the weights and signals. Must be 8, 16, 32 or 64 for an integer export, or -32 or -64 for a floating-point export. The number of bits can be arbitrary for the C_HLS export (for example, 6 bits). For the CPP_TensorRT export it must be -32; the precision is then set directly at runtime.

-calib [0]
    Number of stimuli used for the calibration. 0 = no calibration (default), -1 = use the full test dataset for calibration.

-calib-passes [2]
    Number of Kullback-Leibler (KL) divergence passes used to determine the truncation threshold of each layer's output value distribution (0 = use the maximum value, no truncation).

-no-unsigned
    If present, disables the use of unsigned data types in integer exports.

-db-export [-1]
    Maximum number of stimuli to export (0 = no dataset export, -1 = unlimited).
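The option table does not say how N2D2 implements a KL pass, so the following is only a sketch of one widely used single-pass formulation (the histogram method popularized by TensorRT's entropy calibration), not N2D2's code. It builds a histogram of |activations|; for each candidate cut-off bin i it compares the clipped reference distribution P with a copy Q quantized to `num_levels` levels, and keeps the cut-off with the smallest KL(P || Q). The function names, the default bin and level counts, and the additive smoothing constant `eps` are all assumptions.

```python
import math
import random


def kl_divergence(p, q, eps=1e-4):
    """KL(P || Q) of two histograms; Q is additively smoothed so it is never 0."""
    sp, sq = sum(p), sum(q)
    if sq == 0:
        return math.inf
    n = len(q)
    kl = 0.0
    for pj, qj in zip(p, q):
        if pj == 0:
            continue  # 0 * log(0 / q) contributes nothing
        pn = pj / sp
        qn = (qj / sq + eps) / (1.0 + eps * n)  # smoothing: an assumption
        kl += pn * math.log(pn / qn)
    return kl


def kl_threshold(values, num_bins=1024, num_levels=128):
    """Pick a truncation threshold for |values| by minimizing KL divergence."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0.0
    width = max_abs / num_bins
    hist = [0] * num_bins
    for v in values:
        hist[min(int(abs(v) / width), num_bins - 1)] += 1

    best_kl, best_i = math.inf, num_bins
    for i in range(num_levels, num_bins + 1):
        # Reference P: the first i bins, with all clipped outliers added to the last.
        p = hist[:i]
        p[-1] += sum(hist[i:])
        # Candidate Q: merge the first i bins into num_levels groups, then spread
        # each group's total evenly over the bins of that group that were non-empty.
        q = [0.0] * i
        for level in range(num_levels):
            start, end = level * i // num_levels, (level + 1) * i // num_levels
            nonzero = [j for j in range(start, end) if hist[j] > 0]
            if nonzero:
                share = sum(hist[start:end]) / len(nonzero)
                for j in nonzero:
                    q[j] = share
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_i = kl, i
    # Threshold = upper edge of the last kept bin.
    return best_i * width


rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(10000)] + [100.0]  # one large outlier
t = kl_threshold(acts, num_bins=256, num_levels=16)
print(f"max |x| = {max(abs(a) for a in acts):.1f}, KL threshold = {t:.2f}")
```

With a single large outlier, the chosen threshold lands far below the maximum, so the bulk of the distribution keeps its resolution once quantized. Setting -calib-passes to 0 corresponds, per the table, to skipping this search and using the maximum value as the threshold.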

C export

Test the exported network:

cd export_C_int8
make
./bin/n2d2_test

The result should look like:

...
1652.00/1762    (avg = 93.757094%)
1653.00/1763    (avg = 93.760635%)
1654.00/1764    (avg = 93.764172%)
Tested 1764 stimuli
Success rate = 93.764172%
Process time per stimulus = 187.548186 us (12 threads)

Confusion matrix:
-------------------------------------------------
| T \ E |       0 |       1 |       2 |       3 |
-------------------------------------------------
|     0 |     329 |       1 |       5 |       2 |
|       |  97.63% |   0.30% |   1.48% |   0.59% |
|     1 |       0 |     692 |       2 |       6 |
|       |   0.00% |  98.86% |   0.29% |   0.86% |
|     2 |      11 |      27 |     609 |      55 |
|       |   1.57% |   3.85% |  86.75% |   7.83% |
|     3 |       0 |       0 |       1 |      24 |
|       |   0.00% |   0.00% |   4.00% |  96.00% |
-------------------------------------------------
T: Target    E: Estimated
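The report above can also be checked programmatically. In the confusion matrix, rows are target classes (T) and columns are estimated classes (E); each percentage row is normalized by the number of stimuli of that target class (for example, 329 / 337 = 97.63% for class 0), and the success rate is the diagonal sum divided by the total number of stimuli. The sketch below assumes the report keeps exactly the layout shown; `parse_report` and `summarize` are hypothetical helpers.

```python
import re

REPORT = r"""Tested 1764 stimuli
Success rate = 93.764172%
Process time per stimulus = 187.548186 us (12 threads)

Confusion matrix:
-------------------------------------------------
| T \ E |       0 |       1 |       2 |       3 |
-------------------------------------------------
|     0 |     329 |       1 |       5 |       2 |
|       |  97.63% |   0.30% |   1.48% |   0.59% |
|     1 |       0 |     692 |       2 |       6 |
|       |   0.00% |  98.86% |   0.29% |   0.86% |
|     2 |      11 |      27 |     609 |      55 |
|       |   1.57% |   3.85% |  86.75% |   7.83% |
|     3 |       0 |       0 |       1 |      24 |
|       |   0.00% |   0.00% |   4.00% |  96.00% |
-------------------------------------------------
"""

# A count row starts with "| <target> |"; percentage rows start with "|       |".
COUNT_ROW = re.compile(r"\|\s+(\d+) \|((?:\s+\d+ \|)+)\s*$")


def parse_report(text):
    """Return (success rate %, us per stimulus, confusion-matrix counts)."""
    rate = float(re.search(r"Success rate = ([\d.]+)%", text).group(1))
    time_us = float(
        re.search(r"Process time per stimulus = ([\d.]+) us", text).group(1))
    matrix = []
    for line in text.splitlines():
        m = COUNT_ROW.match(line)
        if m:
            matrix.append([int(c) for c in re.findall(r"\d+", m.group(2))])
    return rate, time_us, matrix


def summarize(matrix):
    """Return (total stimuli, success rate %, row-normalized percentages)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    row_pct = [[100.0 * c / sum(row) for c in row] for row in matrix]
    return total, 100.0 * correct / total, row_pct


rate, time_us, matrix = parse_report(REPORT)
total, recomputed, row_pct = summarize(matrix)
# prints: 1764 stimuli, reported 93.764172%, recomputed 93.764172%
print(f"{total} stimuli, reported {rate}%, recomputed {recomputed:.6f}%")
print("class 2 row:", [f"{p:.2f}%" for p in row_pct[2]])
```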

CPP_OpenCL export

The OpenCL export can run the generated program on GPU or CPU architectures. Compile-time options:

Preprocessor option [default value]
    Description

PROFILING [0]
    Compile the binary with a synchronization point after each layer and report the mean execution time of each layer. This option can degrade performance.

GENERATE_KBIN [0]
    Generate a binary version of the OpenCL kernel .cl file in use. The binary is stored in the /bin folder.

LOAD_KBIN [0]
    Make the program load the OpenCL kernel as a binary from the /bin folder instead of from a .cl file.

CUDA [0]
    Use the CUDA OpenCL SDK located at /usr/local/cuda.

MALI [0]
    Use the Mali OpenCL SDK located at /usr/Mali_OpenCL_SDK_vXXX.

INTEL [0]
    Use the Intel OpenCL SDK located at /opt/intel/opencl.

AMD [1]
    Use the AMD OpenCL SDK located at /opt/AMDAPPSDK-XXX.
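PROFILING adds a synchronization after each layer so that each layer's time can be measured in isolation. This is also a plausible reason it can degrade performance: the host has to wait for every layer to finish before issuing the next one. The sketch below mimics that measurement in Python with hypothetical toy layers. `sync` stands in for a device barrier (in OpenCL host code this would typically be `clFinish` on the command queue). It is an illustration only, not the exported code.

```python
import time
from collections import defaultdict


def profile_layers(layers, x, runs=10, sync=lambda: None):
    """Run the (name, fn) layers in order `runs` times.

    Returns the mean wall-clock seconds spent in each layer, keyed by name.
    """
    totals = defaultdict(float)
    for _ in range(runs):
        y = x
        for name, fn in layers:
            start = time.perf_counter()
            y = fn(y)
            sync()  # wait for the layer to finish before reading the clock
            totals[name] += time.perf_counter() - start
    return {name: total / runs for name, total in totals.items()}


# Hypothetical two-layer "network" operating on a list of numbers.
toy_layers = [
    ("conv1", lambda v: [2.0 * e for e in v]),
    ("fc1", lambda v: [sum(v)]),
]
for name, mean_s in profile_layers(toy_layers, list(range(10000))).items():
    print(f"{name}: {mean_s * 1e6:.1f} us")
```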

Program options related to the OpenCL export:

Option [default value]
    Description

-cpu
    If present, forces the program to run on a CPU architecture.

-gpu
    If present, forces the program to run on a GPU architecture.

-batch [1]
    Batch size.

-stimulus [NULL]
    Path to a specific input stimulus to test. For example, -stimulus /stimulus/env0000.pgm tests the file env0000.pgm in the stimulus folder.
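When scripting runs of the OpenCL test program, the options above can be assembled as follows. This is a sketch: `opencl_test_argv` is a hypothetical helper, the binary path comes from the test commands in this section, and treating -cpu and -gpu as mutually exclusive is an assumption.

```python
import shlex


def opencl_test_argv(device=None, batch=1, stimulus=None,
                     binary="./bin/n2d2_opencl_test"):
    """Build the command line for the exported OpenCL test program."""
    argv = [binary]
    if device is not None:
        # Assumption: -cpu and -gpu are alternatives, so only one is passed.
        if device not in ("cpu", "gpu"):
            raise ValueError("device must be 'cpu', 'gpu' or None")
        argv.append("-" + device)
    if batch < 1:
        raise ValueError("batch size must be at least 1")
    if batch != 1:  # 1 is the documented default
        argv += ["-batch", str(batch)]
    if stimulus is not None:
        argv += ["-stimulus", stimulus]
    return argv


# prints: ./bin/n2d2_opencl_test -gpu -batch 16
print(shlex.join(opencl_test_argv(device="gpu", batch=16)))
```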

Test the exported network:

cd export_CPP_OpenCL_float32
make
./bin/n2d2_opencl_test -gpu

CPP_cuDNN export

The cuDNN export can run the generated program on NVIDIA GPU architectures. It uses the CUDA and cuDNN libraries. Compile-time options:

Preprocessor option [default value]
    Description

PROFILING [0]
    Compile the binary with a synchronization point after each layer and report the mean execution time of each layer. This option can degrade performance.

ARCH32 [0]
    Compile the binary for 32-bit architecture compatibility.

Program options related to the cuDNN export:

Option [default value]
    Description

-batch [1]
    Batch size.

-dev [0]
    ID of the CUDA device to use.

-stimulus [NULL]
    Path to a specific input stimulus to test. For example, -stimulus /stimulus/env0000.pgm tests the file env0000.pgm in the stimulus folder.
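The -batch and -dev options make it straightforward to compare throughput across batch sizes on a given GPU. The sketch below runs the exported cuDNN test program once per batch size and reads the per-stimulus time from its output. It assumes the program prints a "Process time per stimulus = ... us" line like the C export report shown earlier. The helper names are hypothetical, and the sweep returns an empty result when the binary is not present.

```python
import os
import re
import subprocess

# Assumption: same timing line as in the C export report.
TIME_RE = re.compile(r"Process time per stimulus = ([\d.]+) us")


def cudnn_argv(batch=1, dev=0, stimulus=None, binary="./bin/n2d2_cudnn_test"):
    """Build the command line for the exported cuDNN test program."""
    argv = [binary, "-batch", str(batch), "-dev", str(dev)]
    if stimulus is not None:
        argv += ["-stimulus", stimulus]
    return argv


def sweep_batch_sizes(batch_sizes, dev=0, binary="./bin/n2d2_cudnn_test"):
    """Return {batch: us per stimulus}; runs without a timing line are skipped."""
    results = {}
    if not os.path.exists(binary):
        return results
    for batch in batch_sizes:
        # check=True raises CalledProcessError if the program exits non-zero.
        out = subprocess.run(cudnn_argv(batch, dev, binary=binary),
                             capture_output=True, text=True, check=True).stdout
        m = TIME_RE.search(out)
        if m:
            results[batch] = float(m.group(1))
    return results


if __name__ == "__main__":
    for batch, us in sweep_batch_sizes([1, 8, 32]).items():
        print(f"batch {batch}: {us} us/stimulus")
```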

Test the exported network:

cd export_CPP_cuDNN_float32
make
./bin/n2d2_cudnn_test

C_HLS export

Test the exported network:

cd export_C_HLS_int8
make
./bin/n2d2_test

Run the High-Level Synthesis (HLS) with Xilinx Vivado HLS:

vivado_hls -f run_hls.tcl
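For the C_HLS export, -nbbits can be an arbitrary width such as 6 bits. To illustrate what an n-bit integer representation means for a value range, here is a symmetric linear quantization sketch. This section does not describe N2D2's actual scaling and rounding scheme, so the following are all assumptions made for illustration: the `quantize` function, its `threshold` argument (for instance a truncation threshold chosen during calibration), and the unsigned variant (the kind of data type that -no-unsigned disables).

```python
def quantize(values, nbbits, threshold, unsigned=False):
    """Linearly map reals to nbbits-wide integers, clipping at +/-threshold.

    Illustrative scheme, not necessarily N2D2's:
    signed ranges are kept symmetric, [-(2**(n-1) - 1), 2**(n-1) - 1];
    unsigned ranges (for non-negative data) use [0, 2**n - 1].
    Returns the integer codes and the scale (real value of one step).
    """
    if unsigned:
        qmin, qmax = 0, 2 ** nbbits - 1
    else:
        qmax = 2 ** (nbbits - 1) - 1
        qmin = -qmax
    scale = threshold / qmax
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return codes, scale


codes, scale = quantize([0.25, -1.0, 2.0], nbbits=6, threshold=1.0)
print(codes)                       # [8, -31, 31]
print([c * scale for c in codes])  # dequantized approximations
```

With 6 signed bits there are only 31 positive steps, so values beyond the threshold (2.0 here) saturate. This is why the choice of threshold matters more as the bit width shrinks.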

Layer compatibility table

Layer compatibility by export type:

Export types (columns): C, C_HLS, CPP_OpenCL, CPP_TensorRT

Layers (rows): Conv, Pool, Fc, Softmax, FMP, Deconv, ElemWise, Resize, Padding, LRN, Anchor, ObjectDet, ROIPooling, RP

BatchNorm is not listed because batch normalization parameters are automatically fused into the convolution parameters with the “-fuse” option.
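The fusion works because batch normalization applied to a convolution output is itself an affine function. For output channel k, with s_k = gamma_k / sqrt(var_k + eps), BN(w_k · x + b_k) = (s_k w_k) · x + (b_k - mean_k) s_k + beta_k. The sketch below applies this identity to plain Python lists. The function name, the flattened per-channel weight layout and eps = 1e-5 are assumptions, not N2D2 internals.

```python
import math


def fuse_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BatchNorm into convolution weights and bias.

    weights[k] is the flattened list of kernel weights of output channel k
    (an assumed layout); all other arguments are per-channel lists.
    """
    fused_w, fused_b = [], []
    for k in range(len(weights)):
        s = gamma[k] / math.sqrt(var[k] + eps)  # per-channel BN scale
        fused_w.append([w * s for w in weights[k]])
        fused_b.append((bias[k] - mean[k]) * s + beta[k])
    return fused_w, fused_b


w, b = fuse_batchnorm(weights=[[1.0, -2.0], [0.5, 0.5]], bias=[0.1, -0.3],
                      gamma=[2.0, 0.5], beta=[0.0, 1.0],
                      mean=[0.2, -0.1], var=[4.0, 0.25])
print(w, b)
```

After fusion, inference runs a single convolution per layer with no separate normalization step. This is consistent with BatchNorm not needing its own entry in the compatibility table.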