Export: other / legacy
======================
.. role:: raw-html(raw)
   :format: html
.. |check| unicode:: U+02713 .. CHECK MARK
.. |cross| unicode:: U+02717 .. BALLOT X
.. |ccheck| replace:: :raw-html:`` |check| :raw-html:``
.. |ccross| replace:: :raw-html:`` |cross| :raw-html:``

::

    n2d2 "mnist24_16c4s2_24c5s2_150_10.ini" -export CPP_OpenCL

Export types:

- ``C``: C export using OpenMP;
- ``C_HLS``: C export tailored for HLS with Vivado HLS;
- ``CPP_OpenCL``: C++ export using OpenCL;
- ``CPP_Cuda``: C++ export using CUDA;
- ``CPP_cuDNN``: C++ export using cuDNN;
- ``SC_Spike``: SystemC spike export.

Other program options related to the exports:

+------------------------+------------------------------------------------------------------------+
| Option [default value] | Description                                                            |
+========================+========================================================================+
| ``-nbbits`` [8]        | Number of bits for the weights and signals. Must be 8, 16, 32 or 64    |
|                        | for integer export, or -32 or -64 for floating-point export. The       |
|                        | number of bits can be arbitrary for the ``C_HLS`` export (for          |
|                        | example, 6 bits). It must be -32 for the ``CPP_TensorRT`` export,      |
|                        | where the precision is set directly at runtime.                        |
+------------------------+------------------------------------------------------------------------+
| ``-calib`` [0]         | Number of stimuli used for calibration. 0 = no calibration             |
|                        | (default), -1 = use the full test dataset for calibration.             |
+------------------------+------------------------------------------------------------------------+
| ``-calib-passes`` [2]  | Number of KL passes used to determine the truncation threshold of      |
|                        | each layer's output value distribution (0 = use the maximum value,     |
|                        | no truncation).                                                        |
+------------------------+------------------------------------------------------------------------+
| ``-no-unsigned``       | If present, disable the use of unsigned data types in integer exports. |
+------------------------+------------------------------------------------------------------------+
| ``-db-export`` [-1]    | Maximum number of stimuli to export (0 = no dataset export, -1 =       |
|                        | unlimited).                                                            |
+------------------------+------------------------------------------------------------------------+

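
The ``-nbbits`` constraints can be summarized in a few lines of code. The Python sketch below only illustrates the rules stated in the options table; it is not N2D2's own validation logic. The function name ``is_valid_nbbits`` is hypothetical, and the sketch assumes that the floating-point values (-32, -64) are accepted by every export type except ``CPP_TensorRT``, which the table restricts to -32.

.. code-block:: python

    def is_valid_nbbits(export_type: str, nbbits: int) -> bool:
        """Check nbbits against the constraints listed in the options table."""
        if export_type == "CPP_TensorRT":
            # The table requires -32 here; the precision is chosen at runtime.
            return nbbits == -32
        if nbbits in (-32, -64):
            # Negative values select a floating-point export (assumed to be
            # accepted by every other export type).
            return True
        if export_type == "C_HLS":
            # Any positive integer width is allowed, for example 6 bits.
            return nbbits > 0
        # The remaining integer exports only accept the standard widths.
        return nbbits in (8, 16, 32, 64)

Under these assumptions, ``is_valid_nbbits("C_HLS", 6)`` is true, whereas ``is_valid_nbbits("C", 6)`` and ``is_valid_nbbits("CPP_TensorRT", 8)`` are both false.
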
C export
~~~~~~~~
Test the exported network:

::

    cd export_C_int8
    make
    ./bin/n2d2_test

The result should look like:

::

    ...
    1652.00/1762 (avg = 93.757094%)
    1653.00/1763 (avg = 93.760635%)
    1654.00/1764 (avg = 93.764172%)
    Tested 1764 stimuli
    Success rate = 93.764172%
    Process time per stimulus = 187.548186 us (12 threads)
    Confusion matrix:
    -------------------------------------------------
    | T \ E |       0 |       1 |       2 |       3 |
    -------------------------------------------------
    |     0 |     329 |       1 |       5 |       2 |
    |       |  97.63% |   0.30% |   1.48% |   0.59% |
    |     1 |       0 |     692 |       2 |       6 |
    |       |   0.00% |  98.86% |   0.29% |   0.86% |
    |     2 |      11 |      27 |     609 |      55 |
    |       |   1.57% |   3.85% |  86.75% |   7.83% |
    |     3 |       0 |       0 |       1 |      24 |
    |       |   0.00% |   0.00% |   4.00% |  96.00% |
    -------------------------------------------------
    T: Target    E: Estimated

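
The summary figures in this log follow directly from the confusion matrix. As a check, the short Python sketch below recomputes the success rate and the per-class percentages on the diagonal from the counts shown in the log. It is not part of the export, and the variable names are illustrative.

.. code-block:: python

    # Rows are targets (T), columns are estimates (E), copied from the log.
    confusion = [
        [329, 1, 5, 2],
        [0, 692, 2, 6],
        [11, 27, 609, 55],
        [0, 0, 1, 24],
    ]

    # Every tested stimulus appears exactly once in the matrix.
    total = sum(sum(row) for row in confusion)

    # Correct classifications lie on the diagonal (target == estimate).
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    success_rate = 100.0 * correct / total

    # Per-class rate: diagonal count divided by the row total.
    recalls = [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]

    print(f"Tested {total} stimuli")
    print(f"Success rate = {success_rate:.6f}%")
    print(" ".join(f"{r:.2f}%" for r in recalls))

The diagonal sums to 1654 of the 1764 tested stimuli, which gives the reported success rate of 93.764172%. The per-class values (97.63%, 98.86%, 86.75% and 96.00%) match the percentages printed under the diagonal entries.
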
CPP\_OpenCL export
~~~~~~~~~~~~~~~~~~
The OpenCL export can run the generated program on GPU or CPU
architectures. Compilation features:

+--------------------------------------+------------------------------------------------------------------------+
| Preprocessor command [default value] | Description                                                            |
+======================================+========================================================================+
| ``PROFILING`` [0]                    | Compile the binary with synchronization between layers and report      |
|                                      | the mean execution time of each layer. This option can reduce          |
|                                      | performance.                                                           |
+--------------------------------------+------------------------------------------------------------------------+
| ``GENERATE_KBIN`` [0]                | Generate the binary of the OpenCL kernel ``.cl`` file in use. The      |
|                                      | binary is stored in the ``/bin`` folder.                               |
+--------------------------------------+------------------------------------------------------------------------+
| ``LOAD_KBIN`` [0]                    | Load the OpenCL kernel as a binary from the ``/bin`` folder instead    |
|                                      | of from a ``.cl`` file.                                                |
+--------------------------------------+------------------------------------------------------------------------+
| ``CUDA`` [0]                         | Use the CUDA OpenCL SDK located at ``/usr/local/cuda``.                |
+--------------------------------------+------------------------------------------------------------------------+
| ``MALI`` [0]                         | Use the MALI OpenCL SDK located at ``/usr/Mali_OpenCL_SDK_vXXX``.      |
+--------------------------------------+------------------------------------------------------------------------+
| ``INTEL`` [0]                        | Use the INTEL OpenCL SDK located at ``/opt/intel/opencl``.             |
+--------------------------------------+------------------------------------------------------------------------+
| ``AMD`` [1]                          | Use the AMD OpenCL SDK located at ``/opt/AMDAPPSDK-XXX``.              |
+--------------------------------------+------------------------------------------------------------------------+

Program options related to the OpenCL export:

+------------------------+------------------------------------------------------------------------+
| Option [default value] | Description                                                            |
+========================+========================================================================+
| ``-cpu``               | If present, force the program to run on a CPU architecture.            |
+------------------------+------------------------------------------------------------------------+
| ``-gpu``               | If present, force the program to run on a GPU architecture.            |
+------------------------+------------------------------------------------------------------------+
| ``-batch`` [1]         | Size of the batch to use.                                              |
+------------------------+------------------------------------------------------------------------+
| ``-stimulus`` [NULL]   | Path to a specific input stimulus to test. For example,                |
|                        | ``-stimulus /stimulus/env0000.pgm`` tests the file                     |
|                        | ``env0000.pgm`` in the ``stimulus`` folder.                            |
+------------------------+------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_OpenCL_float32
    make
    ./bin/n2d2_opencl_test -gpu

CPP\_cuDNN export
~~~~~~~~~~~~~~~~~
The cuDNN export runs the generated program on NVIDIA GPU
architectures. It uses the CUDA and cuDNN libraries. Compilation features:

+--------------------------------------+------------------------------------------------------------------------+
| Preprocessor command [default value] | Description                                                            |
+======================================+========================================================================+
| ``PROFILING`` [0]                    | Compile the binary with synchronization between layers and report      |
|                                      | the mean execution time of each layer. This option can reduce          |
|                                      | performance.                                                           |
+--------------------------------------+------------------------------------------------------------------------+
| ``ARCH32`` [0]                       | Compile the binary with 32-bit architecture compatibility.             |
+--------------------------------------+------------------------------------------------------------------------+

Program options related to the cuDNN export:

+------------------------+------------------------------------------------------------------------+
| Option [default value] | Description                                                            |
+========================+========================================================================+
| ``-batch`` [1]         | Size of the batch to use.                                              |
+------------------------+------------------------------------------------------------------------+
| ``-dev`` [0]           | CUDA device ID selection.                                              |
+------------------------+------------------------------------------------------------------------+
| ``-stimulus`` [NULL]   | Path to a specific input stimulus to test. For example,                |
|                        | ``-stimulus /stimulus/env0000.pgm`` tests the file                     |
|                        | ``env0000.pgm`` in the ``stimulus`` folder.                            |
+------------------------+------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_cuDNN_float32
    make
    ./bin/n2d2_cudnn_test

C\_HLS export
~~~~~~~~~~~~~
Test the exported network:

::

    cd export_C_HLS_int8
    make
    ./bin/n2d2_test

Run the High-Level Synthesis (HLS) with Xilinx Vivado HLS:

::

    vivado_hls -f run_hls.tcl

Layer compatibility table
~~~~~~~~~~~~~~~~~~~~~~~~~
Layer compatibility as a function of the export type:

+---------------+------------------------------------------------------+
| Layer         | Export type                                          |
| compatibility +----------+-------------+-------------+---------------+
| table         | C        | C\_HLS      | CPP\_OpenCL | CPP\_TensorRT |
+===============+==========+=============+=============+===============+
|Conv           | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Pool           | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Fc             | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Softmax        | |ccheck| | |ccross|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|FMP            | |ccheck| | |ccross|    | |ccheck|    | |ccross|      |
+---------------+----------+-------------+-------------+---------------+
|Deconv         | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ElemWise       | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Resize         | |ccheck| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Padding        | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|LRN            | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Anchor         | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ObjectDet      | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ROIPooling     | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|RP             | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+

BatchNorm is not mentioned because batch normalization parameters are
automatically fused with the convolution parameters by the ``-fuse``
option.
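
The compatibility table can also be used programmatically, for instance to check a network before choosing an export type. The Python sketch below transcribes the table into a dictionary and lists the layers of a network that a given export type does not support. The names ``SUPPORTED`` and ``unsupported_layers`` are hypothetical and are not part of N2D2.

.. code-block:: python

    # Supported layers per export type, transcribed from the table above.
    SUPPORTED = {
        "C": {"Conv", "Pool", "Fc", "Softmax", "FMP", "Resize"},
        "C_HLS": {"Conv", "Pool", "Fc"},
        "CPP_OpenCL": {"Conv", "Pool", "Fc", "Softmax", "FMP"},
        "CPP_TensorRT": {
            "Conv", "Pool", "Fc", "Softmax", "Deconv", "ElemWise", "Resize",
            "Padding", "LRN", "Anchor", "ObjectDet", "ROIPooling", "RP",
        },
    }


    def unsupported_layers(layers, export_type):
        """Return the layers, in order, that export_type cannot handle."""
        supported = SUPPORTED[export_type]
        return [layer for layer in layers if layer not in supported]

For example, ``unsupported_layers(["Conv", "Pool", "Fc", "Softmax"], "C_HLS")`` returns ``["Softmax"]``, while the same network passed with ``"C"`` returns an empty list.
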