Export: other / legacy
======================

.. role:: raw-html(raw)
   :format: html

.. |check|  unicode:: U+02713 .. CHECK MARK
.. |cross|  unicode:: U+02717 .. BALLOT X

.. |ccheck| replace:: :raw-html:`<center>` |check| :raw-html:`</center>`
.. |ccross| replace:: :raw-html:`<center>` |cross| :raw-html:`</center>`

::

    n2d2 "mnist24_16c4s2_24c5s2_150_10.ini" -export CPP_OpenCL

Export types:

-  ``C`` C export using OpenMP;

-  ``C_HLS`` C export tailored for HLS with Vivado HLS;

-  ``CPP_OpenCL`` C++ export using OpenCL;

-  ``CPP_Cuda`` C++ export using CUDA;

-  ``CPP_cuDNN`` C++ export using cuDNN;

-  ``SC_Spike`` SystemC spike export.

Other program options related to the exports:

+--------------------------+------------------------------------------------------------------------+
| Option [default value]   | Description                                                            |
+==========================+========================================================================+
| ``-nbbits`` [8]          | Number of bits for the weights and signals. Must be 8, 16, 32 or 64    |
|                          | for integer export, or -32 or -64 for floating-point export. The       |
|                          | number of bits can be arbitrary for the ``C_HLS`` export (for example, |
|                          | 6 bits). It must be -32 for the ``CPP_TensorRT`` export, whose         |
|                          | precision is set directly at runtime.                                  |
+--------------------------+------------------------------------------------------------------------+
| ``-calib`` [0]           | Number of stimuli used for calibration (0 = no calibration, -1 = use   |
|                          | the full test dataset)                                                 |
+--------------------------+------------------------------------------------------------------------+
| ``-calib-passes`` [2]    | Number of KL passes for determining the truncation threshold of the    |
|                          | layer output value distribution (0 = use the maximum value, no         |
|                          | truncation)                                                            |
+--------------------------+------------------------------------------------------------------------+
| ``-no-unsigned``         | If present, disable the use of unsigned data types in integer exports  |
+--------------------------+------------------------------------------------------------------------+
| ``-db-export`` [-1]      | Maximum number of stimuli to export (0 = no dataset export, -1 =       |
|                          | unlimited)                                                             |
+--------------------------+------------------------------------------------------------------------+

C export
~~~~~~~~

Test the exported network:

::

    cd export_C_int8
    make
    ./bin/n2d2_test

The result should look like:

::

    ...
    1652.00/1762    (avg = 93.757094%)
    1653.00/1763    (avg = 93.760635%)
    1654.00/1764    (avg = 93.764172%)
    Tested 1764 stimuli
    Success rate = 93.764172%
    Process time per stimulus = 187.548186 us (12 threads)
    Confusion matrix:
    -------------------------------------------------
    | T \ E |       0 |       1 |       2 |       3 |
    -------------------------------------------------
    |     0 |     329 |       1 |       5 |       2 |
    |       |  97.63% |   0.30% |   1.48% |   0.59% |
    |     1 |       0 |     692 |       2 |       6 |
    |       |   0.00% |  98.86% |   0.29% |   0.86% |
    |     2 |      11 |      27 |     609 |      55 |
    |       |   1.57% |   3.85% |  86.75% |   7.83% |
    |     3 |       0 |       0 |       1 |      24 |
    |       |   0.00% |   0.00% |   4.00% |  96.00% |
    -------------------------------------------------
    T: Target    E: Estimated

CPP\_OpenCL export
~~~~~~~~~~~~~~~~~~

The OpenCL export can run the generated program on GPU or CPU
architectures. Compilation features:

+----------------------------------------+--------------------------------------------------------------+
| Preprocessor command [default value]   | Description                                                  |
+========================================+==============================================================+
| ``PROFILING`` [0]                      | Compile the binary with a synchronization between layers and |
|                                        | report the mean execution time of each layer. This           |
|                                        | preprocessor option can reduce performance.                  |
+----------------------------------------+--------------------------------------------------------------+
| ``GENERATE_KBIN`` [0]                  | Generate a binary from the OpenCL kernel ``.cl`` file in     |
|                                        | use. The binary is stored in the ``/bin`` folder.            |
+----------------------------------------+--------------------------------------------------------------+
| ``LOAD_KBIN`` [0]                      | Load the OpenCL kernel as a binary from the ``/bin`` folder  |
|                                        | instead of from a ``.cl`` file.                              |
+----------------------------------------+--------------------------------------------------------------+
| ``CUDA`` [0]                           | Use the CUDA OpenCL SDK located at ``/usr/local/cuda``       |
+----------------------------------------+--------------------------------------------------------------+
| ``MALI`` [0]                           | Use the MALI OpenCL SDK located at                           |
|                                        | ``/usr/Mali_OpenCL_SDK_vXXX``                                |
+----------------------------------------+--------------------------------------------------------------+
| ``INTEL`` [0]                          | Use the INTEL OpenCL SDK located at ``/opt/intel/opencl``    |
+----------------------------------------+--------------------------------------------------------------+
| ``AMD`` [1]                            | Use the AMD OpenCL SDK located at ``/opt/AMDAPPSDK-XXX``     |
+----------------------------------------+--------------------------------------------------------------+

Program options related to the OpenCL export:

+--------------------------+------------------------------------------------------------------------+
| Option [default value]   | Description                                                            |
+==========================+========================================================================+
| ``-cpu``                 | If present, force the program to run on a CPU architecture             |
+--------------------------+------------------------------------------------------------------------+
| ``-gpu``                 | If present, force the program to run on a GPU architecture             |
+--------------------------+------------------------------------------------------------------------+
| ``-batch`` [1]           | Size of the batch to use                                               |
+--------------------------+------------------------------------------------------------------------+
| ``-stimulus`` [NULL]     | Path to a specific input stimulus to test. For example,                |
|                          | ``-stimulus /stimulus/env0000.pgm`` tests the file ``env0000.pgm`` in  |
|                          | the ``stimulus`` folder.                                               |
+--------------------------+------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_OpenCL_float32
    make
    ./bin/n2d2_opencl_test -gpu

CPP\_cuDNN export
~~~~~~~~~~~~~~~~~

The cuDNN export can run the generated program on NVIDIA GPU
architectures. It uses the CUDA and cuDNN libraries. Compilation features:

+----------------------------------------+--------------------------------------------------------------+
| Preprocessor command [default value]   | Description                                                  |
+========================================+==============================================================+
| ``PROFILING`` [0]                      | Compile the binary with a synchronization between layers and |
|                                        | report the mean execution time of each layer. This           |
|                                        | preprocessor option can reduce performance.                  |
+----------------------------------------+--------------------------------------------------------------+
| ``ARCH32`` [0]                         | Compile the binary with 32-bit architecture compatibility.   |
+----------------------------------------+--------------------------------------------------------------+

Program options related to the cuDNN export:

+--------------------------+------------------------------------------------------------------------+
| Option [default value]   | Description                                                            |
+==========================+========================================================================+
| ``-batch`` [1]           | Size of the batch to use                                               |
+--------------------------+------------------------------------------------------------------------+
| ``-dev`` [0]             | CUDA device ID selection                                               |
+--------------------------+------------------------------------------------------------------------+
| ``-stimulus`` [NULL]     | Path to a specific input stimulus to test. For example,                |
|                          | ``-stimulus /stimulus/env0000.pgm`` tests the file ``env0000.pgm`` in  |
|                          | the ``stimulus`` folder.                                               |
+--------------------------+------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_cuDNN_float32
    make
    ./bin/n2d2_cudnn_test

C\_HLS export
~~~~~~~~~~~~~

Test the exported network:

::

    cd export_C_HLS_int8
    make
    ./bin/n2d2_test

Run the High-Level Synthesis (HLS) with Xilinx Vivado HLS:

::

    vivado_hls -f run_hls.tcl

Layer compatibility table
~~~~~~~~~~~~~~~~~~~~~~~~~

Layer compatibility by export type:

+---------------+------------------------------------------------------+
| Layer         | Export type                                          |
| compatibility +----------+-------------+-------------+---------------+
| table         | C        | C\_HLS      | CPP\_OpenCL | CPP\_TensorRT |
+===============+==========+=============+=============+===============+
| Conv          | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Pool          | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Fc            | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Softmax       | |ccheck| | |ccross|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| FMP           | |ccheck| | |ccross|    | |ccheck|    | |ccross|      |
+---------------+----------+-------------+-------------+---------------+
| Deconv        | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| ElemWise      | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Resize        | |ccheck| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Padding       | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| LRN           | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| Anchor        | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| ObjectDet     | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| ROIPooling    | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
| RP            | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+

BatchNorm is not listed because batch normalization parameters are
automatically fused with the convolution parameters by the ``-fuse`` option.
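
To illustrate what fusing batch normalization into a convolution means, here is a minimal Python sketch. It assumes the standard inference-time formula ``y = gamma * (x - mean) / sqrt(var + eps) + beta`` applied per output channel. The function name ``fuse_batchnorm``, the ``eps`` value and the numeric values are hypothetical choices for illustration; this is not N2D2's implementation.

```python
import math


def fuse_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and biases.

    weights: one flattened kernel (list of floats) per output channel.
    biases, gamma, beta, mean, var: one float per output channel.
    eps is an assumed value, not taken from N2D2.
    """
    fused_weights, fused_biases = [], []
    for w, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel scaling factor
        fused_weights.append([wi * scale for wi in w])
        fused_biases.append((b - m) * scale + bt)
    return fused_weights, fused_biases


def dot(w, x):
    """Weighted sum of one flattened kernel over one input patch."""
    return sum(wi * xi for wi, xi in zip(w, x))


# One output channel with a flattened 3-tap kernel (made-up values).
w, b = [[0.5, -1.0, 2.0]], [0.1]
gamma, beta, mean, var = [1.5], [-0.2], [0.3], [4.0]
x = [1.0, 2.0, 3.0]

# Reference: convolution followed by batch normalization.
conv = dot(w[0], x) + b[0]
reference = gamma[0] * (conv - mean[0]) / math.sqrt(var[0] + 1e-5) + beta[0]

# Fused: a single convolution using the folded parameters.
fw, fb = fuse_batchnorm(w, b, gamma, beta, mean, var)
fused = dot(fw[0], x) + fb[0]

print(abs(reference - fused) < 1e-9)
```

The last line prints whether the fused convolution reproduces the output of the convolution followed by batch normalization. Because the two computations agree, an exported network needs no separate BatchNorm layer once the parameters are folded.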