I thought about making a GPU-accelerated version, but never did -- maybe over Christmas vacation.

jcyr wrote: ↑Tue Oct 08, 2019 4:55 pm
The algorithms I've developed in CUDA are cryptographic, using integer math. Does the test you suggest offload computation to the CUDA cores? I'm not that reliant on the performance of the ARM cores.

    ejolson wrote: ↑Tue Oct 08, 2019 4:00 pm
    It's worth noting that the GM20B Maxwell GPU in the Nano is primarily designed for machine-learning workloads. In particular, the peak floating-point performance is

    Code: Select all

        GPU           FP16   FP32   FP64
        Jetson Nano    472    236    7.4   GFLOPS

    which makes the Nano's GPU slower at double precision than the Cortex-A72 CPUs on the Pi 4B. Of course the Nano also has some ARM CPUs. I wonder how close they are in speed? If you are able, I would be very interested to compare the relative performance of the ARM CPUs on the Nvidia Jetson Nano to the Raspberry Pi 4B by running this Pi pie chart program on the Nano after it arrives.
Currently the Pi pie chart programs are OpenMP only and do not offload anything to the GPU. Thus, the resulting pie chart would compare only the quad-core Cortex-A57 on the Nano to the quad-core Cortex-A72 on the 4B. Although the main point of the Nano is its CUDA-enabled GPU, comparing the ARM cores is still somewhat interesting.
I don't have much experience using CUDA for integer arithmetic. How much faster are your CUDA-accelerated encryption routines compared to equivalent CPU versions?