Seems the Pi really is a bit faster than the Jetson at this.
I have cleaned up those warning messages and split the different versions of convolution into separate files/modules here:
Even though the Nano costs a lot more, it is not so surprising that the Cortex-A72 cores in the Pi running at 1.5 GHz are generally faster than the Cortex-A57 cores in the Jetson Nano running at 1.4 GHz. Of course, the Nano has a GPU that runs CUDA. Unfortunately, its 2014 Maxwell design was replaced by Pascal in 2016, by Volta in 2017 and now by Ampere in 2020.
While the Nano is the only inexpensive single-board computer with a heterogeneous system architecture, maybe one day, just like the coronavirus, GPU computing will suddenly end.
Is it possible to program an NVIDIA GPU using a Rust-like language?
Seems we can load and run CUDA stuff from Rust: https://bheisler.github.io/RustaCUDA/ru ... index.html
Memory in C++ is a leaky abstraction.
This seems an appropriate time to ask about getting OpenCL, or something similar, for the Pi's GPU... I have some doubts it would be able to compete with NVIDIA's GPU, but it would be nice to have some feel for how close they are...