Seems the Pi really is a bit faster than the Jetson at this.
I have cleaned up those warning messages and split the different versions of convolution into separate files/modules here:
Even though the Nano costs a lot more, it is not so surprising that the Cortex-A72 cores in the Pi running at 1.5 GHz are generally faster than the Cortex-A57 cores in the Jetson Nano running at 1.4 GHz. Of course the Nano has a GPU that runs CUDA. Unfortunately, its 2014 Maxwell design has since been superseded by Pascal in 2016, by Volta in 2017 and now by Ampere in 2020.
Seems we can load and run CUDA stuff from Rust: https://bheisler.github.io/RustaCUDA/ru ... index.html