otacky_taka
Posts: 2
Joined: Thu Apr 13, 2017 8:08 pm

Linalg of Python3 Numpy is so Slow on RasPi 4

Tue Oct 15, 2019 8:31 am

I've been developing an audio signal processing system on the RasPi. The system is built on pyaudio, numpy, scipy, librosa, etc., so the stability and speed of numpy are critical, and until now they have almost always been fine. This month we needed a USB-OTG port and decided to move to the RasPi 4.

1. Started with 2019-09-26-raspbian-buster.zip.
2. Working in a pyvenv, built with the native Python 3.7.3.
3. Building the numpy module was fairly straightforward, except for the gfortran issue.
4. Found that the linalg submodule is very slow, about 20 to 50 times slower than on the RasPi 3B+.
5. I suspected inappropriate libraries (OpenBLAS vs. ATLAS, for example) and changed some of them, but this made no difference.
6. Even tried installing numpy with 'pip install --no-binary :all: numpy', but the result was the same; the resulting np.linalg was just as slow.
7. Inserted into the RasPi 4 one of the microSD cards that work properly on the RasPi 3B+ and earlier boards, but numpy remained just as slow.

Here is some code to clarify what I have done so far:

* The Numpy on the old card works fine on RasPi 3B+

Code: Select all

(pve37) fukuda@raspi23:~/pve37% ipython
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np                                                            

In [2]: A = np.random.rand(256, 256)                                                  

In [3]: %timeit B = np.linalg.inv(A)                                                  
28.9 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit C = np.fft.fft2(A)                                                    
21.8 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: np.show_config()                                                              
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
    language = c
    define_macros = [('HAVE_CBLAS', None), ('NO_ATLAS_INFO', -1)]
    libraries = ['f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
    library_dirs = ['/usr/lib/arm-linux-gnueabihf']
blas_opt_info:
    language = c
    define_macros = [('HAVE_CBLAS', None), ('NO_ATLAS_INFO', -1)]
    libraries = ['f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
    library_dirs = ['/usr/lib/arm-linux-gnueabihf']
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
openblas_clapack_info:
  NOT AVAILABLE
flame_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_info:
    language = f77
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
    library_dirs = ['/usr/lib/arm-linux-gnueabihf']
    define_macros = [('NO_ATLAS_INFO', -1)]
lapack_opt_info:
    language = f77
    libraries = ['lapack', 'f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
    library_dirs = ['/usr/lib/arm-linux-gnueabihf']
    define_macros = [('NO_ATLAS_INFO', -1)]

* But the same module is very slow on the RasPi 4B:

Code: Select all

In [4]: A = np.random.rand(256, 256)                                                  

In [5]: %timeit B = np.linalg.inv(A)                                                  
1.3 s ± 225 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %timeit C = np.fft.fft2(A)                                                    
13 ms ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The results are almost the same for numpy built on the RasPi 4B itself, as described in items 4, 5, and 6 above.

It looks as though the RasPi 3 and RasPi 4 handle the same binary in significantly different ways, and I have no idea where to look next. Any advice or suggestions would be appreciated.
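For anyone who wants to reproduce these timings outside IPython, here is a minimal sketch of a plain-Python stand-in for %timeit. The helper name time_ms is hypothetical, and the defaults (7 runs of 10 loops) are only chosen to match the reports above; it is not how IPython computes its figures internally.

```python
import statistics
import timeit


def time_ms(func, repeat=7, number=10):
    """Time func() and return (mean, std dev) per call, in milliseconds.

    Runs `repeat` batches of `number` calls each, similar in shape to the
    "7 runs, 10 loops each" lines printed by %timeit.
    """
    timer = timeit.Timer(func)
    totals = timer.repeat(repeat=repeat, number=number)  # seconds per batch
    per_call_ms = [1000.0 * total / number for total in totals]
    return statistics.mean(per_call_ms), statistics.pstdev(per_call_ms)


# Usage with numpy (run inside the same virtualenv as the sessions above):
#   import numpy as np
#   A = np.random.rand(256, 256)
#   mean, std = time_ms(lambda: np.linalg.inv(A))
#   print(f"linalg.inv: {mean:.1f} ms ± {std:.1f} ms")
```

Running the same script on both boards, with the same card, should make the comparison a little easier to repeat.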

otacky_taka
Posts: 2
Joined: Thu Apr 13, 2017 8:08 pm

Re: Linalg of Python3 Numpy is so Slow on RasPi 4

Fri Oct 18, 2019 8:25 pm

I think I got a temporary solution ...

In short, the issue was:

Code: Select all

(pve37) fukuda@raspi25:~% ipython                             
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np                                                            

In [2]: A = np.random.rand(256, 256)                                                  

In [3]: %timeit B = np.linalg.inv(A)                                                  
1.04 s ± 130 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Despite all my efforts, the issue remained and sometimes got worse. However, after giving up on OpenBLAS:

Code: Select all

(pve37) fukuda@raspi25:~% sudo apt remove libopenblas-dev libopenblas-base
the issue improved drastically:

Code: Select all

In [1]: import numpy as np                                                            

In [2]: A = np.random.rand(256, 256)                                                  

In [3]: %timeit B = np.linalg.inv(A)                                                  
32 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The result is fast enough for current purposes, though OpenBLAS would halve the time if it worked properly.
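Since removing the OpenBLAS packages changed the speed of the very same numpy build, the BLAS/LAPACK library actually loaded at runtime may differ from what np.show_config() reports. A rough way to check this on Linux is to look at the shared objects mapped into the Python process. This is only a sketch: the function names are hypothetical, and the filename pattern is my own guess at what counts as a BLAS-like library.

```python
import os
import re

# Assumed filename pattern for BLAS/LAPACK implementations (a guess, not exhaustive).
_BLAS_LIKE = re.compile(r"blas|atlas|lapack", re.IGNORECASE)


def blas_like_libraries(maps_text):
    """Return sorted unique paths of mapped files whose name looks BLAS-like.

    `maps_text` is the content of a Linux /proc/<pid>/maps file.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.startswith("/") and _BLAS_LIKE.search(os.path.basename(path)):
            paths.add(path)
    return sorted(paths)


def loaded_blas_like_libraries(maps_path="/proc/self/maps"):
    """Read the current process's memory map and list BLAS-like libraries."""
    with open(maps_path) as fh:
        return blas_like_libraries(fh.read())


# Usage (call after numpy has done some linear algebra, so the libraries are loaded):
#   import numpy as np
#   np.linalg.inv(np.eye(2))
#   print("\n".join(loaded_blas_like_libraries()))
```

Comparing the printed paths before and after the apt remove step should show whether an OpenBLAS library was being picked up.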

mallets
Posts: 4
Joined: Thu Sep 19, 2019 4:00 am

Re: Linalg of Python3 Numpy is so Slow on RasPi 4

Mon Dec 16, 2019 12:38 am

On a Pi 4 running unofficial 64-bit Ubuntu 18.04 (Python 3.8, openblas-base installed, numpy installed from source with pip):

Code: Select all

linalg.inv: 15.465410000160773 ms
fft.fft:    8.76734899998155 ms
np.show_config() output:

Code: Select all

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/aarch64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/aarch64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/aarch64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/lib/aarch64-linux-gnu']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
