The latest additions are MultiThreading Benchmarks, essentially the same as my Android programs described, along with results, in:
http://www.roylongbottom.org.uk/android ... hmarks.htm
All the benchmarks run using 1, 2, 4 and 8 threads. Those that use caches and RAM have data sizes of around 12.8 KB, 128 KB and 12.8 MB. Further details and results can be found in:
http://www.roylongbottom.org.uk/Raspber ... hmarks.htm
These will be of more use if a Raspberry Pi appears with more than one CPU core. The benchmarks and source code are available in:
http://www.roylongbottom.org.uk/Raspber ... hmarks.zip
The latter also includes identical code compiled for Intel-compatible processors running under Linux.
- measures floating point speed using data from caches and RAM. The first calculations are those used in MemSpeed; others carry out more calculations on each data word. Each thread carries out the same calculations but accesses a different segment of the data. On cache-based calculations, the result is often performance proportional to the number of cores used.
- Multiple threads each run the eight test functions at the same time, but with some dedicated variables. Measured speed is based on the last thread to finish. Mutex functions are used to avoid update conflicts by allowing only one thread at a time to access common data. Again, performance is generally proportional to the number of cores used. Some results can differ significantly from those of the single CPU Whetstone benchmark on particular tests, because a different compiler was used.
- This runs multiple copies of the whole program at the same time. Dedicated data arrays are used for each thread, but numerous other variables are shared. The sharing reduces the performance gains from multiple threads and, in some cases, multiple threads can be slower than a single thread.
- This runs integer read only tests using caches and RAM, each thread accessing the same data sequentially. To start with, data is read with large address increments to demonstrate burst data transfers. Performance gains, using L1 cache, can be proportional to the number of cores, but not quite so using L2. The program is designed to produce maximum throughput over buses and demonstrates the fastest RAM speeds using multiple cores.
- The benchmark has cache and RAM read only and read/write tests using sequential and random access, each thread accessing the same data but starting at different points. It uses the Mutex functions as in Whetstone above, sometimes leading to no performance gains using multiple threads. Random access is also demonstrated as being relatively slow where burst data transfers are involved.
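
For anyone unfamiliar with the two ideas that recur in the descriptions above (each thread working on its own segment of the data, and a mutex allowing only one thread at a time to update common data), here is a minimal sketch using POSIX threads. It is not taken from the benchmark source code: the names segment_job, sum_segment and threaded_sum, the limit of 8 threads and the stand-in arithmetic are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8   /* assumed limit, matching the 1, 2, 4 and 8 thread runs */

/* Hypothetical job description; not a structure from the benchmarks. */
typedef struct {
    const float *data;       /* start of this thread's segment */
    size_t count;            /* number of data words in the segment */
    double *total;           /* common result shared by all threads */
    pthread_mutex_t *lock;   /* protects updates to *total */
} segment_job;

static void *sum_segment(void *arg)
{
    segment_job *job = arg;
    double local = 0.0;

    /* Stand-in calculation on the thread's own segment only. */
    for (size_t i = 0; i < job->count; i++) {
        local += job->data[i] * 2.0f;
    }

    /* Only one thread at a time may update the common total. */
    pthread_mutex_lock(job->lock);
    *job->total += local;
    pthread_mutex_unlock(job->lock);
    return NULL;
}

/* Splits data into one segment per thread and returns the combined total. */
double threaded_sum(const float *data, size_t words, int threads)
{
    pthread_t tid[MAX_THREADS];
    int started[MAX_THREADS];
    segment_job jobs[MAX_THREADS];
    pthread_mutex_t lock;
    double total = 0.0;

    if (threads < 1) threads = 1;
    if (threads > MAX_THREADS) threads = MAX_THREADS;
    pthread_mutex_init(&lock, NULL);

    size_t chunk = words / (size_t)threads;
    for (int t = 0; t < threads; t++) {
        size_t offset = (size_t)t * chunk;
        jobs[t].data = data + offset;
        /* The last thread also takes any remainder words. */
        jobs[t].count = (t == threads - 1) ? words - offset : chunk;
        jobs[t].total = &total;
        jobs[t].lock = &lock;
        started[t] = (pthread_create(&tid[t], NULL, sum_segment, &jobs[t]) == 0);
        if (!started[t]) {
            sum_segment(&jobs[t]);   /* fall back to the calling thread */
        }
    }

    /* Waiting for every thread means elapsed time would be set by the last to finish. */
    for (int t = 0; t < threads; t++) {
        if (started[t]) pthread_join(tid[t], NULL);
    }
    pthread_mutex_destroy(&lock);
    return total;
}
```

Built with something like gcc -O2 -pthread, a call such as threaded_sum(data, words, 4) should return the same total as threaded_sum(data, words, 1) for data where the arithmetic is exact. Here the mutex is taken once per thread, so it costs very little; taking it inside the inner loop instead would serialise the threads, which is the kind of effect that can lead to no gain from multiple threads.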