using the same code block for all 4 cores & it worked & is a massive speed-up:
If you change the resolution code to:
; Setup Frame Buffer
SCREEN_X = 1920
SCREEN_Y = 1080
All cores render 4 pixels at a time in linear frame buffer memory; I found this was a great way to maximize calculation throughput.
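To illustrate the idea of every core running the same code block and writing 4 pixels per step into a linear frame buffer, here is a rough sketch in C rather than the assembly used in the demo. It assumes a 32-bit-per-pixel buffer and that each core owns one contiguous quarter of it. The post does not say how the work is actually divided, so that split is only a guess. The names `render_slice` and `shade` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_X 1920
#define SCREEN_Y 1080
#define NUM_CORES 4        /* quad-core, as in the post */
#define PIXELS_PER_STEP 4  /* 4 pixels handled per loop iteration */

/* Hypothetical per-pixel calculation; stands in for the real demo maths. */
static uint32_t shade(uint32_t index) {
    return 0xFF000000u | (index & 0x00FFFFFFu);
}

/* Same routine for every core; only core_id differs.
   Assumption: each core owns one contiguous quarter of the buffer. */
static void render_slice(uint32_t *fb, unsigned core_id) {
    assert(core_id < NUM_CORES);
    const uint32_t total = (uint32_t)SCREEN_X * SCREEN_Y;
    const uint32_t slice = total / NUM_CORES;  /* pixels per core */
    const uint32_t start = core_id * slice;
    const uint32_t end   = start + slice;

    /* Walk the slice linearly, 4 adjacent pixels per step
       (in the demo this would be one 4-lane operation). */
    for (uint32_t i = start; i < end; i += PIXELS_PER_STEP) {
        for (uint32_t lane = 0; lane < PIXELS_PER_STEP; lane++) {
            fb[i + lane] = shade(i + lane);
        }
    }
}
```

On real hardware each core would call `render_slice` with its own core number. Since no two slices overlap, the cores never write to the same pixel, which may be part of why the result looks stable without any synchronization.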
I did not use any synchronization in this demo at all, so I was pleasantly surprised that the animation looks so nice & stable!
I will of course explore synchronization in the future, but I wanted a demo like this to show what you can do without it =D
Next up, I think I'll try to make a simple ray-tracer, optimized with NEON instructions, using all 4 cores.