Actually, you could design a system around just the Arm datasheet if you only want to use the Arm core (and a basic framebuffer). However, the OP's task would need the GPU for real horsepower, and without OpenCL or similar the GPU is not accessible for general-purpose compute.
We have discussed this sort of thing at work. I reckon the VC4 GPU's low power requirement and high compute power would make it a good candidate for this sort of supercomputing work, but for one issue: comms. A supercomputer really needs dedicated comms channels between devices, which the chip doesn't have.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
“I own the world’s worst thesaurus. Not only is it awful, it’s awful.”