The Pi GPIO sits behind a slow AXI bus, so it can't get anywhere near the 250 MHz you might hope for based on the core CPU speed and the GPIO datasheet.
Maximum high/low clock speeds on the CPU access to the GPIO are
DMA transfers to the GPIO are significantly slower than CPU access
It's not documented anywhere; most of us set up simple loops and watch the pin on an oscilloscope to test it.
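For anyone who wants to repeat that kind of test, here is a minimal sketch of such a loop in C. It assumes Raspberry Pi OS with the /dev/gpiomem device available and uses the GPFSEL/GPSET0/GPCLR0 register offsets from the BCM2835 peripherals document. The function names and the choice of pin are mine, purely for illustration. The software timing is only a rough cross-check; the oscilloscope reading is the one to trust.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* GPIO register offsets in 32-bit words (byte offsets 0x00, 0x1C, 0x28). */
enum { GPFSEL0 = 0, GPSET0 = 7, GPCLR0 = 10 };

/* Each GPFSEL register covers ten pins, three function bits per pin. */
unsigned gpfsel_index(unsigned pin) { return pin / 10; }
unsigned gpfsel_shift(unsigned pin) { return (pin % 10) * 3; }

/* One set plus one clear is one full cycle of the output square wave. */
double square_wave_hz(unsigned long cycles, double seconds) {
    return seconds > 0.0 ? (double)cycles / seconds : 0.0;
}

/* Toggle `pin` for `cycles` cycles as fast as the CPU can write.
 * Returns the software-measured frequency in Hz, or -1.0 on error.
 * Assumption: /dev/gpiomem maps the GPIO register block at offset 0. */
double measure_toggle_hz(unsigned pin, unsigned long cycles) {
    assert(pin < 32); /* only the first SET/CLR register is used here */

    int fd = open("/dev/gpiomem", O_RDWR | O_SYNC);
    if (fd < 0) return -1.0;
    void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1.0;
    volatile uint32_t *gpio = map;

    /* Select function 001 (output) for the pin. */
    uint32_t sel = gpio[GPFSEL0 + gpfsel_index(pin)];
    sel &= ~(7u << gpfsel_shift(pin));
    sel |= 1u << gpfsel_shift(pin);
    gpio[GPFSEL0 + gpfsel_index(pin)] = sel;

    const uint32_t mask = 1u << pin;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < cycles; i++) {
        gpio[GPSET0] = mask; /* drive high */
        gpio[GPCLR0] = mask; /* drive low  */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    munmap(map, 4096);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return square_wave_hz(cycles, secs);
}
```

Call something like measure_toggle_hz(17, 10000000UL) from main with a scope probe on the pin (GPIO17 is just an example) and compare the printed figure with what the scope shows.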
As MarkR commented above, the PWMs are probably the fastest, possibly approaching the datasheet values, as they don't involve the slow AXI bus.
So no, there is simply no way to get near the 250 MHz you are hoping for. You can basically treat the whole peripheral memory area as if it had a big wait state inserted into it.
I would have thought the hardware UARTs on the Pi could be set up to deal with that speed, or is the UART data non-standard?
If it is non-standard and you want a more practical suggestion: what you are trying to do is around a hundred lines of VHDL, and you will find examples all over the internet because it is a standard VHDL tutorial exercise. It will easily fit into even the smallest FPGA, and if you choose a 3.3 V FPGA part you can connect it directly to the Pi.