I have been working on a project that requires sound generation after a button press with as low a latency as possible. I have had good success using fluidsynth and pyfluidsynth, even with over 100 sounds (there are lots of buttons!) being triggered simultaneously, or staggered but overlapping. I use 2 periods with a period size of 64, and disable the internal reverb and chorus effects.
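For anyone wanting to try the same settings, here is a minimal sketch that turns them into fluidsynth command-line options. The setting names (audio.periods, audio.period-size, synth.reverb.active, synth.chorus.active) are fluidsynth's own; the helper name and the soundfont path example.sf2 are placeholders made up for illustration, and the same values could equally be applied through pyfluidsynth instead of the command line.

```python
# Low-latency settings described above, keyed by fluidsynth setting name.
LOW_LATENCY_SETTINGS = {
    "audio.periods": 2,        # number of audio buffers
    "audio.period-size": 64,   # frames per buffer
    "synth.reverb.active": 0,  # disable the internal reverb
    "synth.chorus.active": 0,  # disable the internal chorus
}


def fluidsynth_args(soundfont, settings=LOW_LATENCY_SETTINGS):
    """Build an argument list that passes each setting as '-o name=value'."""
    args = ["fluidsynth"]
    for name, value in settings.items():
        args += ["-o", f"{name}={value}"]
    args.append(soundfont)  # hypothetical soundfont path supplied by the caller
    return args


print(" ".join(fluidsynth_args("example.sf2")))
```

The resulting list can be passed to subprocess.Popen as it stands, or pasted into a shell after joining with spaces.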
I have a delay of 60 ms from pressing the button to hearing the sound, of which about 80% is fluidsynth. This is not quite real time, but close enough for me. Different people's perceptions may vary, but I found through experimentation that I notice a delay once it reaches about 100 ms. I would like to get it below 50 ms, but anything lower than that does not matter for my purposes.
Fluidsynth does start to sound appalling if I go above about 170 simultaneous sounds, as it no longer has time to calculate the buffer contents before the start of the next period.
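To put a rough number on how little time there is per period, here is a back-of-the-envelope sketch. The 44100 Hz sample rate is an assumption (it is fluidsynth's default, but I have not stated what I actually run at), and dividing the period evenly between voices is a simplification that ignores everything else the synth has to do.

```python
SAMPLE_RATE_HZ = 44100  # assumption: fluidsynth's default sample rate
PERIOD_SIZE = 64        # frames per period
PERIODS = 2             # number of periods
VOICES = 170            # roughly where the sound starts to break up


def period_ms(period_size, sample_rate_hz):
    """Duration of one period in milliseconds."""
    return 1000.0 * period_size / sample_rate_hz


# Each period must be fully rendered within one period's duration.
budget_ms = period_ms(PERIOD_SIZE, SAMPLE_RATE_HZ)
# Audio queued across all periods, i.e. the buffering delay alone.
buffer_ms = PERIODS * budget_ms
# Naive even split of one period between all sounding voices.
per_voice_us = 1000.0 * budget_ms / VOICES

print(f"time to render one period: {budget_ms:.2f} ms")
print(f"buffered audio in total:   {buffer_ms:.2f} ms")
print(f"rough budget per voice:    {per_voice_us:.2f} us")
```

Under those assumptions each period lasts about 1.45 ms, which leaves only a few microseconds per voice once 170 sounds are playing.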
I found that using multiple cores on an RPi v3 did not help, probably due to the overhead of setting up the parallelisation, but giving fluidsynth a dedicated single core did. I added isolcpus=3 to /boot/cmdline.txt and started fluidsynth through taskset -c 3. The isolcpus setting stops the scheduler from running processes on core 3 unless they are explicitly put there, and taskset puts fluidsynth there.
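If fluidsynth is launched from a Python script, the taskset step can be sketched as below. The helper name pinned_command and the soundfont path example.sf2 are made up for illustration; taskset -c 3 is the command described above, and core 3 is only useful here if isolcpus=3 is already on the kernel command line.

```python
import shlex


def pinned_command(command, core=3):
    """Prefix a command with 'taskset -c <core>' so it runs only on that core."""
    return ["taskset", "-c", str(core)] + list(command)


# Hypothetical fluidsynth invocation; substitute your own options and soundfont.
cmd = pinned_command(["fluidsynth", "example.sf2"])
print(shlex.join(cmd))
```

Passing cmd to subprocess.Popen would then start fluidsynth on the isolated core.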
I have also found that the soundfont can make a big difference. Soundfonts work by combining one or more samples and possibly applying some processing to each sample or to the combined total. Using a soundfont that uses just a single sample and no processing for each instrument, so that the work per note is minimal, helps enormously.