I think it works something like this (but I'm only guessing):
The program consists of "components" which pass the image data around in buffers, piped over "ports" that connect these components.
The basic component is the camera. The camera captures continuous images at some fixed frame rate (as requested with -fps) and resolution. Only two sensor resolutions are available: full frame at 2592x1944 with a maximum of 15 fps, and video frames at 1920x1080 with a maximum of 30 fps; all other sizes are scaled from these later.
The camera has three "output" ports, so it can deliver the image to three other components.
In raspivid and raspistill these ports are connected to the "preview", "video" and "still" components, which have "input" ports. The "still" component is a JPEG encoder: it takes the uncompressed image, compresses it as a JPEG image and passes it to a callback function, which writes the JPEG data to a file. The "video" component is an H.264 encoder, which creates a video stream from a series of images; again it is connected to a callback function which just writes the output of the encoder to a file. The "preview" component just copies the buffer to the monitor.
The difference between raspistill and raspivid is that raspistill only uses the JPEG encoder component and raspivid only the H.264 encoder component.
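The component/port wiring described above can be sketched as a toy model (plain Python, not the real MMAL API — all class and callback names here are made up for illustration):

```python
# Toy model of the pipeline described above: a camera component pushes frame
# buffers through output ports into an encoder component, which hands its
# result to a callback (the way raspistill writes a file). Purely illustrative;
# the real pipeline runs inside the VideoCore GPU via the MMAL API.

class Port:
    """One-way connection: buffers sent here go to the connected component."""
    def __init__(self):
        self.sink = None

    def connect(self, component):
        self.sink = component

    def send(self, buffer):
        if self.sink is not None:
            self.sink.receive(buffer)

class JpegEncoder:
    """Stands in for the "still" component: compress a frame, call a callback."""
    def __init__(self, callback):
        self.callback = callback

    def receive(self, buffer):
        compressed = f"jpeg({buffer})"   # pretend compression
        self.callback(compressed)

class Camera:
    """Stands in for the camera component with its three output ports."""
    def __init__(self):
        self.preview_port = Port()
        self.video_port = Port()
        self.still_port = Port()

    def capture_frame(self, frame):
        # The camera offers the frame on all ports; only connected ones deliver.
        for port in (self.preview_port, self.video_port, self.still_port):
            port.send(frame)

# Wire camera -> JPEG encoder -> "file writer" callback, raspistill-style.
output = []
cam = Camera()
cam.still_port.connect(JpegEncoder(output.append))
cam.capture_frame("frame_0")
print(output)  # ['jpeg(frame_0)']
```

In this picture raspivid would simply connect an H.264 encoder to the video port instead of a JPEG encoder to the still port.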
The preview window usually displays only 1920x1080 at 30 fps. The camera can only deliver one frame size at a time, and it needs to be changed to 1920x1080 for the preview. Now if the program requests a still image, it needs to change the resolution to the full sensor size, and this full frame is then sent to the JPEG encoder. The change of "mode" takes some time, so an additional delay occurs. There is an option --fullpreview, which changes the preview to display the full frame at 15 fps, so no mode change should be necessary. And there is an option --nopreview, where the buffer is just sent to a null sink (a component which does nothing with the buffer) and is not shown. But in my tests I have not seen much difference between these preview settings.
The problem is that the preview frames are always generated; they are just not sent in all cases. The image processing in the GPU calculates exposure and ISO using this continuous stream of images, and I think you cannot "pause" this stream. An image or video is only created when the program tells the camera component to deliver these frames to the "encoding" components.
Only JamesH and a few others will really know how it works in the VideoCore GPU.
But for two full-frame stills you will always get a delay of at least 67 ms between frames. Therefore I would not be very optimistic about triggering full-frame images.
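The 67 ms figure follows directly from the 15 fps maximum quoted above for the full-frame mode:

```python
# Minimum interval between two consecutive full-frame captures, assuming the
# full-frame sensor mode really tops out at 15 fps as quoted above.
full_frame_fps = 15
interval_ms = 1000 / full_frame_fps
print(f"{interval_ms:.1f} ms")  # 66.7 ms
```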
I would also like to know if there is a difference between taking a picture from the preview port and taking a picture from the still port. There should be some difference, I guess.
Interesting link: http://inrng.com/2012/04/photo-finish-camera/
So the usual way is to shoot video at 10,000 frames per second. Then you can really decide who won a photo finish.
The idea is funny: take a single column from each video frame and create an artificial image by putting these columns side by side, producing a picture with time as the x-direction.
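That column-stacking trick can be simulated with NumPy (a toy sketch on fake frames, not real camera code):

```python
import numpy as np

# Simulate a "photo finish" strip camera: from each video frame keep only the
# single pixel column at the finish line, then lay those columns side by side.
# The x-axis of the resulting image is then time, not space.

def strip_image(frames, column):
    """frames: list of HxW arrays; returns an HxN strip, one column per frame."""
    return np.stack([f[:, column] for f in frames], axis=1)

# Four fake 3x5 grayscale frames; frame i is filled with the value i, so the
# strip should show "time" increasing from left to right.
frames = [np.full((3, 5), i) for i in range(4)]
strip = strip_image(frames, column=2)
print(strip.shape)  # (3, 4)
print(strip[0])     # [0 1 2 3]
```

With real footage, `column` would be the pixel column lying on the finish line, and the frame rate sets the time resolution along the x-axis.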