If it were me, I'd feed it to the input of the H.264 encoder rather than the image encoder. I believe that if you set the input port characteristics correctly it'll take a stream of images -- it happily eats the output of the resize and image_fx components; see omxtx in this forum for details -- and you'll end up with a nice video stream instead of MJPEG. I'd then put the timecode in the metadata (MPEG TS TDT / TOT given the choice, though that's not going to be to everyone's taste) rather than burning it in-vision.
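To give a feel for what the TDT route involves, here is a rough sketch of building one TDT section in Python. It follows my reading of the DVB SI layout (table_id 0x70, a 5-byte UTC_time made of a 16-bit Modified Julian Date plus BCD hours, minutes and seconds, and no CRC). The function names are made up for illustration, and wrapping the section into TS packets (pointer field, PID, continuity counter) is deliberately left out.

```python
from datetime import date, datetime, timezone

MJD_EPOCH = date(1858, 11, 17)  # Modified Julian Date 0


def to_bcd(value):
    """Pack a two-digit number (0-99) into one BCD byte."""
    return ((value // 10) << 4) | (value % 10)


def utc_time_field(when):
    """40-bit UTC_time: 16-bit MJD, then hh, mm, ss as BCD bytes.

    Pass a timezone-aware datetime; a naive one is treated as local time.
    """
    when = when.astimezone(timezone.utc)
    mjd = when.date().toordinal() - MJD_EPOCH.toordinal()
    return mjd.to_bytes(2, "big") + bytes(
        [to_bcd(when.hour), to_bcd(when.minute), to_bcd(when.second)]
    )


def tdt_section(when):
    """Build one TDT section (hypothetical helper, section only)."""
    payload = utc_time_field(when)
    # Byte 1: table_id. Bytes 2-3: section_syntax_indicator = 0,
    # three reserved bits set to 1, then the 12-bit section_length.
    header = bytes([0x70, 0x70 | (len(payload) >> 8), len(payload) & 0xFF])
    return header + payload


if __name__ == "__main__":
    stamp = datetime(1993, 10, 13, 12, 45, 0, tzinfo=timezone.utc)
    print(tdt_section(stamp).hex())  # 707005c079124500
```

The sample date is 13 October 1993, 12:45:00 UTC, whose MJD is 49273 (0xC079), so the time field comes out as C0 79 12 45 00. A TOT would need the local time offset descriptors and a CRC on top of this, which the sketch does not attempt.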
I'm also underwhelmed by the utility of libilclient. I had more luck ignoring it and using the OpenMAX spec and interfaces raw. OpenMAX isn't pretty -- my list of complaints about it can probably be found quite easily -- but I have to admit it does mostly work.
Just a couple of thoughts.