It would be interesting to see whether the Pi has the processing power to pull it all off (700 MHz). I have been doing a lot of work on Kinect interfaces recently, and even though the majority of the depth sensing is done in hardware, you will still need something with some number-crunching power to do anything useful with the data in a timely manner. It also depends on which objects it is going to recognise: distinguishing faces with depth data is a different kettle of fish to, say, distinguishing between 2D images such as a blue square and a green circle using just the Kinect camera.
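To put a rough number on that "timely manner" point, here is a back-of-the-envelope calculation of the raw data rate the CPU has to chew through, assuming the standard Kinect depth mode (640x480 pixels, 11-bit depth stored in 2 bytes per pixel, 30 fps):

```python
# Back-of-the-envelope: raw Kinect depth-stream bandwidth.
# Assumes the standard mode: 640x480, 11-bit depth packed into 16 bits, 30 fps.
width, height = 640, 480
bytes_per_pixel = 2          # 11-bit depth, stored in 2 bytes
fps = 30

frame_bytes = width * height * bytes_per_pixel
stream_mb_per_s = frame_bytes * fps / (1024 * 1024)

print(frame_bytes, "bytes per frame")                    # 614400
print(round(stream_mb_per_s, 1), "MB/s to process")      # 17.6
```

That is roughly 17-18 MB/s before you have done any actual recognition work, which is why a 700 MHz ARM may struggle to keep up in real time.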
Would you be looking at just using the Kinect camera for recognition, or would you also be considering the depth sensor and IR camera?
If it's just the normal Kinect camera (not the IR one), why not consider using a normal webcam? It might make things a bit easier, and it will most likely have a better resolution. VLC, the open-source media player, is also capable of detecting motion; perhaps you could get the source and use that for the visual side of things?
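The basic idea behind that kind of motion detection is just frame differencing: subtract two consecutive greyscale frames and threshold the result. Here is a minimal sketch of it in NumPy on synthetic frames (the threshold value is a made-up tuning parameter, not anything taken from VLC's source):

```python
import numpy as np

def motion_mask(prev, curr, threshold=25):
    """Flag pixels whose brightness changed by more than `threshold`.

    Simplest possible motion detection: difference two consecutive
    greyscale frames and threshold. Cast to int16 first so the
    subtraction of uint8 values cannot wrap around.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > threshold

# Two synthetic 8-bit greyscale "webcam" frames: a bright square moves right.
prev = np.zeros((120, 160), dtype=np.uint8)
curr = np.zeros((120, 160), dtype=np.uint8)
prev[40:60, 40:60] = 200
curr[40:60, 50:70] = 200

mask = motion_mask(prev, curr)
print(mask.sum(), "pixels changed")   # 400
```

Something this cheap per pixel is the kind of workload the Pi's CPU can plausibly manage at webcam resolutions.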
A good place to start on Linux with Kinect (gesture and depth sensing) is NITE and OpenNI, and possibly Wiring or Blender for visualising the data. It might be worth looking up their minimum hardware requirements as well, but it is likely a bit too much to ask of the ARM processor to do much in real time!
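To give a flavour of the per-frame number crunching involved once a framework like OpenNI has handed you a depth frame (per-pixel distances, typically in millimetres), here is a sketch of the simplest useful step, segmenting out the nearest object by thresholding on distance. The frame here is synthetic (a "hand" at 800 mm in front of a 2000 mm wall) and the cutoff is a made-up value you would tune for your scene:

```python
import numpy as np

# Synthetic depth frame: per-pixel distance in mm, as a depth sensor reports it.
depth = np.full((120, 160), 2000, dtype=np.uint16)  # background wall at 2 m
depth[30:90, 60:100] = 800                          # nearer blob (a "hand")

near_cutoff = 1200                                  # mm; hypothetical threshold
foreground = depth < near_cutoff                    # boolean mask of the blob

ys, xs = np.nonzero(foreground)
print("foreground pixels:", foreground.sum())       # 2400
print("centroid (row, col):", ys.mean(), xs.mean())
```

Even this trivial pass touches every one of the 307,200 pixels in a real 640x480 frame, thirty times a second, before any actual gesture logic runs.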
Please post back if you have any luck with it! It would be interesting to see if there are ways to use the ARM GPU for extra processing resources as well!
Here's a link to some people trying to run the Kinect on a 1.2 GHz ARM processor and how they are getting on: