That agrees with what I've read elsewhere on this forum (problems with SPLIT transactions, and also ISOCHRONOUS endpoints).
I found that in debug mode I don't have any clock drift. Further investigation showed that a small delay (2-10 usec) at the end of dwc_otg_hc_start_transfer fixes the problem in non-debug mode.
I wrote a patch: https://gist.github.com/3762318
It fixes the problem (no more than 48 samples of drift with iperf running in the background). To enable it, apply the patch and add dwc_otg.split_delay_dynamic=10 or dwc_otg.split_delay_static=10 to cmdline.txt. The difference between the two is that split_delay_static always waits the specified amount of time, while split_delay_dynamic checks the registers and exits early if the split transaction finishes sooner than the specified time.
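To make the static/dynamic difference concrete, here is a rough user-space sketch. It is not the code from the patch: struct fake_hc, fake_delay_us and fake_split_done are hypothetical stand-ins for the host channel, a busy-wait and a status register read, and the 1 usec polling step is my assumption.

```c
#include <stdbool.h>

/* Hypothetical simulated host channel: not a dwc_otg structure. */
struct fake_hc {
    unsigned int now_us;      /* simulated elapsed time, usec */
    unsigned int done_at_us;  /* time at which the split transaction completes */
};

/* Stand-in for a busy-wait: only advances the simulated clock. */
void fake_delay_us(struct fake_hc *hc, unsigned int usec)
{
    hc->now_us += usec;
}

/* Stand-in for reading a status register to see if the split finished. */
bool fake_split_done(const struct fake_hc *hc)
{
    return hc->now_us >= hc->done_at_us;
}

/* Static variant: always burns the whole budget. Returns usec waited. */
unsigned int wait_static(struct fake_hc *hc, unsigned int budget_us)
{
    fake_delay_us(hc, budget_us);
    return budget_us;
}

/* Dynamic variant: polls in 1 usec steps (assumed granularity) and
 * stops as soon as the split is done or the budget is used up. */
unsigned int wait_dynamic(struct fake_hc *hc, unsigned int budget_us)
{
    unsigned int waited = 0;

    while (waited < budget_us && !fake_split_done(hc)) {
        fake_delay_us(hc, 1);
        waited++;
    }
    return waited;
}
```

With a budget of 10 and a split that completes after 3 usec, wait_static still costs 10 usec while wait_dynamic costs 3. If the split takes longer than the budget, both cost 10.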
In the worst case this patch spends 50 ms per second (5%) in delays, because every millisecond there are 5 delays of 10 usec each: 2 on playback (start + result) and 3 on capture (start + continue to get data + continue to get data with a NYET result). With the dynamic delay, though, I saw a performance degradation of no more than 1-2%.
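The worst-case figure can be checked with a few lines of arithmetic. The function names below are mine, used only for illustration, and I assume 1000 frames per second (one per millisecond).

```c
/* Worst-case time spent in delays during one second, in usec. */
unsigned long worst_case_delay_us_per_sec(unsigned int delays_per_frame,
                                          unsigned int delay_us,
                                          unsigned int frames_per_sec)
{
    return (unsigned long)delays_per_frame * delay_us * frames_per_sec;
}

/* The same figure as a whole-number percentage of one second. */
unsigned long delay_percent_of_second(unsigned long delay_us_per_sec)
{
    return delay_us_per_sec * 100UL / 1000000UL;
}
```

For 5 delays of 10 usec per frame at 1000 frames per second this gives 50000 usec, i.e. 50 ms or 5%.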
But I still don't understand why it actually works, and only a USB analyzer can help. I would very much appreciate it if somebody with a USB analyzer could run the tests and capture the USB traffic at the PHY level.
To reproduce the problem you need:
1. A USB audio device, such as the VR-Fidelity: http://www.ebay.com/itm/Portable-USB-Sp ... 519ca2f0b1
2. The auddemo test application: http://call-o-call.com/auddemo.zip
3. The ALSA config patch to enable plughw: https://gist.github.com/3762409
4. The kernel patch to enable the delays: https://gist.github.com/3762318
Just run the auddemo application and enter "d 10 10", then "t X X 16000 10 1", where X X is the number of your USB audio device in the list (usually 1 if you have Broadcom audio enabled). For more dramatic results, run iperf in the background. Then enable the patch via "echo 10 >/sys/module/dwc_otg/parameters/split_delay_dynamic" and test again.