As far as I understand from the documentation, zoom is a tuple of (x, y, w, h). Let's say we have an image of (w, h) = (3280, 2464). If I use zoom = (0.25, 0.25, 0.5, 0.5), I expect to capture images whose center matches the center of the full image (like a 2x zoom). And the result is ...
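To make that expectation concrete, here is a small sketch of the arithmetic, assuming the zoom values are fractions of the full frame. The helper names `zoom_to_pixels` and `region_center` are made up for illustration and are not part of any camera library.

```python
def zoom_to_pixels(zoom, full_w, full_h):
    """Convert a normalized (x, y, w, h) zoom tuple to a pixel region.

    Assumes each value is a fraction (0.0 to 1.0) of the full frame.
    Hypothetical helper, not a library call.
    """
    x, y, w, h = zoom
    return (round(x * full_w), round(y * full_h),
            round(w * full_w), round(h * full_h))


def region_center(region):
    """Return the center point (cx, cy) of a pixel region (x, y, w, h)."""
    x, y, w, h = region
    return (x + w / 2, y + h / 2)


full_w, full_h = 3280, 2464
region = zoom_to_pixels((0.25, 0.25, 0.5, 0.5), full_w, full_h)

print(region)                      # pixel region covered by the zoom
print(region_center(region))       # center of that region
print((full_w / 2, full_h / 2))    # center of the full image
```

Under that assumption the region is (820, 616, 1640, 1232), and its center (1640, 1232) coincides with the center of the full 3280 x 2464 image, which is what a centered 2x zoom should give.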