We use the EDID to determine what modes a display supports and prefers.
There are two broad categories of display modes: CEA (TV-style resolutions) and DMT (monitor-style resolutions).
CEA modes should be driven with HDMI signalling, which supports audio.
DMT modes should be driven with DVI signalling, which doesn't support audio.
Now, you can drive DMT modes with HDMI signalling, which allows some monitors to play audio,
but according to the spec monitors don't have to accept this, and may just report an unsupported mode and give no picture.
So, to be safe, we follow the spec and don't enable audio for DMT modes by default. You have to enable HDMI signalling for these modes manually (and disable it again if it causes problems).
I think monitors that reject HDMI signalling may be rare (I've not seen one), but enabling it by default could well leave some users with no picture, which is a risk.
Choosing the option that gives video and no audio is safer than an option that may give no video.
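As a rough illustration of the default described above, here is a minimal Python sketch of the decision. It is not the actual firmware code; the names `ModeGroup`, `Signalling`, `choose_signalling`, `audio_available` and the `force_hdmi` flag are all made up for this example, with `force_hdmi` standing in for the manual override.

```python
from enum import Enum


class ModeGroup(Enum):
    """Hypothetical labels for the two mode categories."""
    CEA = "CEA"  # TV-style resolutions
    DMT = "DMT"  # monitor-style resolutions


class Signalling(Enum):
    """Hypothetical labels for the two signalling types."""
    HDMI = "HDMI"  # can carry audio
    DVI = "DVI"    # video only


def choose_signalling(group, force_hdmi=False):
    """Pick the signalling for a mode, defaulting to the safe choice.

    CEA modes get HDMI signalling. DMT modes get DVI signalling unless
    the user explicitly forces HDMI (the manual override).
    """
    if group is ModeGroup.CEA or force_hdmi:
        return Signalling.HDMI
    return Signalling.DVI


def audio_available(group, force_hdmi=False):
    """Audio is only possible when HDMI signalling is in use."""
    return choose_signalling(group, force_hdmi) is Signalling.HDMI


if __name__ == "__main__":
    for group in ModeGroup:
        for force in (False, True):
            print(group.value, "force_hdmi =", force, "->",
                  choose_signalling(group, force).value)
```

The only case where the user's choice changes the outcome is a DMT mode with the override set, which is the "computer monitor with audio" case. On the Pi I believe the override corresponds to the `hdmi_drive=2` setting in config.txt, but check the current documentation for your firmware before relying on that.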
Note: I imagine most Pi users are using standard HDMI TVs using CEA modes and so don't have any problems.
Many computer monitors (probably most) don't support audio, so there isn't an issue.
However, if you have a computer monitor that does support audio, then you fall into the category where a manual change is required.