Using -march= wouldn't set the optimisation level; that would be wrong. On ARM, -march= only selects the architecture. It is -mcpu= that is equivalent to doing both a -march= and a -mtune= with the corresponding values for that cpu.
The version of the compiler in Raspbian Stretch (6.3.0) is the updated one that fixed the bug in the native detection, but the code hasn't had the ARMv8 cpus added to its list of known variants. When it sees that the cpu is a Cortex-A53 (or any other that the native detection code doesn't know about), it falls back to whatever the compiler was set up to target by default. The compiler can happily be set to target the Cortex-A53; it's just the routine that rewrites the option from native to the appropriate value that lacks knowledge of ARMv8 cpus.
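To make the fallback behaviour concrete, here is a rough sketch in Python of how a "native" rewrite like this could behave. It is not GCC's actual code: the function name, the table contents and the assumption that detection keys off the "CPU part" field of /proc/cpuinfo are all illustrative.

```python
# Hypothetical table of CPU part numbers the detection routine "knows".
# Deliberately has no ARMv8 entries, mirroring the situation described above.
KNOWN_PARTS = {
    "0xb76": "arm1176jzf-s",  # ARM1176 (original RPi / Zero)
    "0xc07": "cortex-a7",     # Cortex-A7 (RPi2 v1.1)
}


def resolve_native(cpuinfo_text):
    """Return the cpu name that 'native' would be rewritten to.

    Returns None when the part is unknown, meaning the compiler's
    configured default target is used instead.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "CPU part":
            return KNOWN_PARTS.get(value.strip().lower())
    return None


# A Cortex-A53 reports part 0xd03, which is not in the table above,
# so the lookup gives None and the default target would be used.
print(resolve_native("CPU part\t: 0xc07"))
print(resolve_native("CPU part\t: 0xd03"))
```

On a real system you could feed it the contents of /proc/cpuinfo instead of the sample strings.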
You will need to specify the architecture (and/or cpu / tune) yourself when compiling on the RPi3 (and the RPi2 v1.2 which uses the same SoC).
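As an illustration, something like the following small Python helper builds a gcc command line with the cpu given explicitly rather than relying on native. The helper name, the file names and the -O2 default are made up for the example; -mcpu=cortex-a53 is a real GCC option, and any FPU or float ABI options are left out here on the assumption that the compiler's defaults are acceptable.

```python
import shlex


def rpi3_gcc_command(source, output, opt="-O2", cpu="cortex-a53"):
    """Build a gcc command that names the cpu explicitly.

    The optimisation level is passed separately because the target
    options do not set it.
    """
    return ["gcc", opt, f"-mcpu={cpu}", "-o", output, source]


# Print the command in a form that can be pasted into a shell.
print(shlex.join(rpi3_gcc_command("hello.c", "hello")))
```

The same idea applies to a Makefile or build script: put the explicit option in CFLAGS instead of native.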