About the number of instructions on ARM: not 100



About which version to learn: I'd suggest ARMv8, AArch64, simply because by the time you master it, the older versions will be long obsolete, and may not be supported by newer hardware at all.
About inlined assembly in gcc: sure, there are plenty, see for example __builtin_memcpy() or __builtin_va_arg(), or any other function that starts with "__builtin_".
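To make the "__builtin_" functions a bit more concrete, here is a minimal sketch that calls them directly. The names sum_ints, copy_pair and struct pair are made up for illustration. Note that these are compiler intrinsics that gcc expands itself, not hand-written assembly; gcc's <stdarg.h> macros are normally just thin wrappers around the va builtins.

```c
#include <assert.h>

/* Hypothetical helper: sum `count` int arguments using the compiler
 * builtins directly instead of the <stdarg.h> macros. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    __builtin_va_list ap;
    int total = 0;

    __builtin_va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(ap, int);
    __builtin_va_end(ap);

    return total;
}

/* Hypothetical type, used only to have something to copy. */
struct pair { int a; int b; };

/* With a small constant size the compiler is free to expand this into
 * plain load/store instructions rather than a library call. */
struct pair copy_pair(const struct pair *src)
{
    struct pair dst;
    __builtin_memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Calling sum_ints(3, 1, 2, 3) adds up the three trailing arguments, and copy_pair() returns a field-by-field copy of its argument. What instructions you actually get depends on the compiler version, target and optimisation level.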
About what to realistically implement in asm: I'd say small, low-level, hardware-dependent or performance-critical libraries. Compilers are very good at micro-optimisations (which registers to use, how to eliminate branches, how to optimise prefetching, etc.), but they have absolutely no clue about macro-optimisations, either at the algorithmic level (asymptotic analysis) or at the procedural level (for example, when to generate data with a procedure and when to use a look-up table instead). Portability and performance are two mutually exclusive goals, because architectures differ. In the old days CPUs were much slower, so performance was more important, which meant more assembly; nowadays portability seems to be the more important goal, so you only use assembly for things that cannot be expressed in higher-level languages (like "mrs" or "eret"), or for very performance-critical tasks. Assembly is also popular among embedded developers, where code size is an issue and the purpose of the program is specific and well defined, although the shift towards higher-level languages can be seen there too (see Arduino's C-like language, for example).
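As a small illustration of the "procedure or look-up table" decision mentioned above, here is a sketch of the same job (counting the set bits in a 32-bit word) done both ways. The function names and the 16-entry table are my own choices for the example, and which variant is faster depends entirely on the target, so treat it as a sketch rather than a recommendation.

```c
#include <assert.h>
#include <stdint.h>

/* Procedural version: compute the count one bit at a time. */
unsigned popcount_loop(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Look-up version: the bit count of every 4-bit value, precomputed. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned popcount_table(uint32_t x)
{
    unsigned n = 0;
    for (int i = 0; i < 8; i++) {   /* eight nibbles in 32 bits */
        n += nibble_bits[x & 0xFu];
        x >>= 4;
    }
    return n;
}

/* Sanity helper: both versions must agree on any input. */
void check_agree(uint32_t x)
{
    assert(popcount_loop(x) == popcount_table(x));
}
```

The table version trades 16 bytes of data for a fixed eight iterations, while the loop version needs no data but its running time depends on the position of the highest set bit. Picking between them (or a bigger table, or a dedicated instruction) is the kind of decision that is left to the programmer.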
All of the above does not stop people from using assembly for crazy things, of course.

Cheers,
bzt