Is 1.0 implicit or actually stored?
That's implementation-specific. If you were asking about 888 vs 8888 textures, I'd tell you that pretty much every implementation is bound to use 32 bits per texel, but I'm not so sure for 16F formats. It is telling that Metal doesn't define an RGB16F format (link), which strongly suggests that PowerVR GPUs, at least, will pad the format. Vulkan does define RGB16F, but while the spec requires support for R16F, RG16F and RGBA16F, it doesn't require support for RGB16F (link), again suggesting a lack of native support from some vendors. I wouldn't be surprised if some GPU somewhere does support RGB16F natively, but I suspect most would just pad it to RGBA16F. For a more definitive answer you might need to ask on the GPU vendors' forums, or experiment by examining memory usage under controlled conditions.
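If you do go the experimental route, it helps to know what numbers to look for. Here is a rough sketch of the two sizes you would expect for an RGB16F texture: 6 bytes per texel if it is stored tightly, 8 if the driver pads it to RGBA16F. The `texture_bytes` helper is made up for illustration, and it ignores any alignment, tiling or driver overhead, so treat the results as ballpark figures only.

```python
def texture_bytes(width, height, bytes_per_texel, mipmapped=False):
    """Approximate size of a 2D texture, ignoring alignment and tiling overhead."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_texel
        # Stop after the base level, or once the 1x1 mip has been counted.
        if not mipmapped or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total

RGB16F_PACKED = 6  # 3 channels x 2 bytes, if stored tightly
RGB16F_PADDED = 8  # assumption: padded out to 4 channels x 2 bytes

packed = texture_bytes(1024, 1024, RGB16F_PACKED)
padded = texture_bytes(1024, 1024, RGB16F_PADDED)
print(packed, padded)
```

If the memory you measure for a 1024x1024 RGB16F texture is closer to the padded figure than the packed one, that is a reasonable hint (not proof) that the implementation pads.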
And in the latter case, my main question: if I put my 16-bit heightmap into the alpha channel, so that it becomes RGBA16F, will I improve performance?
Are you sampling it at the same time (i.e. from the same shader, with the same UVs)? If so, then yes, absolutely: it will be a better choice than using an RGB16F plus an R16F. If they're not sampled together (e.g. the heightmap is sampled in the vertex shader and the colour in the fragment shader), then it's harder to guess. You'd probably be harming performance on the heightmap fetch (those extra bytes pollute the cache) while leaving the colour fetch unharmed (there was padding there anyway). Overall you'd lose some performance but save some memory. Any performance loss is probably pretty minor, and if your bottleneck lies elsewhere it may not do any harm at all.
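To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes RGB16F really is padded to 4 channels, which, as discussed above, is a guess rather than a guarantee. The `texel_bytes` helper and the layout names are made up for illustration, and real cache behaviour depends on much more than bytes per texel.

```python
HALF = 2  # bytes per 16-bit float channel

def texel_bytes(channels, pad_rgb=True):
    """Bytes per texel, assuming 3-channel formats are padded to 4 channels."""
    stored = 4 if (channels == 3 and pad_rgb) else channels
    return stored * HALF

# Layout A: RGB16F colour texture plus a separate R16F heightmap.
separate = {"colour": texel_bytes(3), "height": texel_bytes(1)}
# Layout B: one RGBA16F texture with the height stored in alpha.
combined = {"colour": texel_bytes(4), "height": texel_bytes(4)}

# Sampled together: A needs two fetches, B needs one.
together_a = separate["colour"] + separate["height"]
together_b = combined["colour"]

# Sampled apart: the height-only fetch is where B costs more.
height_only_a = separate["height"]
height_only_b = combined["height"]

print("together:", together_a, "vs", together_b)
print("height only:", height_only_a, "vs", height_only_b)
print("colour only:", separate["colour"], "vs", combined["colour"])
```

Under those assumptions the combined layout touches fewer bytes when both values are sampled together, and the only fetch that gets heavier is the height-only one, while the colour fetch is the same size either way.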