Regarding the naming, it is a bit of a misnomer. The 'args' variant expresses min/max as attributes and is therefore only valid for fixed ranges. The 'vars' variants take arbitrary tensors for min/max; whether those are actual vars or some other computed value depends on your quantization approach. The 'vars' variants have gradients defined for their min/max and can therefore be trained. Many training approaches instead just compute min/max from each batch at training time and accumulate these into non-trainable vars using an exponential moving average. Then at eval time, the min/max vars are used in place of the computed min/max.
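For example, the moving-average accumulation might look roughly like this (a minimal TF2 eager-mode sketch; ema_fake_quant and the 0.99 decay are illustrative choices, not a standard API):

```python
import tensorflow as tf

# Non-trainable range vars that accumulate batch statistics across steps.
min_var = tf.Variable(0.0, trainable=False)
max_var = tf.Variable(0.0, trainable=False)

def ema_fake_quant(x, min_var, max_var, is_training, decay=0.99):
    if is_training:
        # Fold the batch min/max into the range vars with an exponential
        # moving average.
        min_var.assign(decay * min_var + (1.0 - decay) * tf.reduce_min(x))
        max_var.assign(decay * max_var + (1.0 - decay) * tf.reduce_max(x))
    # At eval time (is_training=False) the accumulated vars are used as-is.
    return tf.quantization.fake_quant_with_min_max_vars(x, min_var, max_var)

x = tf.random.uniform([4, 16], minval=-1.0, maxval=1.0)
y = ema_fake_quant(x, min_var, max_var, is_training=True)
```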
If adding them manually, you need to make sure that every arithmetic op (add, mul, etc., but not transpose, reshape, etc.) has an appropriate fake_quant* op on each tensor that feeds into it.
In practice, the rules I've found that work for this are:
1. When a weight var feeds into an arithmetic op, add a fake_quant_with_min_max_vars that computes its min/max from the min/max of the weight (see the sketch after this list).
2. Add a fake_quant_with_min_max_vars after any arithmetic op; at training time it accumulates into dedicated min/max vars for each op, and at eval time it just uses those vars (as in the moving-average sketch above).
3. Add an appropriate fake_quant* op to the very top-level inputs of your model (not necessary if the model is driven via some form of embedding lookup). This includes incoming constants, unless they are already in the default range.
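Put together, a minimal sketch of rules 1 and 3 might look like this (again assuming TF2 eager mode; the helper names and the [-1, 1] input range are illustrative, and rule 2 would then apply to y as in the moving-average sketch above):

```python
import tensorflow as tf

def quantize_weight(w):
    # Rule 1: a weight's range comes straight from its own min/max, so no
    # dedicated range variables are needed.
    return tf.quantization.fake_quant_with_min_max_vars(
        w, tf.reduce_min(w), tf.reduce_max(w))

def quantize_model_input(x, input_min=-1.0, input_max=1.0):
    # Rule 3: top-level inputs have a fixed, known range, so the 'args'
    # variant (min/max expressed as attributes) is appropriate here.
    return tf.quantization.fake_quant_with_min_max_args(
        x, min=input_min, max=input_max)

x = tf.random.uniform([4, 16], minval=-1.0, maxval=1.0)
w = tf.Variable(tf.random.normal([16, 8]))
y = tf.matmul(quantize_model_input(x), quantize_weight(w))
# Rule 2 would now fake-quantize y using dedicated min/max vars.
```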
If you do it this way, you'll generally end up with every tensor quantized and without redundant or conflicting quant params. Depending on the model, additional nuance and other tricks may be needed to actually get toco/tflite to run it with only quantized types.
I'm less familiar with the automated tools that do this, but I believe this is the general approach they take when rewriting the graph. They also carry significant complexity to detect and work around certain patterns that need extra massaging when transforming blind at the graphdef level (as opposed to the source level, where some things are more obvious).
For the "manual" approach to not be too burdensome, I've written/used libraries that just let me annotate the important tensors by passing them through helper functions that defer to a model level set of parameters that let me tune the quantization strategy layer by layer.
Hth.