I have figured out the problem. It occurs when part of the model is not actually quantization-aware trained (QAT). In my case the output node was somehow skipped by QAT during training. The -6 and 6 values come from the default quantization range in the TF 1.x source, as mentioned here
To overcome the problem, we should add an op that triggers QAT for the output node. In my regression case, I added a dummy op, tf.maximum(output, 0), to the model so that the output node gets fake-quantized. If your output is strictly between 0 and 1, applying a "sigmoid" activation at the output instead of relu can also solve the problem. A minimal sketch of the workaround is below.
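Here is a minimal sketch of how the dummy op fits into a TF 1.x QAT setup. The network shape, layer sizes, and names (`features`, `raw_output`, `output`) are placeholders I made up for illustration; the key point is that tf.maximum(output, 0) is inserted before calling the graph rewriter.

```python
import tensorflow as tf  # TF 1.x

graph = tf.Graph()
with graph.as_default():
    # Placeholder inputs; the shapes are just an example.
    features = tf.placeholder(tf.float32, [None, 10], name="features")

    hidden = tf.layers.dense(features, 32, activation=tf.nn.relu)
    raw_output = tf.layers.dense(hidden, 1, activation=None, name="raw_output")

    # Dummy op: for non-negative regression targets this is a no-op
    # numerically, but it gives the QAT rewriter an activation on the
    # output node so it learns real min/max ranges instead of falling
    # back to the default [-6, 6].
    output = tf.maximum(raw_output, 0.0, name="output")

    # Rewrite the training graph with fake-quantization ops.
    tf.contrib.quantize.create_training_graph(input_graph=graph, quant_delay=0)
```

If your output is bounded in [0, 1], replacing the tf.maximum trick with a sigmoid activation on the last layer should have the same effect of giving the output node a quantizable range.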
