It was a little difficult for me to understand your question; maybe you could be more direct?
Anyway, here are some insights on SIFT:
The scale should be taken into account in the feature extraction that is performed
in the keypoint's neighbourhood. Usually, this is done by centring a Gaussian window on
the keypoint whose standard deviation (not variance) is proportional to the scale at
which the point was detected, so the neighbourhood grows with the scale. The Gaussian,
multiplied by each sample's gradient magnitude, gives the weights used when accumulating
the histograms of gradient orientations.
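To make the weighting concrete, here is a minimal Python/NumPy sketch. It assumes the input image has already been Gaussian-blurred at the keypoint's scale and approximates gradients with np.gradient. The function name and the sigma_factor=1.5 default are my own choices, not a library API. The 1.5 factor is the value Lowe uses for orientation assignment; the descriptor stage uses a different window.

```python
import numpy as np

def weighted_orientation_histogram(image, x, y, scale, num_bins=36, sigma_factor=1.5):
    """Histogram of gradient orientations around (x, y), weighted by gradient
    magnitude and by a Gaussian whose sigma grows with the keypoint scale.
    Assumes `image` is already blurred at the keypoint's scale (hypothetical helper)."""
    image = np.asarray(image, dtype=float)
    sigma = sigma_factor * scale              # Gaussian std dev proportional to scale
    radius = int(round(3 * sigma))            # ~3 sigma covers almost all the weight
    gy, gx = np.gradient(image)               # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # angles in [0, 2*pi)

    hist = np.zeros(num_bins)
    h, w = image.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            if not (0 <= py < h and 0 <= px < w):
                continue                      # skip samples outside the image
            weight = np.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            b = int(orientation[py, px] / (2 * np.pi) * num_bins) % num_bins
            hist[b] += weight * magnitude[py, px]
    return hist

# Horizontal intensity ramp: every gradient points along +x (orientation 0).
ramp = np.tile(np.arange(32, dtype=float), (32, 1))
small = weighted_orientation_histogram(ramp, 16, 16, scale=1.0)
large = weighted_orientation_histogram(ramp, 16, 16, scale=2.0)
```

With a larger scale the Gaussian is wider, so samples further from the keypoint still contribute and the histogram collects more total weight; that is how the neighbourhood follows the detected scale.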
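Also, when you refer to keypoint orientation, I think what you mean is the dominant
orientation of the gradients in the neighbourhood of that point. It is computed by
building a histogram of gradient orientations (36 bins covering 360 degrees in Lowe's
SIFT) and taking its largest bin; any other peak within 80% of the largest yields an
extra keypoint with that orientation. The orientation is stored so that the descriptor
can be computed relative to it, which is what makes the points rotation invariant.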
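Here is a rough sketch of that orientation step, again in Python/NumPy. It assumes a circular histogram like the one described above. It returns bin centres and skips the parabolic interpolation that real implementations use to refine the peak. Both helper names are hypothetical.

```python
import numpy as np

def dominant_orientations(hist, peak_ratio=0.8):
    """Keypoint orientations (radians) from a circular orientation histogram:
    local peaks whose height is at least peak_ratio times the largest bin."""
    hist = np.asarray(hist, dtype=float)
    n = len(hist)
    bin_width = 2 * np.pi / n
    max_val = hist.max()
    angles = []
    for i in range(n):
        left, right = hist[(i - 1) % n], hist[(i + 1) % n]  # circular neighbours
        if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * max_val:
            angles.append((i + 0.5) * bin_width)              # centre of the bin
    return angles

def relative_orientations(orientations, keypoint_angle):
    """Express gradient orientations relative to the keypoint orientation."""
    return np.mod(np.asarray(orientations, dtype=float) - keypoint_angle, 2 * np.pi)

hist = np.zeros(36)
hist[5], hist[20], hist[30] = 10.0, 9.0, 5.0
angles = dominant_orientations(hist)  # bins 5 and 20; bin 30 is below 80% of the max
```

Because the descriptor gradients are measured relative to the keypoint angle, rotating the image shifts both by the same amount, and the relative values stay the same.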
I hope that helps, cheers.