Generally, input scale matters, and converting to grayscale certainly matters. The details depend on the training data: if the training data contains the object at the same scale you use at test time, the change might not make a big difference; if not, it will. Deep learning models are mostly not invariant to changes in the input. CNNs show some invariance to translation, but that is about it. Rotation, scaling, color distortion, brightness, etc. all hurt performance if those variations were not part of the training data.
The paper https://arxiv.org/abs/2106.06057, published at IJCNN 2022, investigates classifiers on rotated and scaled images from simple datasets like MNIST (handwritten digits) and shows that performance deteriorates considerably. Other papers have shown the same thing.
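You can probe this yourself with a small experiment. Below is a minimal sketch (PyTorch/torchvision, assuming both are installed) that evaluates a classifier on the plain MNIST test set and then on rotated and rescaled versions of it. The `model` variable is a placeholder for any classifier you have already trained on standard upright MNIST; the specific rotation angle and scale factor are arbitrary choices for illustration. The exact numbers will vary, but you should see accuracy drop sharply on the transformed inputs, consistent with the paper's findings.

```python
# Sketch: measure how a trained MNIST classifier degrades under
# rotations and scalings it never saw during training.
# Assumes `model` is an already-trained classifier on upright,
# unscaled MNIST digits (training code not shown).
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def accuracy(model, transform, device="cpu"):
    """Top-1 accuracy on the MNIST test set under a given transform."""
    ds = datasets.MNIST("data", train=False, download=True, transform=transform)
    loader = DataLoader(ds, batch_size=256)
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
    return correct / len(ds)

base = transforms.ToTensor()                     # inputs as seen in training
rotated = transforms.Compose([
    transforms.RandomRotation((45, 45)),         # fixed 45-degree rotation
    transforms.ToTensor(),
])
scaled = transforms.Compose([
    transforms.RandomAffine(degrees=0, scale=(0.5, 0.5)),  # shrink to 50%
    transforms.ToTensor(),
])

# Example usage, given a trained `model`:
# for t, name in [(base, "original"), (rotated, "rotated"), (scaled, "scaled")]:
#     print(f"{name}: {accuracy(model, t):.3f}")
```

The flip side of this experiment is the standard remedy: if you add the same rotations and scalings as data augmentation during training, the gap largely closes, because the network then sees those variations as part of the data distribution.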