I am trying to use category_encoders.TargetEncoder to encode a categorical feature. My target variable is continuous. However, the output of the target encoder looks very strange and I can't interpret it. Could someone give me a hint about what is happening?
Here is my toy code.
import pandas as pd
from category_encoders import TargetEncoder
df = pd.DataFrame(['A', 'B', 'C', 'D', 'E', 'F', 'F', 'F', 'G', 'G', 'G'], columns=['cat'])
df['target'] = [921, 921, 3.5, 280, 0, 3.5, 3.5, 3.5, 200, 200, 200]
Now df looks like this:
cat target
0 A 921.0
1 B 921.0
2 C 3.5
3 D 280.0
4 E 0.0
5 F 3.5
6 F 3.5
7 F 3.5
8 G 200.0
9 G 200.0
10 G 200.0
Then I ran the encoder as:
encoder = TargetEncoder()
df['encoded'] = encoder.fit_transform(df["cat"], df['target'])
And here is my output:
cat target encoded
0 A 921.0 248.727273
1 B 921.0 248.727273
2 C 3.5 248.727273
3 D 280.0 248.727273
4 E 0.0 248.727273
5 F 3.5 32.731807
6 F 3.5 32.731807
7 F 3.5 32.731807
8 G 200.0 205.808433
9 G 200.0 205.808433
10 G 200.0 205.808433
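For reference, here is a minimal pandas/numpy sketch that reproduces these numbers. It assumes the installed version of TargetEncoder blends each category mean with the global target mean using a sigmoid weight, with min_samples_leaf=1 and smoothing=1.0, and that categories seen only once are replaced by the global mean. These parameter values and the single-occurrence rule are assumptions inferred from the output, not confirmed library behaviour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C', 'D', 'E', 'F', 'F', 'F', 'G', 'G', 'G'], columns=['cat'])
df['target'] = [921, 921, 3.5, 280, 0, 3.5, 3.5, 3.5, 200, 200, 200]

# Assumed parameter values (hypothesized defaults of the installed version)
min_samples_leaf = 1
smoothing = 1.0

# Global (prior) mean of the target: 2736 / 11 = 248.727...
prior = df['target'].mean()

# Per-category sample count and target mean
stats = df.groupby('cat')['target'].agg(['count', 'mean'])

# Sigmoid weight: how much to trust the category mean versus the prior
weight = 1 / (1 + np.exp(-(stats['count'] - min_samples_leaf) / smoothing))
encoding = prior * (1 - weight) + stats['mean'] * weight

# Assumed rule: categories that occur only once fall back to the prior
encoding[stats['count'] == 1] = prior

# Map each row's category to its encoded value
df['encoded'] = df['cat'].map(encoding)
print(df)
```

Under these assumptions, F and G (3 rows each) get a weight of about 0.88 on their own mean, while A to E all collapse to the global mean of 248.727273.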
What I don't understand is that for categories with only one row (categories 'A' to 'E'), the encoder doesn't seem to distinguish between their different target values: they all get 248.727273. Is that by design?
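For comparison, a plain (unsmoothed) per-category mean encoding does separate A to E. This is a pandas-only sketch, not what TargetEncoder does. Note that for a category with a single row, the encoded value is exactly that row's own target, so on training data the feature simply copies the label.

```python
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C', 'D', 'E', 'F', 'F', 'F', 'G', 'G', 'G'], columns=['cat'])
df['target'] = [921, 921, 3.5, 280, 0, 3.5, 3.5, 3.5, 200, 200, 200]

# Unsmoothed mean encoding: each row gets the mean target of its own category
df['mean_encoded'] = df.groupby('cat')['target'].transform('mean')
print(df)
```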