I am analysing data from a survey. One of the questions asks which games the respondent likes the most. It is a free-text field, so users sometimes give 1, 2 or 3 items and sometimes nothing. Because it is free text, answers that mean the same thing can differ as strings: the user may have typed an extra space or an additional character, or misspelled the name. How can I replace these similar values so that "don't know", "dk", "don't now" and "don't remember" are counted as the same string?
Here is a snippet of the data frame.
                 Q4_1       Q4_2       Q4_3         Q4_4  Q4_5  Q4_6  Q4_7  Q4_8  \
0           dark soul  valkiring        NaN          NaN   NaN   NaN   NaN   NaN
1          Don't know        NaN        NaN          NaN   NaN   NaN   NaN   NaN
2   World of Warcraft  Fallout 3  Fallout 4          NaN   NaN   NaN   NaN   NaN
3          Don`t know        NaN        NaN          NaN   NaN   NaN   NaN   NaN
4            warcraft        NaN        NaN          NaN   NaN   NaN   NaN   NaN
5          don't know        NaN        NaN          NaN   NaN   NaN   NaN   NaN
6  Mass Effect Series     Skyrim  Fallout 4  Tomb Raider   NaN   NaN   NaN   NaN
7          dark souls        NaN        NaN          NaN   NaN   NaN   NaN   NaN
8                none        NaN        NaN          NaN   NaN   NaN   NaN   NaN
9         candy cruss        NaN        NaN          NaN   NaN   NaN   NaN   NaN

   Q4_9  Q4_10
0   NaN    NaN
1   NaN    NaN
2   NaN    NaN
3   NaN    NaN
4   NaN    NaN
5   NaN    NaN
6   NaN    NaN
7   NaN    NaN
8   NaN    NaN
9   NaN    NaN
print(df_survey_Q4.head(10).stack())
0  Q4_1             dark soul
   Q4_2             valkiring
1  Q4_1            Don't know
2  Q4_1     World of Warcraft
   Q4_2             Fallout 3
   Q4_3             Fallout 4
3  Q4_1            Don`t know
4  Q4_1              warcraft
5  Q4_1            don't know
6  Q4_1    Mass Effect Series
   Q4_2                Skyrim
   Q4_3             Fallout 4
   Q4_4           Tomb Raider
7  Q4_1            dark souls
8  Q4_1                  none
9  Q4_1           candy cruss
dtype: object
print(df_survey_Q4.head(10).stack().value_counts())
Fallout 4             2
Skyrim                1
Fallout 3             1
World of Warcraft     1
valkiring             1
don't know            1
Tomb Raider           1
none                  1
warcraft              1
dark souls            1
Mass Effect Series    1
dark soul             1
Don`t know            1
Don't know            1
candy cruss           1
dtype: int64
So in this snippet, I would like Don't know, Don`t know and none to be grouped together as "Don't know" and counted as 3, instead of each one being counted as 1.
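To make the goal concrete, here is a minimal sketch of the kind of grouping I have in mind, using only the standard library (`difflib` for fuzzy matching). The `ALIASES` table, the `normalize` and `canonicalize` helpers, and the `0.8` cutoff are all hypothetical choices for illustration, not something taken from the survey; which answers belong together (for example treating "none" as "Don't know", or "candy cruss" as "Candy Crush") is an assumption. Note that in this snippet the lowercase "don't know" is folded in as well, so the grouped count comes out as 4 rather than 3.

```python
import difflib
import re
from collections import Counter

# Hypothetical canonical labels and their known aliases (assumptions for illustration).
ALIASES = {
    "Don't know": ["don't know", "dk", "don't remember", "none"],
    "Dark Souls": ["dark souls"],
    "Candy Crush": ["candy crush"],
}

# Reverse lookup: normalized alias -> canonical label.
LOOKUP = {alias: label for label, aliases in ALIASES.items() for alias in aliases}


def normalize(text):
    """Lowercase, unify apostrophe look-alikes, and collapse extra whitespace."""
    text = text.lower().replace("`", "'").replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def canonicalize(text, cutoff=0.8):
    """Return the canonical label of the closest alias, or the input unchanged."""
    key = normalize(text)
    match = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[match[0]] if match else text


# The non-missing answers from the snippet above, in stacked order.
answers = [
    "dark soul", "valkiring", "Don't know", "World of Warcraft",
    "Fallout 3", "Fallout 4", "Don`t know", "warcraft", "don't know",
    "Mass Effect Series", "Skyrim", "Fallout 4", "Tomb Raider",
    "dark souls", "none", "candy cruss",
]

counts = Counter(canonicalize(a) for a in answers)
for label, n in counts.most_common():
    print(label, n)
```

On the data frame itself, something like `df_survey_Q4.stack().map(canonicalize).value_counts()` should apply the same mapping, since `stack()` already drops the NaN cells as shown above. The cutoff is a trade-off: a lower value catches more misspellings but risks merging different games, so it would need checking against the real answers.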