i have to answer two questions:
- How many is there users of SQL?
- How many of the users are using MySQL only
The hard part of this is that any respodent could had chosen many options, so we can have Respondent number 4 uses both MySQL and SQLite, but for the first question he should be counted only once, so standardgroupby().count
can't do the thing. Also, some user may had chosen MySQL and some other database, so he cannot be counted to 2nd question. What should i do? i tried many solutions but they all led me to nothing
i came up with this
import re
query = 'SELECT * FROM DatabaseWorkedWith'
df = pd.read_sql_query(query, conn)
pass
inde_list = list()
for index in df.index:
if re.search('SQL{1}', df.loc[index, 'DatabaseWorkedWith']):
respondent = df.loc[index, 'Respondent']
if respondent not in inde_list:
inde_list.append(respondent)
else:
df.drop(index, inplace=True)
del inde_list
df
(for some reason i cannot prettify the format of this code) but there must be a better way and this still deals with only half a problem