How to find the number of users when one user could have chosen many options?

Question

Given this table

I have to answer two questions:

1. How many users of SQL are there?
2. How many of the users are using MySQL only?

The hard part is that any respondent could have chosen many options. For example, respondent number 4 uses both MySQL and SQLite, but for the first question he should be counted only once, so a standard groupby().count() can't do it. Also, a user may have chosen MySQL and some other database, so he must not be counted for the second question. What should I do? I tried many solutions, but they all led nowhere.

I came up with this:

import re
import pandas as pd

# conn is assumed to be an already-open database connection
query = 'SELECT * FROM DatabaseWorkedWith'
df = pd.read_sql_query(query, conn)

# Keep only the first SQL-matching row of each respondent
seen = set()
for index in df.index:
    if re.search('SQL', df.loc[index, 'DatabaseWorkedWith']):
        respondent = df.loc[index, 'Respondent']
        if respondent not in seen:
            seen.add(respondent)
        else:
            df.drop(index, inplace=True)
del seen
df

(For some reason I cannot prettify the format of this code.) But there must be a better way, and this still deals with only half of the problem.

Consider editing your post. Item 2 of your question. – Andre Nevares, Aug 15 '22 at 20:14

score 0 · Accepted Answer · answered Aug 15 '22 at 17:50

How I would approach the problem (there might be better ways):

1. Since you want SQL users, I assume that means any user who has chosen at least one SQL variant. You can use the str.contains function on the DatabaseWorkedWith column, drop all rows where it returns False, and then drop all duplicates in the Respondent column to get the unique users.
2. Since you want users who use MySQL only, you can rule out any user who has more than one row of answers. To do that, compute the number of occurrences of each respondent and add that as a new column. You can use this answer as a reference. The next step is to drop the rows where this counts column is greater than 1. Finally, you have only users with one answer, so filter the DatabaseWorkedWith column for rows that match MySQL; the corresponding values in the Respondent column are your answer.
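The two steps above could be sketched in pandas roughly as follows. This is only a minimal sketch: the sample rows are made up for illustration (the real data would come from the DatabaseWorkedWith query), and it assumes one row per respondent/database pair with no missing values and no repeated rows.

```python
import pandas as pd

# Hypothetical sample data standing in for the real table.
df = pd.DataFrame({
    "Respondent": [4, 4, 9, 13, 13, 16, 17],
    "DatabaseWorkedWith": [
        "MySQL", "SQLite", "MySQL", "MongoDB", "PostgreSQL", "MongoDB", "MySQL",
    ],
})

# Question 1: respondents with at least one database whose name contains "SQL".
is_sql = df["DatabaseWorkedWith"].str.contains("SQL", regex=False)
sql_users = df.loc[is_sql, "Respondent"].nunique()

# Question 2: respondents whose single answer is MySQL.
# Count how many rows each respondent has and attach it as a column.
df["counts"] = df["Respondent"].map(df["Respondent"].value_counts())
mysql_only = df[(df["counts"] == 1) & (df["DatabaseWorkedWith"] == "MySQL")]
mysql_only_users = mysql_only["Respondent"].nunique()

print(sql_users, mysql_only_users)
```

Using nunique() on the filtered Respondent column has the same effect as dropping duplicates and counting the remaining rows, so respondent 4 (MySQL and SQLite) is counted once for the first question and excluded from the second.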
