I have an assignment for a pandas course and I cannot wrap my head around how to do it correctly. The assignment gives me a huge data file with many columns. It looks like this:
Document   Year  Parties  Question        Ministry
x1021.xml  1995  D66      What does...    Ministry of Safety
x1022.xml  1995  CDA      When do we...   Ministry of Culture
x1023.xml  1995  PvdA     When can we...  Ministry of Agriculture
And this goes on for thousands of rows. The first exercise tells me to make a crosstab of the columns Year and Parties, with Year as the index and Parties as the columns. This is not hard at all and only requires one line of code:
pd.crosstab(index=df['Year'], columns=df['Parties'])
But the next question throws me off: Now, using the original dataframe, make a new dataframe with the years as the index and only the top 10 parties that asked the most questions as the columns, using the crosstab function.
I understand I first have to sort the dataframe before I can use the crosstab, but if I use anything other than the most basic crosstab call it gives me errors. I also understand that the top 10 parties should be unique, so I guessed that at some point I had to use the unique function, but that only returns an array whose values I can no longer connect to the years. Skipping forward 4 hours: I have now tried the groupby function, the sort_values function and the unique function, but I can't get them to work properly. So in order to keep myself sane, I am asking you to please help me get this to work, or at least explain why I can't get it to work.
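To make the goal concrete, here is a minimal sketch of one way the task could be approached. It is not necessarily what the course expects. It assumes each row holds exactly one party in the Parties column. The idea is to count questions per party with value_counts, keep only the rows belonging to the most frequent parties, and then run the same crosstab as before. The sample frame and the name top_n are made up for illustration; the assignment would use top_n = 10 on the real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real file
# (only the two columns the crosstab needs).
df = pd.DataFrame({
    "Year": [1995, 1995, 1995, 1996, 1996, 1996, 1996],
    "Parties": ["D66", "CDA", "PvdA", "D66", "D66", "CDA", "VVD"],
})

top_n = 2  # assumption for this tiny sample; the assignment asks for 10

# Count how many questions each party asked and keep the top_n party names.
top_parties = df["Parties"].value_counts().nlargest(top_n).index

# Keep only the rows of those parties, so Year stays attached to each row.
top_df = df[df["Parties"].isin(top_parties)]

# Same crosstab as in the first exercise, now on the filtered rows.
result = pd.crosstab(index=top_df["Year"], columns=top_df["Parties"])
print(result)
```

Filtering the rows with isin, rather than working with the array returned by unique, is what keeps the link between each party and its year.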