I am trying to create a stacked area chart showing how the number of courses in each area evolves over time. My data frame (indexed by Year) looks like this:

                    Area  Courses
Year                             
1900         Agriculture      0.0
1900        Architecture     32.0
1900           Astronomy     10.0
1900             Biology     20.0
1900           Chemistry     25.0
1900   Civil Engineering     21.0
1900           Education     14.0
1900  Engineering Design     10.0
1900             English     30.0
1900           Geography      1.0

The last year in the data is 2011.

I tried several approaches, such as `df.plot.area()` and `df.plot.area(x='Years')`. Then I thought it would help to have the areas as columns, so I tried:

df.pivot_table(index = 'Year', columns = 'Area', values = 'Courses', aggfunc = 'sum')

but instead of the sum of courses per year, I got:

Area  Aeronautical Engineering  ...  Visual Design
Year                            ...               
1900                       NaN  ...            NaN
1901                       NaN  ...            NaN

Thanks for your help. It's my first post. Sorry if I missed something.

Update. Here is my code:

df = pd.read_csv(filepath, encoding= 'unicode_escape')
df = df.groupby(['Year','GenArea'])['Taught'].sum().to_frame(name = 'Courses').reset_index()
plt.stackplot(df['Year'], df['Courses'], labels = df['GenArea'])
plt.legend(loc='upper left')
plt.show()

And here is the link for the dataset: https://data.world/makeovermonday/2020w12

Sorin Ir

1 Answer

With the extra information you provided, I put this together. Hope you like it!

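import pandas as pd
import matplotlib.pyplot as plt

plt.close('all')

# Load the data and total the courses taught per (Year, GenArea) pair.
df = pd.read_csv('https://query.data.world/s/djx5mi7dociacx7smdk45pfmwp3vjo',
                 encoding='unicode_escape')
df = (df.groupby(['Year', 'GenArea'])['Taught'].sum()
        .to_frame(name='Courses').reset_index())

# False marks the first occurrence of each area / each year.
aux1 = df.duplicated(subset='GenArea', keep='first').values
aux2 = df.duplicated(subset='Year', keep='first').values

n = len(aux1)
year = []
courses = []

# Collect the unique areas and years in order of first appearance.
for i in range(n):
    if not aux1[i]:
        courses.append(df.iloc[i]['GenArea'])
    if not aux2[i]:
        year.append(df.iloc[i]['Year'])

del aux1, aux2

# Wide table: one row per year, one column per area, initialised to 0.
df1 = pd.DataFrame(index=year)
s = 0  # position of the year currently being filled

for i in range(len(courses)):
    df1[courses[i]] = 0

# Fill the table row by row. Moving on to the next year only once the current
# row has no zeros left assumes every year has a nonzero count for every area.
for i in range(n):
    string = df.iloc[i]['GenArea']
    if any(df1.iloc[s].values == 0):
        df1.at[year[s], string] = df.iloc[i]['Courses']
    else:
        s += 1
        df1.at[year[s], string] = df.iloc[i]['Courses']

del year, courses, df

# Reverse the column order and draw the stacked area chart with a reversed legend.
df1 = df1[df1.columns[::-1]]
df1.plot.area(legend='reverse')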

[Figure: example of the resulting plot]
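
A shorter route to the same wide table, offered as a sketch and not a drop-in replacement: `pivot_table` with `fill_value=0` builds one column per area directly. The NaN values shown in the question are most likely year/area combinations with no rows in the data, and `fill_value=0` turns them into zeros so the areas can be stacked. The small `long_df` below is made-up data standing in for the grouped frame (columns `Year`, `GenArea`, `Courses`); the names `long_df` and `wide` are only for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up long-format data standing in for the grouped frame
# (one row per Year/GenArea pair). Not taken from the real dataset.
long_df = pd.DataFrame({
    "Year":    [1900, 1900, 1901, 1901, 1901],
    "GenArea": ["Biology", "Chemistry", "Biology", "Chemistry", "English"],
    "Courses": [20, 25, 22, 24, 30],
})

# One row per year, one column per area.
# Year/area combinations absent from the data become 0 instead of NaN.
wide = long_df.pivot_table(index="Year", columns="GenArea",
                           values="Courses", aggfunc="sum", fill_value=0)

# Each column is drawn as one layer of the stacked area chart.
ax = wide.plot.area()
ax.set_ylabel("Courses")
```

In a script, call `plt.show()` afterwards to display the figure.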

  • I read that article and tried this code: `plt.stackplot(courses_by_area['Year'], courses_by_area['Courses'], labels = courses_by_area['Area'])`, but it still doesn't work. – Sorin Ir Mar 25 '20 at 09:44
  • Could you add an example of your code? I don't understand what you want or what you are trying to do. – Sebastián V. Romero Mar 25 '20 at 11:16
  • I already posted all my code. I have that dataset with areas of study for each year, and I want to plot how many courses were taught in each year and how they split across the areas. I hope it's clearer now. Thanks! – Sorin Ir Mar 25 '20 at 19:18
  • When I asked for a piece of code, I meant something like what I shared, including all the data you want to plot. And you're welcome! – Sebastián V. Romero Mar 26 '20 at 14:40
  • I updated my post with a link to the dataset and my code. – Sorin Ir Mar 26 '20 at 17:41
  • And I want to get something like this: https://public.tableau.com/profile/jeb8711#!/vizhome/Week13-CaliforniaUniversitycourseoffeirng/CourseOffering – Sorin Ir Mar 26 '20 at 18:11
  • Wow, thanks. I was expecting something like this. I'll practice until I fully understand it. Best! – Sorin Ir Mar 28 '20 at 07:35
  • Sounds great! Feel free to ask me anything you want :) `@SorinIr` – Sebastián V. Romero Mar 28 '20 at 11:38
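
On the `plt.stackplot` call quoted in the question and in the first comment: `stackplot` takes the x values once (length N) and the y values as a 2-D array of shape (M, N), one row per stacked series. Passing the long-format `Courses` column gives it a single series, which is why no stacking by area appears. A minimal sketch with made-up data, assuming the frame is first pivoted to one column per area (`long_df` and `wide` are illustrative names):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up long-format data (one row per Year/GenArea pair), for illustration only.
long_df = pd.DataFrame({
    "Year":    [1900, 1900, 1901, 1901, 1901],
    "GenArea": ["Biology", "Chemistry", "Biology", "Chemistry", "English"],
    "Courses": [20, 25, 22, 24, 30],
})

# Wide layout: rows are years, columns are areas, missing pairs filled with 0.
wide = long_df.pivot_table(index="Year", columns="GenArea",
                           values="Courses", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots()
# Transpose so each row of y is one area's values across all years.
polys = ax.stackplot(wide.index, wide.to_numpy().T, labels=wide.columns)
ax.legend(loc="upper left")
```

Here `labels` receives one entry per area, not one per row of the long frame.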