I am making a graph for a life expectancy data set, and it is not going well. I want to plot the deaths and show which region and year each value belongs to. I have tried several things, but the result is either so cramped that the text overlaps, or the code raises errors.

I tried:

import pandas as pd
import matplotlib.pyplot as plt


life_expectancy_dataset=pd.read_csv("/kaggle/input/life-expectancy-who-updated/Life-Expectancy-Data-Updated.csv")

life_expectancy_dataset['Region'].value_counts()

plt.plot(life_expectancy_dataset["Year"])
plt.plot(life_expectancy_dataset["Region"])
plt.plot(life_expectancy_dataset["Life_expectancy"])
plt.title("Life Expectancy")
plt.ylabel("Life Expectancy")

plt.show()

The graph wasn't right, and there was overlapping text at the bottom right.

  • The correct way to do this with pandas is `pt = df.pivot_table(index='Year', columns='Region', values='Life_expectancy')`, then `ax = pt.plot(kind='bar', figsize=(12, 8), rot=0, width=0.9)`, and finally `ax.legend(bbox_to_anchor=(1, 0.5), loc='upper left', frameon=False)`. That is just 3 lines of code. [code and plot](https://i.stack.imgur.com/IL50k.png) – Trenton McKinney May 13 '23 at 17:52

1 Answer

For a clearer graph, plot life expectancy over time with one line per region, so that the year is on the x-axis and the region is identified in the legend.

For example:

import pandas as pd
import matplotlib.pyplot as plt

life_expectancy_dataset = pd.read_csv("/kaggle/input/life-expectancy-who-updated/Life-Expectancy-Data-Updated.csv")

regions = life_expectancy_dataset['Region'].unique()

for region in regions:
    # Keep only the rows that belong to the current region
    region_data = life_expectancy_dataset[life_expectancy_dataset['Region'] == region]
    # Assumption: a region has several rows per year (e.g. one per country),
    # so average them to get one value per year; groupby also sorts by year
    yearly_mean = region_data.groupby('Year')['Life_expectancy'].mean()
    plt.plot(yearly_mean.index, yearly_mean.values, label=region)

plt.xlabel('Year')
plt.ylabel('Life Expectancy')
plt.title('Life Expectancy Over Time by Region')
plt.legend()
plt.show()
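
As an alternative, here is a minimal sketch of the pivot-table approach suggested in the comment above, drawn as lines instead of bars. The small DataFrame and its values are made up purely for illustration; it only assumes the real CSV has `Region`, `Year` and `Life_expectancy` columns. The legend is moved outside the axes, which should help with overlapping text.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data (values are invented); the real CSV is assumed
# to contain the same three columns, with several rows per region and year.
df = pd.DataFrame({
    "Region": ["Asia", "Asia", "Asia", "Asia", "Africa", "Africa", "Africa", "Africa"],
    "Year": [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
    "Life_expectancy": [70.0, 72.0, 71.0, 73.0, 55.0, 57.0, 56.0, 58.0],
})

# One row per year and one column per region.
# pivot_table averages duplicate (Year, Region) rows by default.
pt = df.pivot_table(index="Year", columns="Region", values="Life_expectancy")

# DataFrame.plot draws one line per column and returns the Axes.
ax = pt.plot(figsize=(10, 6), marker="o")
ax.set_xticks(pt.index)  # tick only at the years present in the data
ax.set_xlabel("Year")
ax.set_ylabel("Life Expectancy")
ax.set_title("Life Expectancy Over Time by Region")

# Put the legend to the right of the axes so it cannot cover the lines.
ax.legend(title="Region", bbox_to_anchor=(1, 0.5), loc="center left", frameon=False)
plt.tight_layout()
```

In a script, call `plt.show()` afterwards to display the figure. To use the real data, replace the stand-in DataFrame with the result of `pd.read_csv(...)` from the question.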
Saxtheowl