I'm working on a pandas DataFrame problem and am having trouble converting the data into a usable format for a scatter plot.

Here is my code. Please let me know what I am doing wrong and how I can correct it going forward. Honest criticism is welcome, as I am a beginner.

# Import Data
df = pd.read_csv(filepath + 'BaltimoreData.csv')

df = df.dropna()
print(df.head(20))
# These are two categories within the data
df.plot(df['Bachelors degree'], df['Median Income'])

# Plotting the Data
df.plot(kind = 'scatter', x = 'Bachelor degree', y = 'Median Income')
df.plot(kind = 'density')
Brandon
  • Forget the code, where's your data? Please print(df.head(20)) and post its output here. – cs95 Oct 22 '17 at 23:11
  • I added the heading so you can see the first 20 lines of data. – Brandon Oct 23 '17 at 22:58
  • Unfortunately, I don't have access to your computer, so I cannot load your data from your filepath. While it seems your issue was resolved this time, please look at how to provide a minimal, complete, and verifiable example (MCVE) in the future, which helps us give you better answers. – cs95 Oct 23 '17 at 22:59

2 Answers

Simply plot y against x as below, where df is your DataFrame and x and y are your independent and dependent variables, respectively:

import matplotlib.pyplot as plt
import pandas

plt.scatter(x=df['Bachelors degree'], y=df['Median Income'])
plt.show()
Johnnyh101
  • When I run that I get the following error message: could not convert string to float: '$37,678 ' – Brandon Oct 23 '17 at 22:49
  • Well, you've got Median Income formatted as a string: read_csv is detecting the dollar sign and assuming you're working with strings (i.e. text). You could simply change it to be formatted as a number in your CSV. – Johnnyh101 Oct 24 '17 at 15:21
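
If editing the CSV is not convenient, the same cleanup can probably be done in pandas after loading. The following is only a sketch: the small DataFrame is a hypothetical stand-in for BaltimoreData.csv, and it assumes the income values all look like '$37,678 ' (a dollar sign, thousands separators, and stray whitespace).

```python
import pandas as pd

# Hypothetical sample standing in for the real CSV; only the
# '$37,678 ' value comes from the error message in the thread.
df = pd.DataFrame({
    'Bachelors degree': [25.1, 40.3, 12.7],
    'Median Income': ['$37,678 ', '$52,100', '$29,450 '],
})

# Remove dollar signs, commas, and whitespace, then convert the
# remaining digit strings to a numeric dtype.
df['Median Income'] = pd.to_numeric(
    df['Median Income'].str.replace(r'[$,\s]', '', regex=True)
)
```

Once the column is numeric, the plt.scatter call above should no longer fail with "could not convert string to float".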

You can use the scatter plot method built into pandas.

import pandas
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df.plot.scatter(x='Bachelors degree', y='Median Income');
plt.show()
Joe T. Boka
  • So I made some adjustments to the code so it looks like this: df.dropna(axis = 0, how = 'any'); plt.style.use('ggplot'); df.plot.scatter(x = df['Bachelors degree'], y = df['Median Income']); plt.show(). However, it is still throwing the error that it cannot index with a vector containing NA/NaN values. – Brandon Oct 23 '17 at 22:57
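
Two things in that last snippet are likely behind the error. First, dropna returns a new DataFrame rather than modifying df in place, so its result has to be assigned. Second, df.plot.scatter expects column names for x and y, not the columns themselves. A minimal sketch, using made-up values and a hypothetical helper name (make_scatter):

```python
import pandas as pd

# Hypothetical data with missing values in both columns.
raw = pd.DataFrame({
    'Bachelors degree': [25.1, None, 12.7],
    'Median Income': [37678, 52100, None],
})

# dropna returns a new DataFrame, so keep the result;
# calling it without assignment leaves the original unchanged.
clean = raw.dropna(axis=0, how='any')


def make_scatter(frame):
    """Scatter plot using column names, not Series, for x and y."""
    return frame.plot.scatter(x='Bachelors degree', y='Median Income')
```

Calling make_scatter(clean) followed by plt.show() should then draw the plot, provided matplotlib is installed and both columns are numeric.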