
I am trying to draw a scatter plot from the input below using pandas and matplotlib, but I get the error shown further down.

Input:

tweetcricscore  34 #afgvssco   51
tweetcricscore  23 #afgvszim   46
tweetcricscore  24 #banvsire   12
tweetcricscore  456 #banvsned  46
tweetcricscore  653 #canvsnk   1
tweetcricscore  789 #cricket   178
tweetcricscore  625 #engvswi   46
tweetcricscore  86 #hkvssco    23
tweetcricscore  3 #indvsban    1
tweetcricscore  87 #sausvsvic  8
tweetcricscore  98 #wt20       56

Code:

import numpy as np
import matplotlib.pyplot as plt
from pylab import*
import math
from matplotlib.ticker import LogLocator
import pandas as pd

df = pd.read_csv('input.csv', header = None)

df.columns = ['col1','col2','col3','col4']

plt.scatter(x='col2', y='col4', s=120, c='b', label='Highly Active')

plt.legend(loc='upper right')
plt.xlabel('Freq (x)')
plt.ylabel('Freq(y)')
#plt.gca().set_xscale("log")
#plt.gca().set_yscale("log")
plt.show()

Of the 4 columns, I want to plot the values in col[1] and col[3] (named col2 and col4 in the code) as (x, y) pairs in a scatter plot.

Error:

Traceback (most recent call last):
  File "00_scatter_plot.py", line 14, in <module>
    plt.scatter(x='col2', y='col3', s=120, c='b', label='Highly Active')
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 3087, in scatter
    linewidths=linewidths, verts=verts, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 6337, in scatter
    self.add_collection(collection)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1481, in add_collection
    self.update_datalim(collection.get_datalim(self.transData))
  File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 185, in get_datalim
    offsets = np.asanyarray(offsets, np.float_)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 514, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: could not convert string to float: col2

2 Answers


This is your error message:

File "00_scatter_plot.py", line 14, in <module>
    plt.scatter(x='col2', y='col3', s=120, c='b', label='Highly Active')
ValueError: could not convert string to float: col2

As you can see, matplotlib is trying to convert the string "col2" to a float, because plt.scatter was given the column name rather than the column's data.

Looking at your code, it seems you want something like this:

plt.scatter(x=df['col2'], y=df['col4'], s=120, c='b', label='Highly Active')

instead of:

plt.scatter(x='col2', y='col4', s=120, c='b', label='Highly Active')
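
For reference, here is a minimal, self-contained sketch of that fix. It assumes that input.csv is comma-separated and inlines a few rows from the question through io.StringIO instead of reading the file. The Agg backend is also an assumption, chosen only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question, assumed to be comma-separated in input.csv.
csv_text = """tweetcricscore,34,#afgvssco,51
tweetcricscore,23,#afgvszim,46
tweetcricscore,24,#banvsire,12
tweetcricscore,456,#banvsned,46
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)
df.columns = ['col1', 'col2', 'col3', 'col4']

fig, ax = plt.subplots()
# Pass the column data (pandas Series), not the column-name strings.
points = ax.scatter(x=df['col2'], y=df['col4'], s=120, c='b', label='Highly Active')
ax.legend(loc='upper right')
ax.set_xlabel('Freq (x)')
ax.set_ylabel('Freq (y)')
# plt.show()  # uncomment when running with an interactive backend
```

The only change that matters for the error is df['col2'] and df['col4'] in place of the bare strings.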
    This used to be an issue in the past (see issue [#12380](https://github.com/pandas-dev/pandas/issues/12380)). In recent `matplotlib` releases (i.e. 3.x) you can use the column names directly, without passing the `Series` objects, as long as you also pass the DataFrame through the `data` keyword. – muxevola May 18 '20 at 09:31
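
A small sketch of the column-name variant mentioned in this comment, assuming matplotlib's `data` keyword is what lets the strings be resolved against the DataFrame. The three-row DataFrame is built by hand from the question's sample and stands in for input.csv; the Agg backend is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hand-built stand-in for the first three rows of the question's input.
df = pd.DataFrame({
    'col1': ['tweetcricscore'] * 3,
    'col2': [34, 23, 24],
    'col3': ['#afgvssco', '#afgvszim', '#banvsire'],
    'col4': [51, 46, 12],
})

fig, ax = plt.subplots()
# With data=df, the strings 'col2' and 'col4' are looked up as columns of df.
points = ax.scatter('col2', 'col4', s=120, c='b', label='Highly Active', data=df)
ax.legend(loc='upper right')
```

Without data=df, the strings are not looked up in any DataFrame, which is the situation in the question.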

Try changing:

plt.scatter(x='col2', y='col4', s=120, c='b', label='Highly Active')

to:

df.plot.scatter(x='col2', y='col4', s=120, c='b', label='Highly Active')

It worked for me.
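
A self-contained sketch of the pandas route, for anyone who wants to try it without input.csv. The DataFrame is a hand-built stand-in for the first three rows of the question's data, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hand-built stand-in for the first three rows of the question's input.
df = pd.DataFrame({
    'col1': ['tweetcricscore'] * 3,
    'col2': [34, 23, 24],
    'col3': ['#afgvssco', '#afgvszim', '#banvsire'],
    'col4': [51, 46, 12],
})

# DataFrame.plot.scatter resolves x and y as column names and returns the Axes.
ax = df.plot.scatter(x='col2', y='col4', s=120, c='b', label='Highly Active')
ax.legend(loc='upper right')
# plt.show()  # uncomment when running with an interactive backend
```

Here the column names work as strings because the call is a method of the DataFrame itself, so pandas knows where to look them up.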
