I am trying to plot the second column of a CSV file, but reading that column returns NaN values. If I set dtype=None or dtype=str, I get string values instead, which makes the Y axis plot incorrectly. Any comments would be appreciated.

The second column of the CSV holds values in the 80-99% range, which were saved earlier with this snippet:

numpy.savetxt('loss_acc_exp4.csv', numpy.c_[losses, ACCes], fmt=['%.4f', '%d %%'], header="loss, acc", comments='', delimiter=",")

Here is the snippet that plots the CSV file:

import numpy
import matplotlib.pyplot as plt
acc_history = numpy.genfromtxt("loss_acc_exp4.csv", delimiter=",", 
skip_header=1, usecols=(1))


num_epochs = 151
epochs = range(1, num_epochs)

plt.figure(figsize=(10,6))
plt.plot(epochs, acc_history[::2], '-b', label='Training accuracy') #plot odd rows
plt.plot(epochs, acc_history[1::2], '-r', label='Validation accuracy') # plot even rows
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('accuracy')
plt.xlim(0.01,151)
plt.show()

The data in the CSV looks like this (the screenshot is not preserved here).

import pandas as pd


def convert_percent(val):
    """
    Convert the percentage string to an actual floating point percent
    - Remove %
    - Divide by 100 to make decimal
    """
    new_val = val.replace('%', '')
    new_val = pd.Series([new_val]).astype(float)
    return (new_val) / 100

acc_history = pd.read_csv("loss_acc_exp4.csv", delimiter=",", header=0,
                          usecols=[1], dtype=None)
acc_history[:].apply(convert_percent)

This pandas attempt raises ValueError: ('setting an array element with a sequence.', 'occurred at index acc')

Jason Aller
AI_NA
  • Can you share what your `loss_acc_exp4.csv` file looks like? This is necessary to know the format. – newkid Jan 27 '19 at 19:19
  • Default dtype is float. Strings that don't fit are returned as nan. – hpaulj Jan 27 '19 at 19:20
  • @newkid Sorry, my problem is with the acc column. – AI_NA Jan 27 '19 at 20:51
  • @Neda Can you add a snippet to your question? The formatting of the file (newlines, delimiters, precision, etc.) is not clear from your comment. – newkid Jan 27 '19 at 20:52
  • If I check the dtype of acc_history for the acc column, it is float64 for this line: `acc_history = numpy.genfromtxt("loss_acc_exp4.csv", delimiter=",", skip_header=1, usecols=(1))` – AI_NA Jan 27 '19 at 20:57
  • It's the '%' character that keeps `genfromtxt` from converting the strings to numbers. Try `float('91%')`. You could load with `dtype=None`, and convert those strings yourself after. Or write a converter to remove the '%' and return a float, e.g. `float(astr[:-1])`. Or change the `savetxt` to omit the unnecessary '%'. – hpaulj Jan 27 '19 at 21:04
  • @hpaulj Thank you for your comment. I tried many ways to remove the % and convert the data to float, but none of them works. I added the last thing I tried to the question. Any idea how I can get rid of this error? – AI_NA Jan 28 '19 at 20:18
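
The point made in the comments can be checked without NumPy at all. The sketch below uses only built-in Python; `parse_percent` is a hypothetical helper name, not something from the thread. It shows that `float` rejects a string ending in '%', and that stripping the sign first makes the conversion work, including for the "99 %" form (with a space) that the `'%d %%'` format in the question writes.

```python
def parse_percent(text):
    """Strip surrounding whitespace and a trailing '%' before converting."""
    return float(text.strip().rstrip("%"))


# float() cannot parse the percent sign, which is why genfromtxt,
# with its default float dtype, falls back to nan for such fields.
try:
    float("91%")
    plain_conversion_ok = True
except ValueError:
    plain_conversion_ok = False

print(plain_conversion_ok)     # False
print(parse_percent("91%"))    # 91.0
print(parse_percent("99 %"))   # 99.0
```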

1 Answer

Using a converter:

def foo(astr):
    return float(astr[:-1])
In [296]: np.genfromtxt('test.csv', delimiter=',', converters={1:foo})
Out[296]: 
array([[ 0.8469, 99.    ],
       [ 0.3569, 98.    ],
       [ 0.9622, 97.    ],
       [ 0.4774, 96.    ],
       [ 0.381 , 95.    ]])
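
Applied to a file laid out like the one in the question (a header row, and an accuracy column written as "99 %"), the same idea might look like the sketch below. The sample values, the temporary file name, and the helper `percent_to_float` are made up for illustration. Passing `encoding="utf-8"` is an assumption made here so that the converter receives `str` rather than `bytes`.

```python
import os
import tempfile

import numpy as np


def percent_to_float(field):
    """Turn a field such as '99 %' into the float 99.0."""
    return float(field.strip().rstrip("%"))


# Made-up sample values, saved with the same format string as in the question.
losses = np.array([0.8469, 0.3569, 0.9622, 0.4774])
acces = np.array([99, 98, 97, 96])

path = os.path.join(tempfile.mkdtemp(), "loss_acc_sample.csv")
np.savetxt(path, np.c_[losses, acces], fmt=["%.4f", "%d %%"],
           header="loss, acc", comments="", delimiter=",")

# Read only column 1, skip the header row, and convert each field ourselves.
acc_history = np.genfromtxt(path, delimiter=",", skip_header=1, usecols=(1,),
                            converters={1: percent_to_float}, encoding="utf-8")

train_acc = acc_history[::2]   # rows 0, 2, ... as in the question's plot code
val_acc = acc_history[1::2]    # rows 1, 3, ...
```

Both slices are then float arrays, so they can be passed to `plt.plot` as in the question and the Y axis is numeric.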
hpaulj