I have problems plotting pandas data with pyplot. On the one hand, the y axis is ordered incorrectly: the value 10 shows up before 1.

On the other hand, when I use yerr, I get this error message:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import pandas as pd

df=pd.read_table('TI_attachment.dat', header=0, sep='\s+')

fig, ax=plt.subplots(figsize=(20,10))
ax.errorbar(x=df.iloc[:, 0:1], y=df.iloc[:, 1:2], yerr=df.iloc[:, 2:3], color='black')
ax.set_xlabel('Simulation Time per window [ns]', size=25)
ax.set_ylabel('Free energy of binding [kcal/mol]', size=25)

ax.tick_params(axis='both', labelsize=25)
plt.tight_layout()
#plt.savefig('PMF.png', format='png')
#plt.show()

This is what TI_attachment.dat looks like:

#Weight of restraints (%), Accumulative work (in kcal/mol), SEM (in kcal/mol)
0.0000      0.00000      0.00000
0.0040      3.23161      0.78401
0.0080      3.76232      0.79356
0.0160      4.50989      0.82542
0.0240      4.86168      0.82490
0.0400      5.48672      0.82894
0.0550      6.02476      0.82931
0.0865      6.73611      0.83116
0.1180      7.20339      0.83305
0.1810      7.69373      0.83432
0.2440      8.16010      0.83487
0.3700      8.87930      0.83952
0.4960      9.25889      0.84035
0.7480      9.83864      0.84071
1.0000     10.28260      0.84107

ta8
    It seems like it's reading your file as strings rather than numbers. Check what `df.dtypes` gives. You can use the optional `dtype` argument in your `read_table` call: `df=pd.read_table('TI_attachment.dat', header=0, sep='\s+', dtype={'': np.float64, '': np.float64, etc})` – RagingRoosevelt Apr 09 '18 at 15:56
    If your dtype is wrong and you want to change it after the fact, you can also use `pandas.DataFrame.astype` or `pandas.to_numeric`. So that would be something like `df[''] = pd.to_numeric(df[''])` – RagingRoosevelt Apr 09 '18 at 16:00
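
The checks suggested in these comments could look like the sketch below. It is only an illustration: the inline sample and the column names `weight`, `work` and `sem` are assumptions made up for the example, and `dtype=str` is used on purpose to imitate values that were read as strings.

```python
import io

import pandas as pd

# Hypothetical sample (column names are assumptions, not from the original file).
sample = io.StringIO(
    "weight work sem\n"
    "0.0000 0.00000 0.00000\n"
    "0.4960 9.25889 0.84035\n"
    "1.0000 10.28260 0.84107\n"
)

# Force every column to be read as text to reproduce the symptom.
df = pd.read_csv(sample, sep=r"\s+", dtype=str)

# Step 1: inspect the dtypes; a non-numeric dtype here means the values are text.
print(df.dtypes)

# Text sorts alphabetically, so "10.28260" is placed before "9.25889".
print(sorted(df["work"]))

# Step 2: convert every column to a numeric dtype after the fact.
df_numeric = df.apply(pd.to_numeric)
print(df_numeric.dtypes)
```

Once the columns are numeric, subtraction (as needed for the error bars) and axis ordering behave as expected.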

1 Answer

I solved it by selecting the columns in a different way:

 ax.errorbar(x=df.iloc[:, 0], y=df.iloc[:, 1], yerr=df.iloc[:, 2], color='black')
ta8
    The problem you have in the question is that you are using strings to plot. This will lead to the error and to alphabetically sorted ticklabels. The solution, as pointed out in the comments, is not to plot strings. I don't see how slicing the data differently would solve this problem. – ImportanceOfBeingErnest Apr 09 '18 at 18:04
  • The problem is that, because the header is separated by ',', a column gets generated for each word in the header, i.e. a lot of NaNs. Apparently this resulted in the columns being read as strings. – ta8 Apr 10 '18 at 08:43
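
One way to sidestep the header issue described in the last comment is to skip the header line and supply the column names explicitly. The sketch below is a minimal illustration, not the original poster's code: the names `weight`, `work` and `sem` are assumptions, and the inline text stands in for the first rows of TI_attachment.dat.

```python
import io

import pandas as pd

# Stand-in for the first rows of TI_attachment.dat.
data = io.StringIO(
    "#Weight of restraints (%), Accumulative work (in kcal/mol), SEM (in kcal/mol)\n"
    "0.0000      0.00000      0.00000\n"
    "0.0040      3.23161      0.78401\n"
    "1.0000     10.28260      0.84107\n"
)

# Skip the header line so it is not split on whitespace into many columns,
# and name the three data columns explicitly (names are hypothetical).
df = pd.read_csv(
    data,
    sep=r"\s+",
    skiprows=1,
    header=None,
    names=["weight", "work", "sem"],
)
print(df.dtypes)

# df.iloc[:, 0] is a 1-D Series, while df.iloc[:, 0:1] is a 2-D DataFrame.
x = df.iloc[:, 0]
x_frame = df.iloc[:, 0:1]
print(x.shape, x_frame.shape)

# Iterating over a DataFrame yields its column labels, not its values.
print(list(x_frame))
```

The last line hints at why the change in the answer may have helped: code that iterates over a one-column DataFrame sees the column label, which is a string, whereas iterating over a Series yields the numbers. Whether that is what happened inside errorbar here is a guess, not something the thread confirms.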