I have a pandas DataFrame that I'm using to produce a scatterplot, and I want to add a regression line to the plot. Right now I'm trying to do this with numpy's polyfit.
Here's my code:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from numpy import polyfit
table1 = pd.read_csv('upregulated_genes.txt', sep='\t', header=0, index_col=0)
table2 = pd.read_csv('misson_genes.txt', sep='\t', header=0, index_col=0)
table1 = table1.join(table2, how='outer')
table1 = table1.dropna(how='any')
table1 = table1.replace('#DIV/0!', 0)
# scatterplot
plt.scatter(table1['log2 fold change misson'], table1['log2 fold change'])
plt.ylabel('log2 expression fold change')
plt.xlabel('log2 expression fold change Misson et al. 2005')
plt.title('Root Early Upregulated Genes')
plt.axis([0,12,-5,12])
# this is the part I'm unsure about
regres = polyfit(table1['log2 fold change misson'], table1['log2 fold change'], 1)
plt.show()
But I get the following error:
TypeError: cannot concatenate 'str' and 'float' objects
Does anyone know where I'm going wrong here? I'm also unsure how to add the regression line to my plot. Any other general comments on my code would be hugely appreciated; I'm still a beginner.
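
For reference, here is a minimal sketch of one way this could be approached. It assumes the TypeError arises because at least one of the two columns was read as strings (for instance because of the '#DIV/0!' entries), which is a guess rather than something the error message confirms. The helper names fit_line and plot_with_fit and the demo data are hypothetical and only there for illustration; the column names are the ones from the code above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def fit_line(df, x_col, y_col):
    """Return (slope, intercept, clean) for a least-squares line.

    Hypothetical helper: `clean` holds only the rows where both
    columns could be converted to numbers.
    """
    # Coerce both columns to floats; entries that cannot be parsed
    # (such as '#DIV/0!') become NaN and are dropped afterwards.
    clean = df[[x_col, y_col]].apply(pd.to_numeric, errors="coerce").dropna()
    # A degree-1 polyfit returns the coefficients as [slope, intercept].
    slope, intercept = np.polyfit(clean[x_col], clean[y_col], 1)
    return slope, intercept, clean


def plot_with_fit(df, x_col, y_col):
    """Scatter the numeric rows and overlay the fitted line (hypothetical helper)."""
    slope, intercept, clean = fit_line(df, x_col, y_col)
    fig, ax = plt.subplots()
    ax.scatter(clean[x_col], clean[y_col])
    # Evaluate the line across the observed x range and draw it.
    xs = np.linspace(clean[x_col].min(), clean[x_col].max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red")
    return ax


if __name__ == "__main__":
    # Made-up data, only to show the idea; one x value is a spreadsheet error string.
    demo = pd.DataFrame({
        "log2 fold change misson": ["1.0", "2.0", "#DIV/0!", "4.0"],
        "log2 fold change": [2.1, 3.9, 5.0, 8.2],
    })
    ax = plot_with_fit(demo, "log2 fold change misson", "log2 fold change")
    ax.set_xlabel("log2 expression fold change Misson et al. 2005")
    ax.set_ylabel("log2 expression fold change")
    plt.show()
```

Note that this sketch drops the unparseable rows instead of replacing '#DIV/0!' with 0 as the code above does; whether a zero is meaningful for those genes depends on the data, so that choice is also an assumption.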