I have the dataframe below (obtained after a lot of preprocessing).

The dataframe is built from this dictionary:

import numpy as np
import pandas as pd

d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031',
           133: '180816_031', 99: '180816_031', 270: '180816_031', 19: '180816_031', 382: '180816_031',
           414: '180816_031', 267: '180816_031', 218: '180816_031', 398: '180816_031', 287: '180816_031',
           155: '180816_031', 392: '180816_031', 265: '180816_031', 239: '180816_031', 237: '180816_031'},
 'station': {361: 'deneb', 119: 'callisto', 101: 'callisto', 135: 'callisto', 292: 'callisto', 133: 'deneb',
             99: 'callisto', 270: 'callisto', 19: 'deneb', 382: 'callisto', 414: 'deneb', 267: 'callisto',
             218: 'deneb', 398: 'callisto', 287: 'deneb', 155: 'deneb', 392: 'deneb', 265: 'callisto',
             239: 'callisto', 237: 'callisto'},
 'cycle_number': {361: 'cycle09', 119: 'cycle06', 101: 'cycle04', 135: 'cycle01', 292: 'cycle04', 133: 'cycle05',
                  99: 'cycle06', 270: 'cycle07', 19: 'cycle04', 382: 'cycle08', 414: 'cycle04', 267: 'cycle10',
                  218: 'cycle07', 398: 'cycle08', 287: 'cycle09', 155: 'cycle08', 392: 'cycle06', 265: 'cycle02',
                  239: 'cycle09', 237: 'cycle07'},
 'variable': {361: 'adj_high_quality_reads', 119: 'short_pass', 101: 'short_pass', 135: 'cell_mask_bilayers_sum',
              292: 'adj_active_polymerase', 133: 'cell_mask_bilayers_sum', 99: 'short_pass',
              270: 'adj_active_polymerase', 19: 'Unnamed: 0', 382: 'adj_high_quality_reads',
              414: 'num_align_high_quality_reads', 267: 'adj_active_polymerase', 218: 'adj_single_pores',
              398: 'num_align_high_quality_reads', 287: 'adj_active_polymerase', 155: 'cell_mask_bilayers_sum',
              392: 'num_align_high_quality_reads', 265: 'adj_active_polymerase', 239: 'adj_single_pores',
              237: 'adj_single_pores'},
 'value': {361: 99704.0, 119: 2072785.0, 101: 2061059.0, 135: 1682208.0, 292: 675306.0, 133: 1714292.0,
           99: 2072785.0, 270: 687988.0, 19: 19.0, 382: np.nan, 414: 285176.0, 267: 86914.0, 218: 948971.0,
           398: 405196.0, 287: 137926.0, 155: 1830032.0, 392: 480081.0, 265: 951689.0, 239: 681452.0,
           237: 882671.0}}

Data:

          token   station cycle_number                      variable  \
19   180816_031     deneb      cycle04                    Unnamed: 0   
99   180816_031  callisto      cycle06                    short_pass   
101  180816_031  callisto      cycle04                    short_pass   
119  180816_031  callisto      cycle06                    short_pass   
133  180816_031     deneb      cycle05        cell_mask_bilayers_sum   
135  180816_031  callisto      cycle01        cell_mask_bilayers_sum   
155  180816_031     deneb      cycle08        cell_mask_bilayers_sum   
218  180816_031     deneb      cycle07              adj_single_pores   
237  180816_031  callisto      cycle07              adj_single_pores   
239  180816_031  callisto      cycle09              adj_single_pores   
265  180816_031  callisto      cycle02         adj_active_polymerase   
267  180816_031  callisto      cycle10         adj_active_polymerase   
270  180816_031  callisto      cycle07         adj_active_polymerase   
287  180816_031     deneb      cycle09         adj_active_polymerase   
292  180816_031  callisto      cycle04         adj_active_polymerase   
361  180816_031     deneb      cycle09        adj_high_quality_reads   
382  180816_031  callisto      cycle08        adj_high_quality_reads   
392  180816_031     deneb      cycle06  num_align_high_quality_reads   
398  180816_031  callisto      cycle08  num_align_high_quality_reads   
414  180816_031     deneb      cycle04  num_align_high_quality_reads   

         value  
19        19.0  
99   2072785.0  
101  2061059.0  
119  2072785.0  
133  1714292.0  
135  1682208.0  
155  1830032.0  
218   948971.0  
237   882671.0  
239   681452.0  
265   951689.0  
267    86914.0  
270   687988.0  
287   137926.0  
292   675306.0  
361    99704.0  
382        NaN  
392   480081.0  
398   405196.0  
414   285176.0  

I am trying to create a scatter plot with a smooth line through the points:

fig,ax = plt.subplots()
fig.set_size_inches(16,4)

#to get different colors for each of the `variable` value assign the variable to hue
g2=sns.lmplot(x='cycle_number',y='value',data=df, hue='variable',  size=4, aspect=5)

This code plots only a single value in the scatter plot, whereas my expected output is shown below.

Expected Output:

[image: expected output]

WHAT I TRIED

TRY 1

I tried to create a bar plot (with some help) and was successful, but I could not do the same with a scatter plot.

The code below produces the bar plot:

df1 = df.groupby(['token','variable']).agg({'value': 'mean'})
df1.reset_index(inplace=True)
df1.sort_values('value',inplace=True,ascending=False)
fig,ax = plt.subplots()
fig.set_size_inches(16,8)
#to get different colors for each of the variable assign the variable to hue
g=sns.barplot(x='token',y='value',data=df1, hue='variable',ax=ax)
# Shrink the axes to make room for the legend outside the plot
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# Adding respective values to the top of each bar
for p in ax.patches: 
    ax.annotate("%d" % p.get_height(), (p.get_x() + p.get_width() / 2, p.get_height()),
                ha='center', va='center', fontsize=11, color='black', xytext=(0, 10), 
                textcoords='offset points',fontweight='bold')
plt.show()

TRY 2

g2 = sns.lmplot(x='cycle_number', y='value', data=df)

This raises an error:

ValueError: could not convert string to float: 'cycle10'

I know what the error means, but I am stuck trying to reproduce the expected output.

TRY 3:

sns.lmplot('cycle_number', 'value', data=df, hue='variable', fit_reg=False)

Output generated: a blank grid.

Vaibhav Singh

1 Answer

Use:

sns.pointplot(x='cycle_number', y='value', data=df, hue='variable')

[image: pointplot output]

Documentation: https://seaborn.pydata.org/generated/seaborn.pointplot.html

Output generated using this vs. expected output:

[image: generated output compared with the expected output]

Try this:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(d)

# 'cycle09' -> 9: lmplot needs a numeric x to fit a regression
df['cycle_number'] = pd.to_numeric(df['cycle_number'].str.replace('cycle', ''))
print(df)

# lmplot is figure-level and creates its own figure, so size it with
# height/aspect (4 x 4 = 16 wide) rather than a separate plt.subplots() call
sns.lmplot(x='cycle_number', y='value', data=df, hue='variable',
           ci=None, order=2, truncate=True, height=4, aspect=4)
# use order=5 for a more flexible curve
plt.show()

Output for order=2:

[image: lmplot output for order=2]

Output for order=5: the curve is perfectly fine, except that the legend overlaps the plotting area.
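
For the overlapping legend, one possible approach (a sketch, not something confirmed in this thread) is to suppress the legend that lmplot draws and add one on the axes, anchored outside the plotting area, in the same way as the bar plot code in the question. The toy dataframe below is hypothetical and only mimics the long format of the real data; `g.figure` assumes a reasonably recent seaborn (older releases expose it as `g.fig`).

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data in the same long format as the question's dataframe
toy = pd.DataFrame({
    "cycle_number": [1, 2, 3, 4, 1, 2, 3, 4],
    "value": [10.0, 14.0, 15.0, 13.0, 5.0, 9.0, 12.0, 14.0],
    "variable": ["short_pass"] * 4 + ["adj_single_pores"] * 4,
})

# legend=False stops lmplot from drawing its own legend
g = sns.lmplot(x="cycle_number", y="value", data=toy, hue="variable",
               ci=None, order=2, truncate=True, height=4, aspect=3,
               legend=False)

# Draw the legend on the single axes, anchored just right of the plot area
g.ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), title="variable")

# Shrink the plot area so the legend fits inside the figure (0.8 is a guess to tune)
g.figure.subplots_adjust(right=0.8)
```

Call `plt.show()` or `g.savefig(...)` afterwards to render the figure; the `right=0.8` margin usually needs adjusting to the length of the legend labels.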

Nihal
  • Thanks Nihal, however the output requires a smooth curved line through my points. The code mentioned above just connects all the points. – Vaibhav Singh Aug 24 '18 at 13:02
  • I have added the generated output vs. the expected output above for your reference. Please check the difference. – Vaibhav Singh Aug 24 '18 at 13:07
  • I know the problem with lmplot. It requires the cycles to be float values, which happen to be strings in your case. That's why your data gives an error with lmplot. – Nihal Aug 24 '18 at 13:12
  • Nihal, please note that I have added the output of the code you shared above. It differs from the expected output in that it is not a smooth line but a line connecting the points. Please help with that. – Vaibhav Singh Aug 24 '18 at 13:25
  • I know that. I tried lmplot, but it only gives linear output, not curved. You should look into the fast Fourier transform, which is used for curved graph output. – Nihal Aug 24 '18 at 13:28
  • No problem. Can you please tag any experienced mentor who would be familiar with this type of problem? – Vaibhav Singh Aug 24 '18 at 13:33
  • Nihal, please find the output of the code. It works perfectly except for the overlapping part. Please help sort that out and it will be perfect! – Vaibhav Singh Aug 24 '18 at 14:30
  • The order argument is the key. If order is > 1, it will give a curved graph. – Nihal Aug 24 '18 at 14:31
  • Agreed! Can you please look at the overlapping grid problem and suggest why it is happening? – Vaibhav Singh Aug 24 '18 at 14:32
  • This can be done better if you use `np.polyfit` on your data and then use lmplot. – Nihal Aug 24 '18 at 14:33
  • I am sorry, I am relatively new to this (extreme apologies), but I mean that the legend is overlapping the grid area. Can you share a fix for that based on the code we are running? – Vaibhav Singh Aug 24 '18 at 14:34
  • As I said, increase the order to get more curve. – Nihal Aug 24 '18 at 14:34
  • Use order = 5 and check. – Nihal Aug 24 '18 at 14:35
  • Maybe tomorrow, or you can ask a new question for that. – Nihal Aug 24 '18 at 14:36
  • I suggest you ask a separate question for that, because your question is about getting curved output, which I have already answered. If you do that, you will get a better response, and may gain reputation as well. – Nihal Aug 25 '18 at 05:56
  • I did not know about that. Accepted the answer with the biggest of thanks! Can you please look at this query: https://stackoverflow.com/questions/52013973/issue-in-error-bars-in-seaborn-barplot-python I just need help with the error bars. – Vaibhav Singh Aug 25 '18 at 08:18
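
The `np.polyfit` suggestion from the comments can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the helper name `fit_smooth_curve` and the toy data are hypothetical, and the idea is simply to fit a polynomial of a chosen order to one variable's points and evaluate it on a dense grid, which gives a smooth curve to draw over the scatter.

```python
import numpy as np
import pandas as pd


def fit_smooth_curve(x, y, order=2, num=50):
    """Fit a polynomial of the given order and evaluate it on a dense x grid.

    Hypothetical helper for illustration; returns (grid, fitted_values).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit cannot handle NaN, so drop incomplete pairs first
    mask = ~np.isnan(x) & ~np.isnan(y)
    x, y = x[mask], y[mask]
    if len(x) <= order:
        raise ValueError("need more than `order` points to fit the polynomial")
    coeffs = np.polyfit(x, y, order)           # highest power first
    grid = np.linspace(x.min(), x.max(), num)  # dense grid for a smooth line
    return grid, np.polyval(coeffs, grid)


# Hypothetical toy data: value = cycle_number ** 2, so an order-2 fit is exact
toy = pd.DataFrame({
    "cycle_number": [1, 2, 3, 4, 5],
    "value": [1.0, 4.0, 9.0, 16.0, 25.0],
})
grid, fitted = fit_smooth_curve(toy["cycle_number"], toy["value"], order=2)
```

To use it on the real data, one would call the helper once per group of `df.groupby('variable')` (after converting `cycle_number` to numbers) and draw each result with `plt.plot(grid, fitted)` on top of a scatter of the raw points. Groups with too few points raise the `ValueError` above and would need to be skipped or fitted with a lower order.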