
I have a Pandas DataFrame of measurements:

,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
3,2.6838,2.394591,2.493416,0.874906,2.113343,1.812258,1.667047,1.779347,1.515663,1.620196,1.539494,1.63528,1.555373,1.471318,1.610067,1.507087,1.467174,1.458346,1.681998,1.14625
4,0.368415,0.435004,0.155035,0.161064,0.180133,0.202117,0.142981,0.138321,0.122557,0.099213,0.098213,0.062174,0.123664,0.2051,0.167415,0.185133,0.127677,0.037875,0.156252,0.015579
5,0.213577,0.187244,0.274151,0.173572,0.296122,0.308341,0.164578,0.159559,0.318383,0.181329,0.260223,0.257395,0.241779,0.292731,0.244476,0.187523,0.247331,0.293338,0.323894,0.179478
6,0.096093,0.140454,0.067185,6.441058,0.016797,0.141757,0.181792,0.13692,0.204091,0.180182,0.149626,0.220342,0.179286,0.276316,0.104531,0.20343,0.045161,-0.004546,0.045833,0.193849
7,0.286467,0.086673,-0.106538,-0.261802,0.16964,0.182858,0.062774,0.20471,0.040105,0.086975,0.211068,0.182423,0.098721,0.077085,0.102986,0.129935,0.130571,0.176024,0.154079,0.102391
8,0.480631,0.714554,0.858241,0.746666,0.555411,0.452689,0.337912,0.333942,0.269359,0.221312,0.09818,0.226218,0.287361,0.209858,0.222951,0.207584,0.258397,0.026713,0.162048,0.149924
9,1.055405,0.638777,0.468793,0.41544,0.559187,0.471218,0.493805,0.544716,0.412903,0.412182,0.51041,0.383991,0.351397,0.383201,0.368308,0.237954,0.330242,0.262648,0.425204,0.434928
10,1.116658,0.737544,0.854376,-0.004434,0.419419,0.35921,0.377095,0.273815,0.258913,0.290614,0.271843,0.321572,0.234764,0.298931,0.206039,0.192746,0.200727,0.132419,0.229914,0.159857
11,-0.004305,0.052289,0.275035,-0.849414,0.104146,0.185819,0.128376,0.136433,0.091787,0.149753,0.107246,0.081407,0.118816,0.117434,0.169153,0.108273,0.205751,0.145238,0.153086,0.114278
12,0.836223,0.323901,0.269564,0.364082,0.343695,0.386785,0.24881,0.307267,0.222634,0.214189,0.12167,0.251107,0.134083,0.284545,0.175479,0.221877,0.184749,0.225089,0.205388,0.214972

where each row holds the flux measurements at the frequencies given in the header (76, 84, 92, 99... MHz). I'm trying to plot a line graph of the flux measurements for each row. Since the frequencies in the header are not evenly spaced, I've tried this:

import numpy as np
import matplotlib.pyplot as plt

f = np.array([76,84,92,99,107,115,122,130,143,151,158,166,174,181,189,197,204,212,220,227])
y1 = [0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147]
y2 = [-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467]
y3 = [0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189]

fig, ax = plt.subplots()
ax.scatter(f, y1, label = r'$\alpha = -0.37$')
ax.plot(f, y1)
ax.scatter(f, y2, label = r'$\alpha = NaN$')
ax.plot(f, y2)
ax.scatter(f, y3, label = r'$\alpha = -0.75$')
ax.plot(f, y3)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Flux (Jy/beam)')
ax.grid(which = 'both', axis = 'both')

which is just copy-pasting the first three rows of data, to produce:

[image: line plot of the first three rows]

That's basically what I want, but what's a better way to do it?

Jim421616
  • The default pandas plot mode is to draw each column as a separate line. If you took the [transpose](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.transpose.html) of your dataframe, you'd have each row turn into a column. – Nick ODell Jul 17 '22 at 00:00
  • This is the correct way: 1. `df.columns = df.columns.str.replace('Fp', '').astype('int')`, 2. `df = df.T`, 3. `ax = df.plot(marker='.', figsize=(10, 7), title='Flux per Frequency', ylabel='Flux (Jy/beam)', xlabel='Frequency (MHz)', grid=True )` – Trenton McKinney Jul 17 '22 at 00:13
  • See [code and plot](https://i.stack.imgur.com/Lef6B.png) – Trenton McKinney Jul 17 '22 at 00:48
  • The `hsv` colormap was used to add more colors, since there are many observations. There are more colormaps at [Choosing Colormaps in Matplotlib](https://matplotlib.org/stable/tutorials/colors/colormaps.html) – Trenton McKinney Jul 17 '22 at 01:10
  • `df = pd.read_csv('file.csv', index_col=[0])` – Trenton McKinney Jul 17 '22 at 14:22
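The steps suggested in the comments above can be sketched as follows. This is a minimal illustration, not the commenters' exact code: `csv_text` is a hypothetical stand-in holding only the first three rows and first three columns of the question's data, and it assumes pandas 1.1 or later, where `DataFrame.plot` accepts `xlabel` and `ylabel`.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real file: a small slice of the question's data.
csv_text = """,Fp076,Fp084,Fp092
0,0.531743,0.512256,0.427771
1,-0.032964,0.047469,0.128079
2,0.459728,0.541346,0.830889
"""

# Read the first (unnamed) column as the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Turn headers such as 'Fp076' into integer frequencies such as 76.
df.columns = df.columns.str.replace("Fp", "", regex=False).astype(int)

# Transpose so that each original row becomes a column; pandas then
# draws one line per column, with the frequencies on the x-axis.
ax = df.T.plot(
    marker=".",
    grid=True,
    xlabel="Frequency (MHz)",
    ylabel="Flux (Jy/beam)",
)
```

Because the transposed index holds the numeric frequencies, the uneven spacing between them is respected on the x-axis. Call `plt.show()` or `ax.figure.savefig(...)` afterwards to display or save the figure.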

1 Answer


There are many ways to solve this problem, but the simplest way (that I can think of) is to transpose your dataframe, so that each row becomes a column, and then use seaborn to plot all the columns.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# convert your sample data
# (`data_` is assumed to hold the CSV text from the question as a single string)
data = [[e for e in row.split(',') if e] for row in data_.split("\n") if row]
columns = data[0]
# create the `x` axis
columns = [int(col.replace('Fp','')) for col in columns]
columns = ['index'] + columns
data = data[1:]
df = pd.DataFrame(data=data, columns=columns)
df = df.drop(columns=['index'])
df = df.astype('float')

This is what the dataframe looks like without transforming the headers with int(col.replace('Fp','')):

[image: dataframe with the original Fp headers]

If you already have the dataframe, you can transform its columns the same way I did above, using

df.columns = [int(col.replace('Fp','')) for col in df.columns]

Once this is done, you can transpose the data:

# transpose so that each original row becomes a column
df_ = df.T

# plot your data
plt.figure(figsize=(15,8))
sns.lineplot(data=df_)
plt.title('Example of timeseries plot')
plt.xlabel('Frequency (MHz)')
plt.ylabel('Flux (Jy/beam)')

The output is:

[image: line plot with one line per row of the original dataframe]

You can play around with the various plotting options as you wish, but this would be the simplest way. (Tip: try to leverage the seaborn or pandas plotting methods as much as possible for aggregated plots like this.)
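If only a few rows are needed, as in the question's original attempt, a plain matplotlib loop over the dataframe avoids the copy-pasting. The following is a minimal sketch under stated assumptions: the small `df` built here is a hypothetical stand-in for the real dataframe, and the `row {idx}` legend labels are placeholders rather than the spectral-index labels used in the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in: first three rows and columns of the question's data.
df = pd.DataFrame(
    {
        "Fp076": [0.531743, -0.032964, 0.459728],
        "Fp084": [0.512256, 0.047469, 0.541346],
        "Fp092": [0.427771, 0.128079, 0.830889],
    }
)

# Derive the x values from the headers instead of typing them by hand.
freqs = np.array([int(col.replace("Fp", "")) for col in df.columns])

fig, ax = plt.subplots()
# Plot only the first three rows; each row becomes one line with markers.
for idx, row in df.iloc[:3].iterrows():
    ax.plot(freqs, row.to_numpy(), marker="o", label=f"row {idx}")

ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
ax.grid(which="both", axis="both")
ax.legend()
```

Using `marker="o"` in `ax.plot` draws the points and the connecting line in one call, so the separate `ax.scatter` calls are not needed.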

Chinny84
  • The op states the data already starts in a dataframe. This is not the correct way beginning with the data in the dataframe. – Trenton McKinney Jul 17 '22 at 00:28
  • I know, but sadly the data is not in a pandas dataframe in the question - the first part of the code above is to take what he has and put it in a dataframe. I can be explicit and tell him to change the columns in his dataframe as mine, but apart from that it is consistent. – Chinny84 Jul 17 '22 at 00:30
  • The first sentence of the op _**I have a Pandas DataFrame of measurements:**_ – Trenton McKinney Jul 17 '22 at 00:31
  • yes - but I don't have the dataframe, I have to create it from the values he has provided. Did not think I had to explain that - but I have edited to make it clearer. – Chinny84 Jul 17 '22 at 00:36
  • The answer should **not** include constructing the dataframe, because that is irrelevant to the question. The OP already starts with a dataframe. The answer should show what to do with the dataframe to create the plot. There is no reason to use seaborn, because the dataframe can be plotted directly, as has already been demonstrated in a comment to the question. – Trenton McKinney Jul 17 '22 at 00:38
  • I disagree - mainly because people can replicate the steps and see how I arrived there. Maybe it is my method from mathstackexchange bleeding over to stackoverflow. But I made it clear that this was to recreate the dataframe, and can be safely ignored. – Chinny84 Jul 17 '22 at 00:41