0

I have an ascii file which contains in the first column the id of the day and in the following columns it contains values of each hour of the day (24 hours/columns). A sample of the file is shown below.

day h1  h2  h3  h4  h5  h6  h7  h8  h9  h10 h11 h12 h13 h14 h15 h16 h17 h18 h19 h20 h21 h22 h23 h24
1   0.4 0.4 0.2 0.2 0.4 0.8 0.9 0.8 1.2 1.4 1.8 1.4 1.7 2   1.8 2   1.5 1.4 1.4 1.1 1.5 1.5 1.2 0.9
2   0.7 0.6 0.3 0.4 0.8 1.4 2   1.7 1.5 1   1.2 0.8 0.8 0.8 0.6 2.2 3.2 2.7 4   5.2 6   5.6 5.4 5.6

I want to create a file that all the data values (24 hours) will be written in one column:

day hour    val
1   h1  0.4
1   h2  0.4
1   h3  0.2
1   h4  0.2
1   h5  0.4
1   h6  0.8
1   h7  0.9
1   h8  0.8
1   h9  1.2
1   h10 1.4
1   h11 1.8
1   h12 1.4
1   h13 1.7
1   h14 2
1   h15 1.8
1   h16 2
1   h17 1.5
1   h18 1.4
1   h19 1.4
1   h20 1.1
1   h21 1.5
1   h22 1.5
1   h23 1.2
1   h24 0.9
2   h1  0.7
2   h2  0.6
2   h3  0.3
2   h4  0.4
2   h5  0.8
2   h6  1.4
2   h7  2
2   h8  1.7
2   h9  1.5
2   h10 1
2   h11 1.2
2   h12 0.8
2   h13 0.8
2   h14 0.8
2   h15 0.6
2   h16 2.2
2   h17 3.2
2   h18 2.7
2   h19 4
2   h20 5.2
2   h21 6
2   h22 5.6
2   h23 5.4
2   h24 5.6

I use df.melt as following :

df = pd.read_csv(i)
    df_transposed = df.melt(id_vars='day', var_name='hour', value_name='val').sort_values('day')
    df_transposed.to_csv(outfile)

but I get the following error:

df_transposed = df.melt(id_vars='day', var_name='hour', value_name='val').sort_values('day')
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 7166, in melt
    return melt(
  File "/usr/local/lib/python3.9/site-packages/pandas/core/reshape/melt.py", line 122, in melt
    id_data = frame.pop(col)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 4371, in pop
    return super().pop(item=item)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/generic.py", line 661, in pop
    result = self[item]
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'day'

Do I have to define in another way the df.melt command?

Edit:

The df.columns command gives the following:

Index(['day\t"H1"\t"H2"\t"H3"\t"H4"\t"H5"\t"H6"\t"H7"\t"H8"\t"H9"\t"H10"\t"H11"\t"H12"\t"H13"\t"H14"\t"H15"\t"H16"\t"H17"\t"H18"\t"H19"\t"H20"\t"H21"\t"H22"\t"H23"\t"H24"'], dtype='object')
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
        return self._engine.get_loc(casted_key)
      File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'day'

Edit2: The above problem solved with this:

df = pd.read_csv(i, sep='\t')
print (df.columns)
df_transposed = df.melt(id_vars='day', var_name='hour', value_name='val').sort_values('day')
df_transposed.to_csv(outfile, sep='\t')

However, I want also to order the output values by day and hour. I tried the following:

df.melt(id_vars='day', var_name='hour', value_name='val').sort_values(['day','hour'],ascending=True)

but the output file is something like below. Is it possible to order the hours by H1 to H24 and not alphabetically?

    day hour    val
0   1   H1  0.4
3285    1   H10 -9999.0
3650    1   H11 -9999.0
4015    1   H12 -9999.0
4380    1   H13 -9999.0
4745    1   H14 -9999.0
5110    1   H15 -9999.0
5475    1   H16 -9999.0
5840    1   H17 -9999.0
6205    1   H18 -9999.0
6570    1   H19 -9999.0
365 1   H2  0.4
6935    1   H20 -9999.0
7300    1   H21 -9999.0
7665    1   H22 -9999.0
8030    1   H23 -9999.0
8395    1   H24 -9999.0
730 1   H3  0.6
1095    1   H4  0.6
1460    1   H5  0.4
1825    1   H6  0.4
2190    1   H7  0.3
2555    1   H8  -9999.0
2920    1   H9  -9999.0
1   2   H1  -9999.0
3286    2   H10 -9999.0
3651    2   H11 -9999.0
4016    2   H12 -9999.0
4381    2   H13 -9999.0
4746    2   H14 -9999.0
5111    2   H15 -9999.0
5476    2   H16 -9999.0
5841    2   H17 -9999.0
6206    2   H18 -9999.0
6571    2   H19 -9999.0
366 2   H2  -9999.0
6936    2   H20 -9999.0
7301    2   H21 -9999.0
7666    2   H22 -9999.0
8031    2   H23 -9999.0
8396    2   H24 -9999.0
731 2   H3  -9999.0
1096    2   H4  -9999.0
1461    2   H5  -9999.0
1826    2   H6  -9999.0
2191    2   H7  -9999.0
2556    2   H8  -9999.0
2921    2   H9  -9999.0
Nat
  • 325
  • 2
  • 13
  • This code works fine for me, i'd recommend looking at your `df` after reading it in and make sure it captured the column names correctly. – Chris Dec 19 '20 at 14:52
  • @skuzzy how I could use transpose and defining to group by days? – Nat Dec 19 '20 at 14:57
  • @Chris did you try it with my sample file? How I could check that it captures the names correctly? – Nat Dec 19 '20 at 14:58
  • @Nat Yes I did. You could check `df.columns` and make sure `day` is in there. – Chris Dec 19 '20 at 15:20
  • @Chris with df.columns I see the column 'day' but it also gives an error. I edited my question. Do you have any idea for this? – Nat Dec 19 '20 at 17:00
  • 1
    You have a tab separated file, not comma separated. try `pd.read_csv(i, sep='\t')` This should solve your problem. – Chris Dec 19 '20 at 19:35
  • Thank you @Chris the problem solved. However, is it possible to define in some way to order the values such as the example output? Because the output values are not in the correct order; it should be ordered by day and hour. I edited my question again. – Nat Dec 20 '20 at 08:30

0 Answers0