Is there a more efficient way to reformat a pandas dataframe?

Question

I'm working on an anomaly detection model for CPU data and my current dataframes are structured like this (with 10k+ servers and 168 hours):

Server	Hour1	Hour2	Hour3
server1	val1.1	val1.2	val1.3
server2	val2.1	val2.2	val2.3

I need it to be in a structure like this:

Server	time	Value
server1	0	value0
server1	1	value1
server2	0	value0
server2	1	value1

The problem is there are 10k+ servers and 168 hourly values for each, so it's taking an eternity to iterate over. Is there a more efficient way to do this transformation?

My current attempt is creating a new df with nested for loops like so:

for index, row in df.iterrows():
    for n in range(0,167):
        new_df.loc[len(new_df.index)] = row
        new_df.iat[len(new_df.index)-1, 2] = n

for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        new_df_ts = row[2]
        if(row[0] == r[0]):
            new_df.iat[index, 3] = df.iat[i, 2 + new_df_ts]

Welcome to SO. If possible try to explain your issue in words, don't just post the input data and desired output, that means the search engine, indexing and tagging won't be able to find similar questions. — smci, Apr 19 '22 at 11:04
Near-duplicate: [Pandas: transform column names to row values](https://stackoverflow.com/questions/52746702/pandas-transform-column-names-to-row-values) — smci, Apr 19 '22 at 11:14
Does this answer your question? [Pandas: transform column names to row values](https://stackoverflow.com/questions/52746702/pandas-transform-column-names-to-row-values) — Stuart, Apr 19 '22 at 11:17
Also https://stackoverflow.com/questions/69853463/convert-dataframe-hourly-values-in-columns-to-a-serie-python — Stuart, Apr 19 '22 at 11:18

Stuart · Accepted Answer · 2022-04-19T11:17:15.740

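Use pd.wide_to_long, with "Hour" as the stub name, "Server" as the id column, and the numeric suffix moved into a new Time column: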

import pandas as pd
import io
df = pd.read_csv(io.StringIO("""
Server  Hour1   Hour2   Hour3
server1 val1.1  val1.2  val1.3
server2 val2.1  val2.2  val2.3"""), sep=r"\s+")  # split on any whitespace run
df = pd.wide_to_long(df, "Hour", "Server", "Time") \
    .rename(columns={"Hour": "Value"}) \
    .reset_index()

which results in

    Server  Time    Value
0  server1     1   val1.1
1  server2     1   val2.1
2  server1     2   val1.2
3  server2     2   val2.2
4  server1     3   val1.3
5  server2     3   val2.3

Alternatively, use melt and then strip the Hour text out of the Time column.

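df = df.melt("Server", var_name="Time", value_name="Value")
# str.strip("Hour") would strip the characters H, o, u, r rather than the prefix, so replace the literal text instead
df["Time"] = df["Time"].str.replace("Hour", "", regex=False).astype(int)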

Then do whatever additional processing is required on the Time and Value columns, such as subtracting 1 from Time so that it starts at 0 as in the desired output. Use df.replace or df["Value"].str.replace to alter values if needed.
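
Putting the melt approach together with the zero-based time index, a minimal sketch might look like the following. The sample values and the names wide and long_df are made up for illustration; the sort at the end is an optional extra that groups each server's rows in time order, matching the layout requested in the question.

```python
import pandas as pd

# Hypothetical sample data in the question's wide layout
wide = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [0.11, 0.21],
    "Hour2": [0.12, 0.22],
    "Hour3": [0.13, 0.23],
})

# One row per (Server, Hour column) pair
long_df = wide.melt(id_vars="Server", var_name="Time", value_name="Value")

# Drop the literal "Hour" prefix and shift to a zero-based index
long_df["Time"] = (
    long_df["Time"].str.replace("Hour", "", regex=False).astype(int) - 1
)

# Optional: keep each server's rows together, ordered by time
long_df = long_df.sort_values(["Server", "Time"], ignore_index=True)

print(long_df)
```

Because melt and the string operations are vectorized, this avoids the row-by-row iterrows loops, which should matter at the scale of 10k+ servers and 168 hourly columns.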
