I'm working on an anomaly detection model for CPU data and my current dataframes are structured like this (with 10k+ servers and 168 hours):

Server Hour1 Hour2 Hour3
server1 val1.1 val1.2 val1.3
server2 val2.1 val2.2 val2.3

I need it to be in a structure like this:

Server time Value
server1 0 value0
server1 1 value1
server2 0 value0
server2 1 value1

The problem is there are 10k+ servers and 168 hourly values for each, so it's taking an eternity to iterate over. Is there a more efficient way to do this transformation?

My current attempt is creating a new df with nested for loops like so:

for index, row in df.iterrows():
    for n in range(0,167):
        new_df.loc[len(new_df.index)] = row
        new_df.iat[len(new_df.index)-1, 2] = n

for index, row in new_df.iterrows():
    for i, r in df.iterrows():
        new_df_ts = row[2]
        if(row[0] == r[0]):
            new_df.iat[index, 3] = df.iat[i, 2 + new_df_ts]


Tyler
  • Welcome to SO. If possible, try to explain your issue in words rather than just posting the input data and desired output; otherwise search engines, indexing and tagging won't be able to find similar questions. – smci Apr 19 '22 at 11:04
  • Near-duplicate: [Pandas: transform column names to row values](https://stackoverflow.com/questions/52746702/pandas-transform-column-names-to-row-values) – smci Apr 19 '22 at 11:14
  • Does this answer your question? [Pandas: transform column names to row values](https://stackoverflow.com/questions/52746702/pandas-transform-column-names-to-row-values) – Stuart Apr 19 '22 at 11:17
  • Also https://stackoverflow.com/questions/69853463/convert-dataframe-hourly-values-in-columns-to-a-serie-python – Stuart Apr 19 '22 at 11:18

1 Answer

Use wide_to_long

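import io

import pandas as pd

# Whitespace-separated sample data; sep=r"\s+" accepts tabs or runs of spaces.
df = pd.read_csv(io.StringIO("""
Server  Hour1   Hour2   Hour3
server1 val1.1  val1.2  val1.3
server2 val2.1  val2.2  val2.3"""), sep=r"\s+")
# stubnames="Hour" matches Hour1, Hour2, ...; the numeric suffix becomes "Time".
df = pd.wide_to_long(df, stubnames="Hour", i="Server", j="Time") \
    .rename(columns={"Hour": "Value"}) \
    .reset_index()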

which results in

    Server  Time    Value
0  server1     1   val1.1
1  server2     1   val2.1
2  server1     2   val1.2
3  server2     2   val2.2
4  server1     3   val1.3
5  server2     3   val2.3

Alternatively, use melt and then strip the Hour text out of the Time column.

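# Applied to the original wide df (before the wide_to_long step above).
df = df.melt("Server", var_name="Time", value_name="Value")
# str.strip("Hour") strips any of the characters H, o, u, r from both ends;
# replacing the literal prefix is more explicit.
df["Time"] = df["Time"].str.replace("Hour", "", regex=False).astype(int)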

Then do whatever additional processing is required on the Time and Value columns, such as subtracting 1 from the time period. Use df.replace or df["Value"].str.replace to alter values if needed.
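As a rough illustration of that follow-up processing, here is a minimal sketch that melts a small wide frame, shifts the time to start at 0, and sorts by server so the rows match the layout asked for in the question. The frame `wide`, the name `long_df`, and the numeric sample values are made up for the example; they are not the asker's data.

```python
import pandas as pd

# Hypothetical wide frame: one row per server, one "HourN" column per hour.
wide = pd.DataFrame({
    "Server": ["server1", "server2"],
    "Hour1": [0.10, 0.50],
    "Hour2": [0.20, 0.60],
    "Hour3": [0.30, 0.70],
})

# Reshape to one row per (server, hour) pair.
long_df = wide.melt(id_vars="Server", var_name="Time", value_name="Value")

# Drop the literal "Hour" prefix, convert to int, and make the time zero-based.
long_df["Time"] = (
    long_df["Time"].str.replace("Hour", "", regex=False).astype(int) - 1
)

# Group each server's rows together, ordered by time.
long_df = long_df.sort_values(["Server", "Time"]).reset_index(drop=True)
print(long_df)
```

Both melt and the string operations are vectorized, so this should scale to 10k+ servers by 168 hours far better than row-by-row loops, though timings will depend on your data.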

Stuart