I am looking for help with populating a new column in my dataframe. I need New_column to be filled in based on the contents of other columns.
import pandas as pd
df = pd.read_csv('sample.txt')
# the data is imported as one column
df.columns = ['Test']
# split into columns
dfnew = df.Test.str.split(expand=True).rename(columns={0:'Datetime', 1:'P1', 2:'P2'})
# create a new column
dfnew["New_column"] = ""
print(dfnew)
Datetime P1 P2 New_column
8 'Name-1' None None
9 2017-01-01T00:00:00 2800 1600
10 2017-02-01T00:00:00 -99999 2375
.. ... ... ... ...
72 'Name-2' None None
73 2018-10-11T00:00:00 0 2000
74 2018-10-18T00:00:00 0 2000
.. ... ... ... ...
[724 rows x 4 columns]
In the .txt file, the P1 and P2 fields are blank on rows where the Datetime column holds a Name-# value, but when the df is printed, those blanks show up as ‘None’. Every x rows, the Name-# in the Datetime column changes (the numbers attached to the names are not in any particular order). I want New_column to hold, on every row, the most recent Name-# found in the Datetime column, until the next Name-# value replaces it:
Datetime P1 P2 New_column
8 'Name-1' None None
9 2017-01-01T00:00:00 2800 1600 Name-1
10 2017-02-01T00:00:00 -99999 2375 Name-1
.. ... ... ... ...
72 'Name-2' None None
73 2020-10-11T00:00:00 0 2000 Name-2
74 2020-10-18T00:00:00 0 2000 Name-2
.. ... ... ... ...
623 'Name-14' None None
624 2020-04-21T00:00:00 -99999 730 Name-14
625 2020-04-27T00:00:00 0 260 Name-14
.. ... ... ... ...
[724 rows x 4 columns]
I also want to delete the rows that have Name-# in the Datetime column (i.e. rows 8, 72, 623, etc.). I need this process to be automated so that I can import .txt files of the same style but not necessarily of the same size or with the same Name-# values. I have tried building a list using a for loop with multiple if statements and then assigning that list to New_column, but I can’t seem to get it to work.
I am a beginner with Python, and any help would really be appreciated.
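For reference, the behaviour described above (carry each Name-# down to the rows beneath it, then drop the Name-# rows) could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for the real file: the small inline dataframe is a hypothetical stand-in for dfnew after the split step, the names is_name and result are made up for illustration, and it assumes that a Name-# row is exactly a row whose P1 is missing.

```python
import pandas as pd

# Hypothetical stand-in for dfnew after the split step (values copied from the printout).
dfnew = pd.DataFrame({
    "Datetime": ["'Name-1'", "2017-01-01T00:00:00", "2017-02-01T00:00:00",
                 "'Name-2'", "2018-10-11T00:00:00", "2018-10-18T00:00:00"],
    "P1": [None, "2800", "-99999", None, "0", "0"],
    "P2": [None, "1600", "2375", None, "2000", "2000"],
})

# Assumption: a row is a Name-# row exactly when its P1 value is missing.
is_name = dfnew["P1"].isna()

# Keep the Datetime text only on Name-# rows, strip the surrounding quotes,
# then forward-fill so each data row inherits the most recent name.
dfnew["New_column"] = dfnew["Datetime"].where(is_name).str.strip("'").ffill()

# Drop the Name-# rows themselves and renumber the index.
result = dfnew[~is_name].reset_index(drop=True)
print(result)
```

Because the mask is computed from the data rather than from fixed row numbers, the same steps should apply to files of a different length or with different Name-# values, as long as the P1 assumption holds. If it does not, a mask built from the Datetime text (for example, rows that start with a quote character) might be a safer choice.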