
I am looking for help populating a new column in my dataframe. I need New_column to be filled based on the contents of other columns.

import pandas as pd

df = pd.read_csv('sample.txt')
# the data is imported as one column
df.columns = ['Test']
# split into columns
dfnew = df.Test.str.split(expand=True).rename(columns={0:'Datetime', 1:'P1', 2:'P2'})

# create a new column
dfnew["New_column"] = ""
print(dfnew)

                Datetime     P1           P2        New_column
8             'Name-1'      None         None          
9    2017-01-01T00:00:00    2800         1600          
10   2017-02-01T00:00:00  -99999         2375            
..                   ...     ...          ...       ...
72            'Name-2'      None         None         
73   2018-10-11T00:00:00     0           2000          
74   2018-10-18T00:00:00     0           2000                  
..                   ...     ...          ...       ...
[724 rows x 4 columns]

In the .txt file, the P1 and P2 fields are blank on lines that have a Name-# value in the Datetime column, but when I print the dataframe, those blanks show up as None. Every so many rows (the count varies), the Name-# in the Datetime column changes, and the numbers attached to the names do not follow any particular order. I want New_column to be filled on every row with the most recent Name-# found in the Datetime column, until the next Name-# value replaces it:
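The None values come from `str.split(expand=True)` itself: when a line splits into fewer pieces than the longest line, pandas pads the missing columns with a missing value. A minimal sketch, using made-up sample lines in the same style as sample.txt (the real file contents are an assumption):

```python
import pandas as pd

# Hypothetical lines in the style of sample.txt: a name header, then data rows
lines = pd.Series([
    "'Name-1'",
    "2017-01-01T00:00:00 2800 1600",
    "2017-02-01T00:00:00 -99999 2375",
])

# Split on whitespace; the header line has only one token, so P1 and P2 are padded
parts = lines.str.split(expand=True).rename(columns={0: "Datetime", 1: "P1", 2: "P2"})
print(parts)

# The padded cells count as missing values
print(parts["P1"].isna().tolist())  # [True, False, False]
```

Because these cells are real missing values rather than the text "None", functions such as `isna()` and `ffill()` treat them as empty.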

                Datetime     P1           P2        New_column
8             'Name-1'       None         None          
9    2017-01-01T00:00:00     2800         1600          Name-1
10   2017-02-01T00:00:00   -99999         2375          Name-1
..                   ...     ...          ...       ...
72            'Name-2'       None         None         
73   2020-10-11T00:00:00      0           2000          Name-2
74   2020-10-18T00:00:00      0           2000          Name-2       
..                   ...     ...          ...       ...
623           'Name-14'      None         None         
624  2020-04-21T00:00:00   -99999         730           Name-14
625  2020-04-27T00:00:00      0           260           Name-14
..                   ...     ...          ...       ...
[724 rows x 4 columns]

I also want to delete the rows that have a Name-# in the Datetime column (i.e., rows 8, 72, 623, etc.). I need this process to be automated so that I can import .txt files in the same format, but not necessarily of the same size or with the same Name-# values. I have tried building a list with a for loop containing several if statements and then assigning that list to New_column, but I can’t get it to work.

I am a beginner with Python, and any help would really be appreciated.

kn2298

1 Answer

Try the code below. First, we create a new column from the Datetime column: if a value contains 'Name', new_col gets that Datetime value; otherwise it gets np.nan (pandas treats this as a missing value, similar to NULL).

Then we use the ffill() function to forward-fill new_col, so each np.nan is replaced by the last non-missing value above it.

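import numpy as np

# Keep the Datetime value on Name-# rows; mark every other row as missing (NaN)
dfnew['new_col'] = [x if 'Name' in str(x) else np.nan for x in dfnew.Datetime.values]
# Forward-fill: each NaN takes the last non-missing value above it
dfnew['new_col'] = dfnew['new_col'].ffill()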
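The code above fills new_col but keeps the Name-# rows, which the question also asks to delete, and keeps the single quotes around each name. A minimal sketch of the full pipeline, assuming the question's column names, that every header row contains the text 'Name', and that the quotes should be stripped (the sample data is made up; `Series.where` is used here in place of the list comprehension):

```python
import pandas as pd

# Made-up sample shaped like the question's dfnew (values are strings after str.split)
dfnew = pd.DataFrame({
    "Datetime": ["'Name-1'", "2017-01-01T00:00:00", "2017-02-01T00:00:00",
                 "'Name-14'", "2020-04-21T00:00:00", "2020-04-27T00:00:00"],
    "P1": [None, "2800", "-99999", None, "-99999", "0"],
    "P2": [None, "1600", "2375", None, "730", "260"],
})

# True on the header rows whose Datetime value holds a Name-# label
is_name = dfnew["Datetime"].astype(str).str.contains("Name")

# Keep the label (quotes stripped) on header rows, NaN elsewhere, then forward-fill
dfnew["New_column"] = dfnew["Datetime"].where(is_name).str.strip("'").ffill()

# Drop the header rows; .copy() gives an independent frame for later edits
dfnew = dfnew[~is_name].copy()
print(dfnew)
```

Nothing here depends on how many rows the file has or which Name-# values appear, so the same steps should apply to other files in the same format.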

[ffill()][1]

[1]: https://www.geeksforgeeks.org/python-pandas-dataframe-ffill/#:~:text=ffill()%20function%20is%20used,propagate%20last%20valid%20observation%20forward.&text=inplace%20%3A%20If%20True%2C%20fill%20in,a%20column%20in%20a%20DataFrame).
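The for-loop approach described in the question can also work, provided the list ends up with exactly one entry per row. A minimal sketch, assuming a header row is any row whose Datetime value contains 'Name' (sample data is made up; `current_name` and `new_values` are illustrative names):

```python
import pandas as pd

# Made-up sample shaped like the question's dfnew
dfnew = pd.DataFrame({
    "Datetime": ["'Name-1'", "2017-01-01T00:00:00", "2017-02-01T00:00:00",
                 "'Name-2'", "2018-10-11T00:00:00"],
    "P1": [None, "2800", "-99999", None, "0"],
    "P2": [None, "1600", "2375", None, "2000"],
})

new_values = []
current_name = None  # no header seen yet
for value in dfnew["Datetime"]:
    if "Name" in str(value):
        # Header row: remember its name, without the surrounding quotes
        current_name = str(value).strip("'")
    # Append on every row so the list length matches the number of rows
    new_values.append(current_name)

dfnew["New_column"] = new_values

# Remove the header rows themselves
dfnew = dfnew[~dfnew["Datetime"].str.contains("Name")]
print(dfnew)
```

A common reason such a loop "doesn't work" is appending only inside some of the if branches, which produces a list shorter than the dataframe and a length-mismatch error on assignment.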

Amit Kumar
  • Hi @Amit, I was wondering whether the opposite could be done: taking a column full of different names and inserting a new row containing only the "Name-#" each time the name changes in that column? – kn2298 Nov 24 '20 at 19:46
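The reverse operation asked about in the comment can be sketched by walking the rows in order and emitting a header row whenever the name differs from the previous row's. This is a minimal sketch, assuming the names live in New_column, that the header row should put the name in Datetime with P1 and P2 left empty, and that the original quotes are not needed (the sample data is made up):

```python
import pandas as pd

# Made-up sample: data rows with the name already in its own column
df = pd.DataFrame({
    "Datetime": ["2017-01-01T00:00:00", "2017-02-01T00:00:00", "2018-10-11T00:00:00"],
    "P1": ["2800", "-99999", "0"],
    "P2": ["1600", "2375", "2000"],
    "New_column": ["Name-1", "Name-1", "Name-2"],
})

rows = []
previous = None
for record in df.to_dict("records"):
    name = record.pop("New_column")  # take the name out of the data row
    if name != previous:
        # The name changed: insert a header row before this data row
        rows.append({"Datetime": name, "P1": None, "P2": None})
        previous = name
    rows.append(record)

result = pd.DataFrame(rows)
print(result)
```

Building a list of dicts and creating the dataframe once at the end avoids inserting rows into an existing dataframe one at a time, which pandas does not support directly.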