I have an assignment to export clean CSV files that contain only the headers and the data; everything else must be filtered out. There are more than 500 text files.
Each text file must become a separate CSV file, named in the format "YEAR-MONTH-DAY (ORIGINAL_FILE_NAME).csv".
For example:
Original file: pm990902.b17
CSV file: 1999-09-02 (pm990902.b17).csv
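To make the naming rule concrete, here is a minimal sketch of it as a helper. The name `build_csv_name` is hypothetical, and the sketch assumes the date is already available as a `datetime.date` and that paths are Windows-style, as in my setup.

```python
from datetime import date
from pathlib import PureWindowsPath


def build_csv_name(day: date, original_path: str) -> str:
    """Return 'YYYY-MM-DD (original_file_name).csv' (hypothetical helper)."""
    # PureWindowsPath splits on both '\\' and '/', regardless of the OS running the script.
    original_name = PureWindowsPath(original_path).name
    return f"{day.isoformat()} ({original_name}).csv"


print(build_csv_name(date(1999, 9, 2), "pm990902.b17"))
# 1999-09-02 (pm990902.b17).csv
```

Only the file name is used, so a full path such as `C:\...\pm990902.b17` would produce the same output name.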
I already have code for filtering the data:
*
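import pandas as pd
import numpy as np
import glob
# Skip line indices 0-191 so that the header is read from the next line.
pred = lambda x: x in np.arange(0, 192, 1)
# Placeholder values that should be read as NaN.
inval = [99999.9, 999.0, 999.9900, 999.9]
# Raw string so the backslashes in the Windows path are taken literally.
files = glob.glob(r'C:\Users\Lenovo\Desktop\Python\Files\*')
for file in files:
    df = pd.read_csv(file, header=0, delim_whitespace=True, skiprows=pred,
                     engine='python', na_values=inval)
    df = df[1:]  # drop the first row below the header
    # Placeholder: this is where the date-based file name should go.
    df.to_csv('Name of the new file.csv', index=False)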
What I still can't figure out is how to build the new file name from the date, which is the actual problem.
This is what the file looks like with the date in the first line:
*AAAAAAAAAAAAAAAAAAAAAAAAAA zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz 05-JAN-2000 12:21:0005-JAN-2000 14:00:300102
160 2160
1.00 1.0 1.00 1.00 1.0000 1.0 1.0 1.0 1.0000 1.0000 1.00 1.000 1.0 1.0 1.0000 1.0000
9999.90 99999.0 999.90 999.00 99.9900 999.0 999.9 99999.9 999.9900 999.9900 999.90 99.990 999.9 999.9 99.9900 99.9900
Pressure [hPa]
Geopotential height [gpm]
Temperature [K]
Relative humidity [%]
Ozone partial pressure [mPa]
Horizontal wind direction [decimal degrees]
Horizontal wind speed [m/s]
GPS geometric height [m]
GPS longitude [decimal degrees E]
GPS latitude [decimal degrees N]
Internal temperature [K]
Ozone raw current [microA]
Battery voltage [V]
Pump current [mA]
Ozone mixing ratio per volume [ppm]
Ozone partial pressure uncertainty estimate [mPa]*
I can't attach the whole text file, but this is an example of the beginning of every text file.
So how can I get the desired date for the file name out of this line?
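For reference, a minimal sketch of one possible approach: search the first line for the first `DD-MON-YYYY` token with a regular expression. This assumes every file starts with a line like the one shown above and that month abbreviations are upper-case English; `extract_first_date`, `read_header_date`, and `MONTHS` are hypothetical names, not part of my existing code.

```python
import re
from datetime import date

# Assumption: months appear as upper-case English abbreviations, e.g. "JAN".
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Matches tokens such as "05-JAN-2000".
DATE_PATTERN = re.compile(r"(\d{2})-(" + "|".join(MONTHS) + r")-(\d{4})")


def extract_first_date(line: str) -> date:
    """Return the first DD-MON-YYYY date found in the line."""
    match = DATE_PATTERN.search(line)
    if match is None:
        raise ValueError("no DD-MON-YYYY date found in line")
    day, month_abbr, year = match.groups()
    return date(int(year), MONTHS[month_abbr], int(day))


def read_header_date(path: str) -> date:
    """Read only the first line of the file and extract its date."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return extract_first_date(handle.readline())


sample = "*AAAA zzzz 05-JAN-2000 12:21:0005-JAN-2000 14:00:300102"
print(extract_first_date(sample).isoformat())
# 2000-01-05
```

A manual month table is used instead of `datetime.strptime` with `%b` because `%b` depends on the current locale. The returned `date` gives the `YEAR-MONTH-DAY` part via `isoformat()`, which could then be combined with the original file name inside the loop.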