Why do I get "list index out of range" when I try to read a batch of .msg files? My objective is to read the data and then use pandas to create a DataFrame where each row represents a rain gauge. The column headers will consist of timestamps paired with 'PastHour'. Each cell in the DataFrame will contain the corresponding rainfall value.
Error processing file FW_ Rain Report at 7_2_2020 10_39_40 PM.msg: list index out of range
Error processing file FW_ Rain Report at 7_2_2020 11_39_40 PM.msg: list index out of range
My code:
import os
import extract_msg
import pandas as pd
from datetime import datetime

# Directory containing .msg files
directory = 'C:\\Users'

# Initialize an empty dictionary to store the data
data = {}

# Loop through each file in the directory
for filename in os.listdir(directory):
    if filename.endswith('.msg'):
        try:
            # Extract the content of the .msg file
            msg = extract_msg.Message(os.path.join(directory, filename))
            content = msg.body

            # Extract the timestamp
            timestamp_line = content.split('\n')[3]
            timestamp_str = timestamp_line.split('at ')[1].strip()
            timestamp = datetime.strptime(timestamp_str, '%m/%d/%Y %I:%M:%S %p')

            # Extract the Rain Gauge data
            lines = content.split('\n')
            for line in lines:
                if line and line.split()[0] != 'Gauge':
                    parts = line.split()
                    gauge = parts[0]
                    past_hour = parts[1]

                    # Add the data to the dictionary
                    if gauge not in data:
                        data[gauge] = {}
                    data[gauge][timestamp] = past_hour
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data).T

# Sort the columns (timestamps)
df = df.sort_index(axis=1)

# Save the DataFrame to a CSV file
df.to_csv('rain_gauge_data.csv')
print("Data has been saved to 'rain_gauge_data.csv'")
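One place the error could come from, offered as a minimal sketch rather than a confirmed diagnosis: in the inner loop, `if line` is true for a line that holds only whitespace (for example a lone '\r', assuming the body uses '\r\n' line endings), yet `line.split()` returns an empty list for it, so `[0]` raises IndexError. A line with a single token fails the same way at `parts[1]`. The helper name `parse_gauge_line` below is hypothetical and not part of the script.

```python
def parse_gauge_line(line):
    """Return (gauge, past_hour), or None if the line cannot be a gauge row.

    Hypothetical helper for illustration; not part of the original script.
    """
    parts = line.split()
    # A whitespace-only line such as '\r' is truthy but splits to [],
    # so check the length before indexing.
    if len(parts) < 2 or parts[0] == 'Gauge':
        return None
    return parts[0], parts[1]


# Reproduce the failure: a lone '\r' passes `if line` but has no tokens.
blank = '\r'
try:
    blank.split()[0]
except IndexError as e:
    print(f"unguarded access failed: {e}")

print(parse_gauge_line(blank))                  # None
print(parse_gauge_line('Puebla 0.30 0.49\r'))   # ('Puebla', '0.30')
```

This guard only prevents the index error. It does not filter out non-gauge lines such as the From: or Sent: headers, which still split into two or more tokens.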
Content of the .msg file:
________________________________________
From: Random <user@gmail.com>
Sent: Sunday, July 2, 2020 9:39:41 PM (UTC-05:00) Eastern Time (US & Canada)
To: Gauge-Rain Notification
Subject: Rain Report at 7/2/2020 9:39:40 PM
Rain Report at 7/2/2020 9:39:40 PM
Rain Past This
Gauge Hour Event**
Pereira 0.00 0.00
Puebla 0.30 0.49
CI 0.00 0.11
Tokito 0.01 0.18
CO 0.00 0.04
KP N/A N/A
DSS 0.00 0.00
PL 0.00 0.00
TSM 0.00 0.00
PKP 0.00 0.01
RP 1.00 1.42
HP 0.00 0.00
GG 0.20 0.45
BB 0.00 0.28
Since the onset of this particular rain event. (May include rainfall before midnight) **Each rain gauge has separate rain events.
Flow Report at 7/2/2020 9:39:40 PM
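If `msg.body` really looks like the sample above, the fixed index in `content.split('\n')[3]` would land on the To: line, which contains no 'at ', so `split('at ')[1]` would itself raise IndexError. The sketch below searches the whole body for the report timestamp instead of relying on a line position. It assumes the body matches the sample and uses '\r\n' line endings; the names `sample_body`, `REPORT_RE`, and `find_report_timestamp` are hypothetical.

```python
import re
from datetime import datetime

# Abbreviated copy of the sample body; the '\r\n' endings are an assumption.
sample_body = "\r\n".join([
    "________________________________________",
    "From: Random <user@gmail.com>",
    "Sent: Sunday, July 2, 2020 9:39:41 PM (UTC-05:00) Eastern Time (US & Canada)",
    "To: Gauge-Rain Notification",
    "Subject: Rain Report at 7/2/2020 9:39:40 PM",
    "Rain Report at 7/2/2020 9:39:40 PM",
    "Gauge Hour Event**",
    "Pereira 0.00 0.00",
])

# Hypothetical pattern: captures the date and time after 'Rain Report at '.
REPORT_RE = re.compile(
    r"Rain Report at (\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def find_report_timestamp(body):
    """Return the first report timestamp in the body, or None if absent."""
    match = REPORT_RE.search(body)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%m/%d/%Y %I:%M:%S %p')


# With this sample, line index 3 is the To: line and has nothing to split on.
fixed_index_parts = sample_body.split('\n')[3].split('at ')
print(len(fixed_index_parts))

print(find_report_timestamp(sample_body))
```

Printing `repr(msg.body)` for one of the failing files would show whether the real body matches this layout.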