
Why do I get "list index out of range" when I try to read a batch of .msg files? My objective is to read the data and then use pandas to create a DataFrame where each row represents a rain gauge. The column headers will be timestamps paired with 'PastHour', and each cell in the DataFrame will contain the corresponding rainfall value.

Error processing file FW_ Rain Report at 7_2_2020 10_39_40 PM.msg: list index out of range
Error processing file FW_ Rain Report at 7_2_2020 11_39_40 PM.msg: list index out of range

My code:

import os
import extract_msg
import pandas as pd
from datetime import datetime

# Directory containing .msg files
directory = 'C:\\Users'

# Initialize an empty dictionary to store the data
data = {}

# Loop through each file in the directory
for filename in os.listdir(directory):
    if filename.endswith('.msg'):
        try:
            # Extract the content of the .msg file
            msg = extract_msg.Message(os.path.join(directory, filename))
            content = msg.body
            
            # Extract the timestamp
            timestamp_line = content.split('\n')[3]
            timestamp_str = timestamp_line.split('at ')[1].strip()
            timestamp = datetime.strptime(timestamp_str, '%m/%d/%Y %I:%M:%S %p')
            
            # Extract the Rain Gauge data
            lines = content.split('\n')
            for line in lines:
                if line and line.split()[0] != 'Gauge':
                    parts = line.split()
                    gauge = parts[0]
                    past_hour = parts[1]
                    
                    # Add the data to the dictionary
                    if gauge not in data:
                        data[gauge] = {}
                    data[gauge][timestamp] = past_hour
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data).T

# Sort the columns (timestamps)
df = df.sort_index(axis=1)

# Save the DataFrame to a CSV file
df.to_csv('rain_gauge_data.csv')

print("Data has been saved to 'rain_gauge_data.csv'")
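The printed "Error processing file" messages come from the `except` inside the loop, so the DataFrame step at the end is not what fails. For reference, here is a small sketch of that step with made-up values (the gauges, timestamps and readings below are illustrative, not read from any file). `pd.DataFrame.from_dict(data, orient="index")` puts one gauge per row, like `pd.DataFrame(data).T`, and `pd.to_numeric(..., errors="coerce")` turns the string readings, including `'N/A'`, into floats and NaN.

```python
from datetime import datetime

import pandas as pd

# Same nested shape the question builds: {gauge: {timestamp: past_hour_string}}
# (illustrative values only)
t1 = datetime(2020, 7, 2, 21, 39, 40)
t2 = datetime(2020, 7, 2, 22, 39, 40)
data = {
    "Pereira": {t2: "0.00", t1: "0.00"},
    "KP": {t2: "N/A", t1: "N/A"},
    "RP": {t1: "1.00"},  # no reading at t2, so that cell becomes NaN
}

# orient="index": outer keys (gauges) become rows, inner keys (timestamps) columns.
df = pd.DataFrame.from_dict(data, orient="index")

# Chronological column order, then strings -> floats ("N/A" -> NaN).
df = df.sort_index(axis=1)
df = df.apply(pd.to_numeric, errors="coerce")

print(df)
```

Converting to numbers before `to_csv` is optional, but it keeps `'N/A'` from mixing strings into otherwise numeric columns.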

Content of the .msg file:

________________________________________
From: Random <user@gmail.com>
Sent: Sunday, July 2, 2020 9:39:41 PM (UTC-05:00) Eastern Time (US & Canada)
To: Gauge-Rain Notification
Subject: Rain Report at 7/2/2020 9:39:40 PM

Rain Report at 7/2/2020 9:39:40 PM

Rain    Past    This
Gauge   Hour    Event**

Pereira    0.00          0.00
Puebla    0.30          0.49
CI       0.00          0.11
Tokito     0.01          0.18
CO       0.00          0.04
KP          N/A          N/A
DSS      0.00           0.00
PL       0.00          0.00
TSM      0.00          0.00
PKP      0.00          0.01
RP      1.00          1.42
HP      0.00          0.00
GG      0.20          0.45
BB      0.00          0.28

Since the onset of this particular rain event. (May include rainfall before midnight) **Each rain gauge has separate rain events.

Flow Report at 7/2/2020 9:39:40 PM
Jose Vasquez
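One likely cause of the error, assuming `msg.body` contains exactly the text shown above: `content.split('\n')[3]` is the fourth line, `To: Gauge-Rain Notification`. That line contains no `'at '`, so `timestamp_line.split('at ')` returns a one-element list and `[1]` raises IndexError. Below is a sketch that searches the whole body for the report line instead of relying on a fixed line index. The regular expression and the name `extract_report_time` are illustrative assumptions about the report layout.

```python
import re
from datetime import datetime

# Matches e.g. "Rain Report at 7/2/2020 9:39:40 PM" anywhere in the body
# (assumed layout, taken from the sample message).
REPORT_TIME = re.compile(
    r"Rain Report at (\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def extract_report_time(body):
    """Return the report timestamp as a datetime, or None if no report line exists."""
    match = REPORT_TIME.search(body)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")


sample = "\n".join([
    "________________________________________",
    "From: Random <user@gmail.com>",
    "Sent: Sunday, July 2, 2020 9:39:41 PM (UTC-05:00) Eastern Time (US & Canada)",
    "To: Gauge-Rain Notification",
    "Subject: Rain Report at 7/2/2020 9:39:40 PM",
])

# The fixed-index approach lands on the "To:" line, which has no "at " to split on.
print(sample.splitlines()[3])
print(sample.splitlines()[3].split("at "))
print(extract_report_time(sample))
```

`splitlines()` is used instead of `split('\n')` so that a body with `\r\n` line endings does not leave a trailing `\r` on each line.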
    Try doing `line = line.strip()` before splitting it. If the line contains only whitespace, there won't be any `parts[1]`. – Barmar Jul 06 '23 at 15:32
    Get rid of the `try/except` so you'll see the complete error with traceback. That way you'll know which line is getting the error. – Barmar Jul 06 '23 at 15:33
