
I have several .csv files with 2 slightly different formats.

Format 1: X-XXX_2020-11-05_13-54-55-555__XX.csv
Format 2: X-XXX_2020-11-05_13-54-55-555__XXX.csv

I need to extract the datetime field to add it to a pandas DataFrame. Normally I would just use simple slicing:
datetime.datetime.strptime(string1[-31:-8], "%Y-%m-%d_%H-%M-%S-%f")

which gives me the desired result, but only for Format 1.

For Format 2 I need to shift the slice indexes by 1 because the ending is one character longer.
Also, I cannot index from the start because of other operations.

At the moment I work around it with an if statement like this:

def tdate():
    # string1 holds the file name; Format 2 has an 'X' seven characters from the end
    if string1[-7] == 'X':
        return datetime.datetime.strptime(string1[-32:-9], "%Y-%m-%d_%H-%M-%S-%f")
    else:
        return datetime.datetime.strptime(string1[-31:-8], "%Y-%m-%d_%H-%M-%S-%f")

Is there a simpler way to make "dynamic" indexes, so that I can avoid defining an additional function?

Thank you!

iMS44

1 Answer


You can use str.split with list slicing, which does not depend on the length of the ending.

Ex:

import datetime

for i in ("X-XXX_2020-11-05_13-54-55-555__XX.csv", "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
    print(datetime.datetime.strptime("_".join(i.split("_")[1:3]), "%Y-%m-%d_%H-%M-%S-%f"))

Or use a regex.

Ex:

import re
import datetime

for i in ("X-XXX_2020-11-05_13-54-55-555__XX.csv", "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
    d = re.search(r"(?<=_)(.+)(?=__)", i)
    if d:
        print(datetime.datetime.strptime(d.group(1), "%Y-%m-%d_%H-%M-%S-%f"))

Output:

2020-11-05 13:54:55.555000
2020-11-05 13:54:55.555000
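
If you would rather keep indexing from the end, as in the question, one possible sketch is to cut the name at the last "__" first, so the variable-length ending is gone before you slice. The names timestamp_from_name, STAMP_FORMAT and STAMP_LENGTH are made up for this example, and it assumes every file name contains "__" right after the timestamp.

```python
import datetime

STAMP_FORMAT = "%Y-%m-%d_%H-%M-%S-%f"
# Width of the timestamp part of the file name, e.g. "2020-11-05_13-54-55-555"
STAMP_LENGTH = len("2020-11-05_13-54-55-555")


def timestamp_from_name(name):
    """Parse the timestamp that sits directly before the last '__' in name."""
    # Drop the last "__" and everything after it, whatever its length.
    head, _, _ = name.rpartition("__")
    # The timestamp is now the fixed-width tail of what remains.
    return datetime.datetime.strptime(head[-STAMP_LENGTH:], STAMP_FORMAT)


for name in ("X-XXX_2020-11-05_13-54-55-555__XX.csv",
             "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
    print(timestamp_from_name(name))
```

If a name has no "__" at all, head is empty and strptime raises ValueError, so malformed file names fail loudly instead of producing a wrong date.
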
Rakesh