Problem using re.sub with a replacement including Vertical lines

Question

A Sample line of my data:

12808|08.12.2008|13:44:35|-0.05||||||||0.26|1.53|2.94|0.81|1.75|5.53|79.56||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||

The search patterns and function:

oldpatdat='[|][0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9][|]'
#Date like |30.12.2009|


oldpatdat='\|{1}[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\|{1}'
#same, works finding, but substituting... nope


oldpatdattim='[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\ {1}[0-5][0-9]\:{1}[0-5][0-9]\:{1}[0-5][0-9]'
#DateTime like 24.10.2010 12:34:56 works, as I don't need the use of Vertical lines


def setdef(line):
    print("line in setdef: "+line)
    if re.search(oldpatdattim,line):
        print("oldpatdattim found")
        klem=re.findall(oldpatdattim,line)
        print("Klem+time: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new=ele[6:10]+"-"+ele[3:5]+"-"+ele[0:2]+ele[10:]
            print("new: "+new)
            line=re.sub(ele,new,line)
            print("line after sub1(dattim): "+str(line))
    if re.search(oldpatdat,line):
        print("oldpatdat found")
        klem=re.findall(oldpatdat,line)
        print("Klem: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new="|"+ele[7:11]+"-"+ele[4:6]+"-"+ele[1:3]+"|"
            print("new: "+new)
            line=re.sub(ele,new,line)
            print("line after sub2(dat): "+str(line))

Output:

bash>line in setdef: 12806|08.12.2008|13:43:34|-0.06||||||||0|1.53|3.54|0.36|1.66|5.44|79.59||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||
bash>oldpatdattim found
bash>Klem+time: ['19.01.2009 09:34:57']
bash>ele: 19.01.2009 09:34:57
bash>new: 2009-01-19 09:34:57
bash>line after sub1(dattim): 12806|08.12.2008|13:43:34|-0.06||||||||0|1.53|3.54|0.36|1.66|5.44|79.59||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|2009-01-19 09:34:57|9|15||
bash>oldpatdat found
bash>Klem: ['|08.12.2008|']
bash>ele: |08.12.2008|
bash>new: |2008-12-08|
bash>line after sub2(dat): |2008-12-08|1|2008-12-08|2|2008-12-08|8|2008-12-08|0|2008-12-08|6|2008-12-08|||2008-12-08||2008-12-08||2008-12-08|||2008-12-08|1|2008-12-08|3|2008-12-08|:|2008-12-08|4|2008-12-08|3|2008-12-08|:|2008-12-08|3|2008-12-08|4|2008-12-08|||2008-12-08|-|2008-12-08|0|2008-12-08|.|2008-12-08|0|2008-12-08|6|2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|||2008-12-08|0|2008-12-08|||2008-12-08|1|2008-12-08|.|2008-12-08|5|2008-12-08|3|2008-12-08|||2008-12-08|3|2008-12-08|.|2008-12-08|5|2008-12-08|4|2008-12-08|||2008-12-08|0|2008...etc

I'm working with python 3.8.2 and having a problem with regular expressions sub(). I have a CSV table with dates and times seperated by vertical lines. As I need to get the dates and times in the right format for my database, I tried regular expressions to look them up and work on them.

Because I don't want to edit dates from filenames, I'm searching the lines for dates like '|30.12.2009|' to be sure that it is a table column and not part of a filename. So I'm trying to replace it by '|2009-12-30|'. The first if clause works as expected, as there are no vertical lines in the search/replace pattern. The problem I encounter is with re.sub(). The element I'm looking for is found, the new string is prepared, but using re.sub() squashes the newly prepared date between every char of the line.

Escaping the vertical lines didn't work out for me, the way I did it. In fact it did, but only for the lookup. I'm a bit puzzled about it and haven't found any solution so far. How do I get my re.sub() doing the right thing?

If your file is a CSV with `|` as separator, why don't you read it as a CSV with `|` separators, using the `csv` module? — Thierry Lathuille, Aug 20 '20 at 14:51

score 0 · Accepted Answer · answered Aug 20 '20 at 15:10

try this

import re

oldpatdat='[|][0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9][|]'
#Date like |30.12.2009|


oldpatdat='\|{1}[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\|{1}'
#same, works finding, but substituting... nope


oldpatdattim='[0-3][0-9]\.{1}[0-1][0-9]\.{1}[1-2][0-9][0-9][0-9]\ {1}[0-5][0-9]\:{1}[0-5][0-9]\:{1}[0-5][0-9]'
#DateTime like 24.10.2010 12:34:56 works, as I don't need the use of Vertical lines


def setdef(line):
    print("line in setdef: "+line)
    if re.search(oldpatdattim,line):
        print("oldpatdattim found")
        klem=re.findall(oldpatdattim,line)
        print("Klem+time: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new=ele[6:10]+"-"+ele[3:5]+"-"+ele[0:2]+ele[10:]
            print("new: "+new)
            
            # for robustness
            line=re.sub(ele.replace('.', '\.').replace('|', '\|'),new,line)
            
            print("line after sub1(dattim): "+str(line))
    if re.search(oldpatdat,line):
        print("oldpatdat found")
        klem=re.findall(oldpatdat,line)
        print("Klem: "+str(klem))
        for ele in klem:
            print("ele: "+str(ele))
            new="|"+ele[7:11]+"-"+ele[4:6]+"-"+ele[1:3]+"|"
            print("new: "+new)
            
            line=re.sub(ele.replace('.', '\.').replace('|', '\|'),new,line)
            # or you could use line=line.replace(ele, new)
            
            print("line after sub2(dat): "+str(line))
setdef('12808|08.12.2008|13:44:35|-0.05||||||||0.26|1.53|2.94|0.81|1.75|5.53|79.56||||2|K:\Path\to\File\TE08-08-Chla-12.08.2008.xls|19.01.2009 09:34:57|9|15||')

The change in the first If clause, titled #for robustness, didn't seem to change the behavior. In the second If clause I tried this replace too, but it seems not to solve the the problem. On the other hand your suggestion to just use Pythons replace without re.sub worked just fine and seemed to solved the problem, so thank you for saving my day! — thejuggler80, Aug 21 '20 at 07:11

Problem using re.sub with a replacement including Vertical lines

1 Answers1