
I have lines of data that include two dates. I want to change the format of the first date from mm/dd/yy to 20yy-mm-dd.
Since I want to change only the first date, I use perl instead of sed, because sed doesn't have lazy matching.

The following gives the wrong result:

echo test,10/02/20,test2,11/03/20,test3 | perl -pe 's/(.*?)(..)\/(..)\/(..)(.*)/\120\4-\2-\3\5/'
# P20-10-02,test2,11/03/20,test3

If I add a space after \1 it works fine, but I don't want that extra space in the output:

echo test,10/02/20,test2,11/03/20,test3 | perl -pe 's/(.*?)(..)\/(..)\/(..)(.*)/\1 20\4-\2-\3\5/'
# test, 2020-10-02,test2,11/03/20,test3

The problem seems to be that it reads \120 not as \1 followed by 20 but as a reference to group 120 (which doesn't exist).

anubhava
LoMaPh
  • It's been awhile since Perl wanted you to use the back reference notation (`\1`) in the replacement side. You wouldn't have received the `\1 better written as $1` warning because the `\120` is considered an octal character instead (capital "P") (see [toke.c](https://github.com/Perl/perl5/blob/e935a39c197c3d6d97be7fa7623c451224180b68/toke.c#L3563)). Since the replacement side is a double quoted context, the double quote stuff happens first. The `\120` is already 'P' before Perl fills in the captures. – brian d foy Nov 02 '20 at 21:56
  • @RyszardCzech "backreference" was not a term I knew. So when I searched for an answer to my question I didn't find the answer you mentioned. – LoMaPh Mar 09 '21 at 05:38
  • Moreover final regex in answer below didn't even need any technique like `${1}` i.e. `perl -pe 's~([0-9]{2})/([0-9]{2})/([0-9]{2})~20$3-$1-$2~' <<< "$s"` – anubhava Mar 09 '21 at 11:18

1 Answer


You can use this simpler regex for your case. It needs only three capture groups, and no back-reference in the substitution is followed directly by a digit:

perl -pe 's~(\d{2})/(\d{2})/(\d{2})~20$3-$1-$2~' <<< "$s"

Your original regex can be fixed like this:

s='test,10/02/20,test2,11/03/20,test3'
perl -pe 's~(.*?)(..)/(..)/(..)(.*)~${1}20$4-$2-$3$5~' <<< "$s"
# test,2020-10-02,test2,11/03/20,test3

Write a back-reference as ${n} instead of $n or \n when a digit follows it directly, so that Perl can tell where the group number ends.

anubhava
  • Bit of further tuning. You may replace `(..)` with `(\d{2})` – anubhava Nov 02 '20 at 07:31
  • Or even `([0-9]{2})`. `\d` matches a whole bunch of characters you wouldn't expect to see in numeric dates in the Gregorian calendar. – tobyink Nov 02 '20 at 10:17
  • Leading `(.*?)` and trailing `(.*)` are redundant. They do not do anything useful and only complicate the code. – TLP Nov 02 '20 at 16:48
  • That's right @TLP It can be done using `perl -pe 's~([0-9]{2})/([0-9]{2})/([0-9]{2})~20$3-$1-$2~' <<< "$s"`. I didn't attempt to optimize OP's regex but have added it now in answer. – anubhava Nov 02 '20 at 16:53