
I have a simple awk command that converts a date from MM/DD/YYYY to YYYY/MM/DD. However, the file I'm using has \r\n at the end of the lines, and sometimes the date is at the end of the line.

awk '
  BEGIN { FS = OFS = "|" }
  {
    split($27, date, /\//)
    $27 = date[3] "/" date[1] "/" date[2]

    print $0
  }
' file.txt

In this case, if the date is MM/DD/YYYY\r\n then I end up with this in the output:

YYYY
/MM/DD

What is the best way to get around this? Keep in mind, sometimes the input is simply \r\n in which case the output SHOULD be // but instead ends up as

/
/
richie

2 Answers


Given that the \r isn't always at the end of field $27, the simplest approach is to remove the \r from the entire line.

With GNU Awk or Mawk (one of which is typically the default awk on Linux platforms), you can simply define your input record separator, RS, accordingly:

awk -v RS='\r\n' ...

Or, if you want \r\n-terminated output lines too, set the output record separator, ORS, to the same value:

awk 'BEGIN { RS=ORS="\r\n"; ... 
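As a minimal sketch of the RS approach applied to the question's script: the sample input below is made up, and it keeps the date in field 3 rather than field 27 purely for brevity. It assumes GNU Awk or Mawk, since it relies on a multi-character RS.

```shell
# Hypothetical 3-field sample with CRLF line endings; the second
# record has an empty date field. (The real file uses field 27.)
out=$(printf 'a|b|12/31/2020\r\nc|d|\r\n' |
  awk -v RS='\r\n' '
    BEGIN { FS = OFS = "|" }
    {
      # The \r\n is consumed as the record separator,
      # so $3 no longer carries a trailing \r.
      split($3, date, /\//)
      $3 = date[3] "/" date[1] "/" date[2]
      print
    }
  ')
printf '%s\n' "$out"
```

With GNU Awk or Mawk this should print a|b|2020/12/31 followed by c|d|//, each terminated by a plain \n because ORS is left at its default.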

Optional reading: an aside for BSD/macOS Awk users:

BSD/macOS awk doesn't support multi-character RS values (in line with the POSIX Awk spec: "If RS contains more than one character, the results are unspecified").

Therefore, a sub call inside the Awk script is necessary to trim the \r instance from the end of each input line:

awk '{ sub("\r$", ""); ... 

To also output \r\n-terminated lines, option -v ORS='\r\n' (or ORS="\r\n" inside the script's BEGIN block) will work fine, as with GNU Awk and Mawk.
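A sketch of the portable variant, which should behave the same in any POSIX-style awk: it uses the same made-up 3-field sample (the real data has the date in field 27) and writes the pattern as a regex literal, /\r$/, instead of the string form used above.

```shell
# Hypothetical sample input with CRLF line endings.
out=$(printf 'a|b|12/31/2020\r\nc|d|\r\n' |
  awk '
    BEGIN { FS = OFS = "|"; ORS = "\r\n" }
    {
      sub(/\r$/, "")   # strip the trailing \r; awk re-splits $0 afterwards
      split($3, date, /\//)
      $3 = date[3] "/" date[1] "/" date[2]
      print            # ORS puts the \r\n back on output
    }
  ')
printf '%s\n' "$out"
```

Drop the ORS assignment if the output should use plain \n line endings instead.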

mklement0

If you're on a system where \n by itself is the newline, you should remove the trailing \r from each record. One way is to strip it from the last field, anchoring the match to the end of the field:

$ awk '{sub(/\r$/,"",$NF); ...}'
James Brown