0

I spent several hours looking at different posts and testing different solutions but none worked. I tried sed, powershell and heard of tr but could not configure/install it correctly.

I receive hundreds of text files to a windows server.

The files with several million records contain lines like these:

123xxxxxNULyyyyaaabbbCRLF
124xxxxxNULzzzzCRLFxxxxxCRLF

Now, I need to load this data to redshift which could not handle NUL and sees CRLF (the one in the middle) as end of line and errors out of missing data.

So, I need to remove NUL global from the file then remove CRLF if it is not at end of line

123xxxxyyyyaaabbbCRLF
124xxxxxzzzzxxxxCRLF

I have seen some posts on removing linefeed with sed but even the syntax would not work on my windows10.

Thanks in advance

I can edit some files in notepad+++ but this wont work on large files and hundreds of them.

I tried some sed I found on a post but errored out.

C:\data>sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' x.txt
sed: -e expression #1, char 1: unknown command: `''

C:\data>sed -e ':a;N;$!ba;s/\n/ /g' x.txt
sed: -e expression #1, char 1: unknown command: `''

C:\data>sed -e :a;N;$!ba;s/\n/ /g x.txt
sed: -e expression #1, char 15: unterminated `s' command

I tried some sed I found on a post but errored out.

C:\data>sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' x.txt
sed: -e expression #1, char 1: unknown command: `''

C:\data>sed -e ':a;N;$!ba;s/\n/ /g' x.txt
sed: -e expression #1, char 1: unknown command: `''

C:\data>sed -e :a;N;$!ba;s/\n/ /g x.txt
sed: -e expression #1, char 15: unterminated `s' command

a simple sed or powershell or else solution that the output file looks like:

123xxxxyyyyaaabbbCRLF
124xxxxxzzzzxxxxCRLF
James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
dan1197
  • 13
  • 1
  • 1
  • 4
  • What do you mean by "in the middle"? What is the record delimiter if not CRLF? FWIW, it's difficult to use sed to remove linefeeds because standard sed will not match `\n`; the newline must be literal. The GNU utilities have been ported to Windows. The tr(1) command may be helpful. – James K. Lowden Jun 02 '19 at 16:34
  • Thanks much. one of the column is a description which is hand free typing and sometimes it contains CRLF as well. if tr is the answer, where can I find clear instructions on download and configuration. Even for a start if I can remove NUL would be good. Thanks – dan1197 Jun 02 '19 at 17:10
  • I found the answer on how to replace NUL from the file. So, this piece is covered. However, Redshift copy should have handled that. So, I am moving that piece to Redshift question. The other part, on how to remove CRLF is it is not end of line in windows still outstanding? I found a link here on how to install tr but cant get it to work for me. Thanks – dan1197 Jun 02 '19 at 19:26
  • In *nix land, `LF` **is** the end of the line, once a *nix utiltity finds that, it assumes it is done. Is your data fixed length records? Then maybe you can make it work. Good luck. – shellter Jun 03 '19 at 01:11
  • `tr` will replace all `LF`s on a line, so then you lose its ability to separate a record. Only a `sed` maniac would even try to solve this problem with `sed`. Using `awk` you will have half a chance, but you'll need to give us a better real-data working example to be practical (only 3-4 fields, 3-4 lines should be enough). Good luck. – shellter Jun 03 '19 at 17:52

0 Answers0