I'm using EmEditor to manually split some large files (3GB+). I just spent an hour doing so before realizing that the new files had only Carriage Returns where the old files had both Line Feeds and Carriage Returns.

These are HL7 files, so that was kind of important...

How can I maintain special characters on copy/paste?

1 Answer

If you are losing your LF characters, I am assuming the file was generated by a Windows system and was then brought over and modified on a Unix-based system. I have run into this issue between the two platforms because Windows ends each line with CRLF, whereas Unix uses only LF.
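Before converting anything, it can help to check which terminators a file actually contains. Here is a minimal Python sketch of that check; the function name `count_line_endings` and the chunk size are my own illustrative choices, not part of EmEditor or any tool mentioned here. It reads the file in binary mode, so nothing is translated along the way, and it works in chunks so a 3GB+ file does not have to fit in memory.

```python
def count_line_endings(path, chunk_size=1 << 20):
    """Count CRLF pairs, lone CR bytes and lone LF bytes in a file.

    The file is opened in binary mode so no newline translation happens.
    """
    crlf = cr = lf = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # Keep a CRLF pair from being split across two chunks:
            # if the chunk ends in CR, pull in the following byte(s).
            while chunk.endswith(b"\r"):
                extra = fh.read(1)
                if not extra:
                    break
                chunk += extra
            pairs = chunk.count(b"\r\n")
            crlf += pairs
            cr += chunk.count(b"\r") - pairs   # CR not followed by LF
            lf += chunk.count(b"\n") - pairs   # LF not preceded by CR
    return {"CRLF": crlf, "CR": cr, "LF": lf}
```

Running it on the original file and on a split piece should make it obvious whether the LF bytes went missing during the copy/paste.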

I have had success automating file move/manipulation processes between the two platforms, and with this CRLF issue specifically, using a Perl script that converts the line ending on every line. The conversion can go both ways, and a good write-up on how to use Perl for this (as well as other methods to solve this exact problem) is located here: https://kb.iu.edu/d/acux

Specifically, you can download Perl from www.perl.org (it's free) and then run the following one-liner, which calls perl explicitly:

perl -p -e 's/\n/\r\n/' < unixfile.txt > winfile.txt

On a Windows system, I create a .bat file containing the command above, create a Windows scheduled task to run the .bat, and set the task's "Start In" directory to the folder where the file conversions take place. As written, the command reads a file called unixfile.txt, replaces the LF at the end of each line with CRLF, and writes a new file called winfile.txt that has CRLF at the end of every line.
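If you would rather not install Perl, the same LF to CRLF conversion can be sketched in Python. This is only a sketch under my own assumptions: the function name `lf_to_crlf` and the chunk size are made up for illustration. Unlike the Perl one-liner above, it only touches an LF that is not already preceded by a CR, so running it twice does not produce CR CR LF, and it leaves lone CR bytes alone. It reads and writes in binary mode and in chunks, so large files are fine.

```python
import re

# An LF that is not already preceded by a CR.
BARE_LF = re.compile(rb"(?<!\r)\n")


def lf_to_crlf(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, turning each bare LF into CRLF.

    Existing CRLF pairs and lone CR bytes are copied unchanged.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Never let a chunk end on CR, otherwise the LF that follows
            # would look "bare" at the start of the next chunk.
            while chunk.endswith(b"\r"):
                extra = src.read(1)
                if not extra:
                    break
                chunk += extra
            dst.write(BARE_LF.sub(b"\r\n", chunk))
```

For example, `lf_to_crlf("unixfile.txt", "winfile.txt")` would play the same role as the command above.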

If you are still struggling with this, or have trouble with any part of what I suggested, feel free to let me know. I have done a few file conversions where I am on the Windows system receiving the Unix file, and I've had success automating the conversion and delivering the file, so I hope you find this helpful!

Maiza
  • HL7 distinguishes between carriage returns and line feeds. The former separates segments; the latter denotes the message end in MLLP. So converting all of one sort to the other, or adding extra LFs, falsifies the message(s) and makes them unusable. – sqlab Nov 23 '16 at 15:02
  • @sqlab It definitely adds in data, but to say it is unusable is not correct. You stated that HL7 distinguishes between carriage returns and line feeds, which is true. Getting a file with just a carriage return will cause most interface engines to not process it correctly. This definitely "falsifies" data by adding in a Line Feed and making the line digestible by your HL7 integration engine, but I fail to see how this makes it unusable. Do you have a better proposal to solve this? – Maiza Nov 29 '16 at 23:56
  • For what it's worth, they were actually generated on a *nix box. I downloaded them from there via FTP to my Windows machine. –  Dec 01 '16 at 15:38
  • @Scott Have you tried the above solution? That worked for me in a Unix -> Windows scenario like you described. I was pulling from FTP as well. – Maiza Dec 01 '16 at 22:49
  • I only occasionally have to do this for audits, and they are always HUGE files with months of data. Next time one comes around, I'll see if this solution works/is feasible. Thanks. –  Dec 01 '16 at 22:50
  • @Maiza, transferring files with FTP's binary mode keeps the correct segment delimiters and message endings – sqlab Dec 02 '16 at 14:21
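To illustrate the binary-mode suggestion in the last comment, here is a small Python sketch using the standard library's `ftplib`. The helper name `download_binary`, the host, the credentials and the file names are all placeholders I made up. `retrbinary` transfers the file as an image (binary) transfer, so CR and LF bytes arrive exactly as they are stored on the server, and the local file is opened with `"wb"` so Python does not translate them either.

```python
from ftplib import FTP


def download_binary(ftp: FTP, remote_name: str, local_path: str,
                    blocksize: int = 64 * 1024) -> None:
    """Download remote_name over an open FTP connection in binary mode."""
    with open(local_path, "wb") as fh:
        # retrbinary hands each received block to fh.write untouched.
        ftp.retrbinary(f"RETR {remote_name}", fh.write, blocksize)


# Hypothetical usage (host, login and file names are placeholders):
#
#     with FTP("ftp.example.com") as ftp:
#         ftp.login("user", "password")
#         download_binary(ftp, "audit.hl7", "audit.hl7")
```

Most FTP clients have an equivalent switch, often a `binary` command or a transfer-type setting, if you prefer not to script the download.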