I'm trying to populate a new EMR with data from an existing environment. I'm pulling a log of all activity for a given interface and feeding it into the inbound channel in the new environment. The problem is that our existing channel contains duplicate messages, which will create duplicate reports in the patient records.
Beyond searching what feels like the entire internet, I've tried pushing text around in Iguana, PowerShell and Excel, and I'm not familiar enough with MirthConnect to make use of it. I'm not married to any one solution; I just need one, and PDQ.
I found a fairly good starting point at https://www.secretgeek.net/ps_duplicates and I've been massaging it, but I still don't have a complete solution. At this point I've reset it back to the original script, because nothing I've done has improved it (mostly I broke it repeatedly).
$hash = @{}  # Empty hashtable of lines already seen (note: @{} keys are case-insensitive)
Get-Content "c:\Samples\Q12019.txt" |  # Send the file into the pipeline one line at a time
ForEach-Object {
    # $_ is the current line from the pipe
    if (-not $hash.ContainsKey($_)) {  # If that line isn't a key in the hashtable yet...
        $_                             # ...send it down the pipe
        $hash[$_] = 1                  # and add it so later copies aren't resent
    }
} > "c:\Samples\RadHx Test Q12019.txt"  # '>' is Out-File; Windows PowerShell 5.1 writes it as UTF-16LE by default
This does some trippy stuff I don't understand. It ingests the file, but the output has an extra space B E T W E E N every single character. I can't even tell whether it's removing duplicates, and I haven't been able to make it stop doing this. I'm also not sure it reads an entire message, including all of its segments. Example 2 at https://healthstandards.com/blog/2007/09/10/variations-of-the-hl7-orur01-message-format/ is close enough to what I'm ingesting to serve as a sample; just imagine 2,000 more of them in one text file.
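For comparison, a minimal sketch of the same line-by-line filter that names the output encoding explicitly. In Windows PowerShell 5.1, `>` is shorthand for `Out-File`, which writes UTF-16LE (two bytes per character), and tools expecting ASCII often display the extra zero byte as a space. `Remove-DuplicateLine` is a hypothetical name, and ASCII output is an assumption (it is the HL7 v2 default character set). Note that this still compares individual lines, not whole messages, so a segment that two different messages happen to share (for example the same PID line) is dropped from the second message.

```powershell
# Hypothetical helper: line-level de-duplication with an explicit output encoding.
function Remove-DuplicateLine {
    param(
        [Parameter(Mandatory)] [string] $Path,        # input file
        [Parameter(Mandatory)] [string] $Destination  # output file
    )

    # Exact, case-sensitive comparison (a plain @{} hashtable ignores case).
    $seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)

    Get-Content -LiteralPath $Path |
        Where-Object { $seen.Add($_) } |                      # Add() returns $true only the first time a line is seen
        Set-Content -LiteralPath $Destination -Encoding ASCII # one byte per character, no UTF-16 zero bytes
}
```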
Simplified explanation: I have a text file with several blocks of related text. Each block starts with the same sequence of characters, say 'ABC'. The blocks have arbitrary lengths and don't necessarily end with the same string, but every block ends with CRLF. Problem: some blocks are repeated, and I need to eliminate the repeats so the file contains only one instance of each block.
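The block-level version of the problem can be sketched by reading the file as one string, splitting it just before each line that starts with the marker, and keeping the first copy of each block. This assumes blocks count as duplicates only when they are byte-for-byte identical (a resend with a new timestamp or control ID in MSH would be a different block), that the marker ('ABC' above; `MSH` for HL7 v2) appears only at the start of a block, and that the file is ASCII. `Remove-DuplicateBlock` and its parameters are hypothetical names.

```powershell
# Hypothetical helper: keep only the first copy of each block that starts with $Marker.
function Remove-DuplicateBlock {
    param(
        [Parameter(Mandatory)] [string] $Path,         # input file
        [Parameter(Mandatory)] [string] $Destination,  # output file
        [string] $Marker = 'MSH'                       # text every block starts with ('ABC' in the simplified description)
    )

    # Read the whole file as ONE string so a block can span several lines; [string]$null -> '' for an empty file.
    $text = [string](Get-Content -LiteralPath $Path -Raw)

    # Split just before every line that begins with the marker; the lookahead keeps the marker in each block.
    $pattern = '(?m)(?=^' + [regex]::Escape($Marker) + ')'
    $blocks  = [regex]::Split($text, $pattern) | Where-Object { $_ -ne '' }

    # Ordinal (exact, case-sensitive) comparison; HashSet.Add returns $false for a block already seen.
    $seen   = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)
    $unique = foreach ($block in $blocks) { if ($seen.Add($block)) { $block } }

    # Blocks already carry their own CRLF, so join them as-is and write without adding a newline.
    Set-Content -LiteralPath $Destination -Value (-join $unique) -NoNewline -Encoding ASCII
}
```

Usage with the paths from the question: `Remove-DuplicateBlock -Path 'c:\Samples\Q12019.txt' -Destination 'c:\Samples\RadHx Test Q12019.txt'`. Comparing whole blocks rather than lines is what keeps every kept message intact with all of its segments.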