I'm writing a script that anonymizes participant data from a file.
Basically, I have:
- A folder of plaintext participant data (sometimes CSV, sometimes XML, sometimes TXT)
- A file of known usernames and accompanying anonymous IDs (e.g. jsmith1 as a known username, User123 as an anonymous ID)
I want to replace every instance of the known username with the corresponding anonymous ID.
Generally speaking, what I have works just fine: it loads the usernames and anonymous IDs into a dictionary, then runs a find-and-replace on the document text for each pair, one at a time.
However, this script also strips out names, and it runs into some difficulty when it encounters names contained in other names. So, for example, I have two pairs:
John,User123
Johnny,User456
Now, when I run the find-and-replace, it may encounter John first. As a result it turns Johnny into User123ny, and the Johnny pair then never matches.
The simplest solution I can think of is just to run the find-and-replace from longest key to shortest. To do that, it looks like I need a SortedDictionary.
However, I can't seem to convince Visual Basic to take my custom Comparer for this. How do you specify this? What I have is:
Sub Main()
    Dim nameDict As New SortedDictionary(Of String, String)(AddressOf SortKeyByLength)
End Sub

Public Function SortKeyByLength(key1 As String, key2 As String) As Integer
    If key1.Length > key2.Length Then
        Return 1
    ElseIf key1.Length < key2.Length Then
        Return -1
    Else
        Return 0
    End If
End Function
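For reference, the SortedDictionary constructor takes an IComparer(Of TKey) rather than a function address, so the comparison has to be wrapped in a comparer object. Below is a minimal sketch, assuming a hypothetical LongestFirstComparer class that is not part of the original script. It orders longer keys first, to match the longest-to-shortest goal, and breaks ties ordinally, because a comparer that returns 0 for two different keys of the same length would make the dictionary treat them as duplicates.

```vbnet
Imports System
Imports System.Collections.Generic

Module Program
    ' Hypothetical comparer (illustration only): longer keys sort first.
    ' Keys of equal length fall back to ordinal order so that two distinct
    ' keys never compare as equal.
    Public Class LongestFirstComparer
        Implements IComparer(Of String)

        Public Function Compare(x As String, y As String) As Integer _
            Implements IComparer(Of String).Compare
            ' Comparing y to x (rather than x to y) gives descending length.
            Dim byLength As Integer = y.Length.CompareTo(x.Length)
            If byLength <> 0 Then Return byLength
            Return String.CompareOrdinal(x, y)
        End Function
    End Class

    Sub Main()
        ' Pass a comparer instance instead of AddressOf.
        Dim nameDict As New SortedDictionary(Of String, String)(New LongestFirstComparer())
        nameDict.Add("John", "User123")
        nameDict.Add("Johnny", "User456")

        ' Enumeration follows the comparer: Johnny is listed before John.
        For Each pair In nameDict
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub
End Module
```

If keeping a plain comparison function is preferable, Comparer(Of String).Create(AddressOf SomeFunction), available from .NET Framework 4.5, is another way to turn it into an IComparer(Of String).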
(The full details above are in case anyone has any better ideas for how to resolve this problem in general.)
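As one possible alternative to ordering the dictionary, the replacement could be done in a single regular-expression pass. The sketch below is only an illustration under stated assumptions: ReplaceNames and the Anonymizer module are hypothetical names, matching is case-sensitive, and every key is assumed to start and end with a letter or digit so that the \b word-boundary anchors apply. With those anchors, John no longer matches inside Johnny, and because the text is scanned once, an already inserted anonymous ID is never rewritten by a later pair.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module Anonymizer
    ' Hypothetical helper (illustration only): replaces every whole-word
    ' occurrence of a key with its mapped value in one pass over the text.
    Function ReplaceNames(text As String, map As Dictionary(Of String, String)) As String
        ' Nothing to replace; also avoids building an empty alternation.
        If map.Count = 0 Then Return text

        ' Longest keys first, so "Johnny" is tried before "John".
        ' Regex.Escape keeps characters such as "." in a key literal.
        Dim alternatives = map.Keys.
            OrderByDescending(Function(k) k.Length).
            Select(Function(k) Regex.Escape(k))

        Dim pattern As String = "\b(?:" & String.Join("|", alternatives) & ")\b"

        ' Each match is exactly one of the keys, so it can be looked up directly.
        Return Regex.Replace(text, pattern, Function(m As Match) map(m.Value))
    End Function

    Sub Main()
        Dim map As New Dictionary(Of String, String) From {
            {"John", "User123"},
            {"Johnny", "User456"}
        }
        Console.WriteLine(ReplaceNames("Johnny,John", map)) ' prints User456,User123
    End Sub
End Module
```

Whether word boundaries are the right delimiter depends on how names are separated in each file type, so the pattern may need adjusting for the CSV, XML, and TXT inputs.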
John" or "John
" in plaintext, or ",John" or "John," in CSV files. – David Oct 12 '15 at 19:26