
I have a HashMap (a Guava BiMap) in which both keys and values are strings. I want to write a program that parses a given file and replaces every string in the file that is also a key in the BiMap with the corresponding value from the BiMap.

For example, I have a file called test.txt with the following text: Java is a set of several computer software and specifications developed by Sun Microsystems.

My BiMap has "java i" => "value1", "everal computer" => "value2", etc.

I want my program to take test.txt and the BiMap as input and produce output that looks something like this:

value1s a set of svalue2 software and specifications developed by Sun Microsystems.

Please point me towards an algorithm that can do this. The program takes large files as input, so brute force may not be a good idea.

Edit: I'm using fixed-length strings for keys and values. The example above was only intended to show the operation. Thanks.

  • 3
    What would you want to happen if the map is `a->x`, `ab->y` and the input is `abc`? How do you want to resolve such collisions? Which should be preferred? – amit Apr 22 '15 at 11:14
  • 1
    Also, is order important? Would `a->x` `x->z` with input `ax` output `xz` or `zz`? – Deltharis Apr 22 '15 at 11:42
  • 1
    Try checking the source of org.apache.commons.lang3.text.StrSubstitutor. It replaces all occurrences in a String with values from a Map. – StanislavL Apr 22 '15 at 11:48
  • 3
    Instead of a HashMap, you want to use a Trie, AKA a Prefix Tree. – Christoffer Hammarström Apr 22 '15 at 12:00
  • @amit Thanks for your reply. I'm using fixed-length strings as keys and values, so that wouldn't be a problem; sorry, I should have mentioned that. @Deltharis As I'm using a BiMap, it won't allow duplicates on either side (key side or value side). @StanislavL and Chris, I'll look into that, thanks :) – Manjunath Bhat Apr 24 '15 at 03:49

1 Answer


For a batch operation like this, I would avoid loading a lot of data into memory. I'd therefore recommend writing the new content to a new file. If the result must end up in the same file, you can still replace the original with the new file at the end of the process. Read, write and flush each line separately, and you won't have any memory issues.
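A minimal sketch of this approach, under a few assumptions: the class and method names (`StreamingReplacer`, `replaceLine`, `replaceFile`) are hypothetical, the file is UTF-8, all keys share one length (as stated in the question's edit), and no key spans a line break. Since a Guava BiMap is also a `java.util.Map`, the sketch only depends on `Map`. Because the keys have a fixed length, each position in a line needs just one hash lookup on the next `keyLength` characters.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

public class StreamingReplacer {

    // Replaces every key found in a single line, scanning left to right.
    // Assumption: every key in the map has exactly keyLength characters.
    static String replaceLine(String line, Map<String, String> map, int keyLength) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            if (i + keyLength <= line.length()) {
                String value = map.get(line.substring(i, i + keyLength));
                if (value != null) {
                    out.append(value);
                    i += keyLength; // skip the matched key entirely
                    continue;
                }
            }
            out.append(line.charAt(i)); // no key starts here, copy the character
            i++;
        }
        return out.toString();
    }

    // Streams the source file line by line into a temporary file in the same
    // directory, then replaces the source with the temporary file.
    static void replaceFile(Path source, Map<String, String> map, int keyLength) throws IOException {
        Path dir = source.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "replace", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(replaceLine(line, map, keyLength));
                writer.newLine();
            }
        }
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Only one line is held in memory at a time, so the file size does not matter much. Note that `readLine` strips line terminators and `newLine` writes the platform's separator, so line endings may change and the output always ends with a line break. The matching here is case-sensitive.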

Marcus Biel