I am working on a project that compares two versions of a large text file (5000+ lines). The newer version may contain new content and may have some content removed. The goal is to detect changes between versions early, because a team relies on the information in that text.
To solve this, I use the diff-match-patch library, which lets me identify removed and newly added content. In the first step, I compute the differences:
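import java.util.LinkedList;

private LinkedList<Diff> diffs; // stored as a field so showAddedElements() can read it

public void compareStrings(String oldText, String newText) {
    DiffMatchPatch dmp = new DiffMatchPatch();
    // Character-level diff; the third argument disables the line-level speedup pass
    diffs = dmp.diffMain(oldText, newText, false);
}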
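If only whole changed lines are needed, one alternative is to diff line by line from the start, so every change carries its line number directly. The sketch below is a plain longest-common-subsequence (LCS) line diff written from scratch; it is not part of diff-match-patch, and the names `LineDiff`, `diff` and `Change` are hypothetical. It uses a Java 16+ record.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal line-level diff based on an LCS table (a sketch, not diff-match-patch). */
public class LineDiff {

    /** One changed line: '-' = deleted (old line number), '+' = inserted (new line number). */
    public record Change(char op, int lineNumber, String text) {}

    public static List<Change> diff(String oldText, String newText) {
        String[] a = oldText.split("\\R", -1); // \R matches any line break
        String[] b = newText.split("\\R", -1);
        int n = a.length, m = b.length;

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table: equal lines are skipped, everything else is a change
        List<Change> changes = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a[i].equals(b[j])) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                changes.add(new Change('-', i + 1, a[i]));
                i++;
            } else {
                changes.add(new Change('+', j + 1, b[j]));
                j++;
            }
        }
        while (i < n) { changes.add(new Change('-', i + 1, a[i])); i++; }
        while (j < m) { changes.add(new Change('+', j + 1, b[j])); j++; }
        return changes;
    }
}
```

The LCS table holds (n+1)·(m+1) ints, so two 5000-line files need roughly 100 MB; trimming the common prefix and suffix first, or using a Myers-style diff, would reduce that. diff-match-patch's own documentation also describes a line mode (encoding each line as one character before diffing), but its method names vary between ports, so it is not shown here.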
Then I filter the list by operation type (INSERT or DELETE) to keep only the added or removed content:
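public String showAddedElements() {
    // StringBuilder avoids creating a new String on every iteration
    StringBuilder insertions = new StringBuilder();
    for (Diff elem : diffs) {
        if (elem.operation == Operation.INSERT) {
            insertions.append(elem.text).append(System.lineSeparator());
        }
    }
    return insertions.toString();
}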
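To collect the added and the removed fragments in a single pass instead of filtering twice, the diffs can be grouped by operation with a stream. This is a minimal sketch: `Operation` and `Diff` below are hypothetical stand-ins shaped like the library's classes (an operation plus its text), and `DiffGrouping`/`changesByOperation` are illustrative names.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DiffGrouping {
    // Hypothetical stand-ins shaped like the library's Diff (operation + text)
    public enum Operation { DELETE, INSERT, EQUAL }
    public record Diff(Operation operation, String text) {}

    /** Groups changed fragments by operation, skipping unchanged (EQUAL) text. */
    public static Map<Operation, List<String>> changesByOperation(List<Diff> diffs) {
        return diffs.stream()
                .filter(d -> d.operation() != Operation.EQUAL)
                .collect(Collectors.groupingBy(
                        Diff::operation,
                        () -> new EnumMap<>(Operation.class),
                        Collectors.mapping(Diff::text, Collectors.toList())));
    }
}
```

The fragments keep their character-level granularity, so grouping alone does not fix the single-letter output.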
However, when I output the contents, I sometimes get only single letters or word fragments, such as (o, contr, ler), because only single characters were removed or added. Instead, I would like to output the whole sentence in which a change occurred. Is there a way to also retrieve from DiffMatchPatch the line number where each change occurred?
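One way to get whole lines without changing the diff itself is to walk the character-level diffs while tracking an offset: EQUAL and INSERT fragments advance the position in the new text, while EQUAL and DELETE fragments advance the position in the old text. Each changed fragment's offsets can then be mapped to line numbers. This is a minimal sketch, assuming '\n' line endings; `Operation` and `Diff` are hypothetical stand-ins for the library's types, and `ChangedLines`/`changedLines` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ChangedLines {
    // Hypothetical stand-ins shaped like the library's Diff (operation + text)
    public enum Operation { DELETE, INSERT, EQUAL }
    public record Diff(Operation operation, String text) {}

    /**
     * Maps each line touched by a diff of type {@code wanted} to its full text.
     * Pass the NEW text for INSERT and the OLD text for DELETE. Keys are 1-based.
     */
    public static Map<Integer, String> changedLines(List<Diff> diffs, String text, Operation wanted) {
        if (wanted == Operation.EQUAL) throw new IllegalArgumentException("use INSERT or DELETE");
        // Fragments of the opposite type are not part of `text`, so they do not move the offset
        Operation absent = (wanted == Operation.INSERT) ? Operation.DELETE : Operation.INSERT;
        String[] lines = text.split("\n", -1);
        int[] starts = lineStarts(text);
        Map<Integer, String> result = new TreeMap<>();
        int offset = 0; // current character position in `text`
        for (Diff d : diffs) {
            if (d.operation() == absent) continue;
            int end = offset + d.text().length();
            if (d.operation() == wanted) {
                int first = lineOf(starts, offset);
                int last = lineOf(starts, Math.max(offset, end - 1));
                for (int ln = first; ln <= last; ln++) result.put(ln + 1, lines[ln]);
            }
            offset = end;
        }
        return result;
    }

    /** Character offsets at which each line starts (line 0 starts at 0). */
    private static int[] lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int k = 0; k < text.length(); k++) {
            if (text.charAt(k) == '\n') starts.add(k + 1);
        }
        return starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** 0-based index of the line containing position pos, via binary search over line starts. */
    private static int lineOf(int[] starts, int pos) {
        int i = Arrays.binarySearch(starts, pos);
        return i >= 0 ? i : -i - 2; // not found: insertion point minus one
    }
}
```

For example, a DELETE of "ol" between the EQUAL fragments "The contr" and "ler\nok" reports line 1 with the whole old line "The controller". To report sentences instead of lines, the same offsets could be widened to the nearest sentence delimiters rather than line breaks.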