I have a program that counts the lines of code (excluding comments, braces, whitespace, etc.) in two programs and then compares them. It puts all the lines from one program in one List and the lines from the other program in another List, then removes every line that appears in both. What remains in program 2's List is the lines added to program 1 to get program 2, and what remains in program 1's List is the lines removed from program 1 to get program 2.
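To make the current behavior concrete, here is a minimal sketch of the add/remove step. It is in Python for brevity rather than the C# I'm actually using, and the name `split_added_removed` is just a placeholder. It also assumes that duplicate lines are cancelled one-for-one, which may not be exactly what my real program does.

```python
from collections import Counter

def split_added_removed(old_lines, new_lines):
    """Drop lines common to both programs; return (added, removed).

    Assumption: duplicates cancel one-for-one, so a line that appears
    twice in the old program and once in the new one counts as one removal.
    """
    old_counts = Counter(old_lines)
    new_counts = Counter(new_lines)
    common = old_counts & new_counts  # multiset intersection
    removed = list((old_counts - common).elements())
    added = list((new_counts - common).elements())
    return added, removed

old = ["int x = 0;", "x++;", "return x;"]
new = ["int x = 1;", "x++;", "return x;", "log(x);"]
added, removed = split_added_removed(old, new)
print(added)    # lines only in the new program
print(removed)  # lines only in the old program
```

With this input, the changed declaration shows up once in each result, which is exactly the case I would like to report as a modification instead.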
Now I need a way to detect how many lines of code from program 1 have been MODIFIED to get program 2. I found an algorithm for the Levenshtein distance, and it seems like that will work: I can compare the distance with the length of the strings to get a percentage changed, and treat a pair of lines as a modification when that percentage is under some threshold. I'll still need to come up with a good value for the threshold.
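Roughly, the measure I have in mind looks like the sketch below (Python for brevity, not my actual C#). The distance function is the standard two-row dynamic-programming version. Dividing by the length of the longer string is my own assumption about how to turn the distance into a percentage, and `fraction_changed` is a made-up name.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fraction_changed(a, b):
    """Distance scaled to 0..1 by the longer string's length (an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))            # 3
print(fraction_changed("int x = 0;", "int x = 1;"))  # 0.1
```

A pair of lines would then count as "modified" when `fraction_changed` is at or below whatever threshold I end up choosing.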
However, my problem is this: how do I know which two strings to compare for the Levenshtein distance? My best guess is a nested for loop that compares every line in one List with every line in the other, looking for a distance that meets my difference threshold. That seems very inefficient, though. Are there any other ways of doing this?
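For reference, this is roughly the brute-force version I'm trying to avoid, sketched in Python rather than C#. The name `pair_modified_lines`, the `distance` parameter (any edit-distance function, such as a Levenshtein implementation), and the 0.4 threshold are all placeholders. Pairing each removed line greedily with its closest unmatched added line is just one possible policy, not necessarily the right one.

```python
def pair_modified_lines(removed, added, distance, threshold=0.4):
    """Greedy brute-force pairing of removed lines with added lines.

    Makes up to len(removed) * len(added) distance calls.
    Returns (modified_pairs, still_removed, still_added).
    """
    unmatched_added = list(added)
    modified, still_removed = [], []
    for old in removed:
        best_index, best_score = None, None
        for index, new in enumerate(unmatched_added):
            longest = max(len(old), len(new))
            score = distance(old, new) / longest if longest else 0.0
            if score <= threshold and (best_score is None or score < best_score):
                best_index, best_score = index, score
        if best_index is None:
            still_removed.append(old)  # nothing close enough: a true removal
        else:
            modified.append((old, unmatched_added.pop(best_index)))
    return modified, still_removed, unmatched_added
```

Every removed line is compared against every remaining added line, so the cost grows with the product of the two list lengths, on top of the cost of each distance computation. That is the part I'm hoping to improve on.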
I should add that this is for a software engineering class. It's technically homework, but we're allowed to use any resource we need. I'm mainly looking for an algorithm, but in case it matters, I'm using C#.