I'm going to go a bit in-depth with my problem, you can jump to the TL;DR if you don't want to read all of this
What I'm trying to do
I need to store a "file" (text document) which can be user-edited. If I have my original file (which could be huge)
Lorem ipsum dolor sit amet
and the user were to make a change:
Foo ipsum amet_ sit
Basically, I have the original string and the user-edited string. I want to find the differences, "edits". To prevent storing duplicates of very large strings. I want to store the original and the "edits". Then apply the edits to the original. Kind of like data de-duplication. The problem is that I have no idea how different edits can be and I also need to be able to apply those edits to the string.
Attempts
Because the text could be huge, I am wondering what would be the most "efficient" way to store edits to the text without storing two separate versions. My first guess was something along the lines of:
var str = 'Original String of text...'.split(' ') || [],
mod = 'Modified String of text...'.split(' ') || [], i, edits = [];
for (i = 0; i < str.length; i += 1) {
edits.push(str[i]===mod[i] ? undefined : mod[i]);
}
console.log(edits); // ["Modified", null, null, null] (desired output)
then to revert back:
for (i = 0; i < str.length; i += 1) {
str[i] = edits[i] || str[i];
}
str.join(' '); // "Modified String of text..."
Basically, I'm trying to split the text by spaces into arrays. Compare the arrays and store the differences. Then apply the differences to generate the modified version
Problems
But if the amount of spaces were to change, problems would occur:
str
: Original String of text...
mod
: OriginalString of text...
Output: OriginalString of text... text...
My desired output: OriginalString of text...
Even if I were to switch str.length
with mod.length
and edits.length
like:
// Get edits
var str = 'Original String of text...'.split(' ') || [],
mod = 'Modified String of text...'.split(' ') || [], i, edits = [];
for (i = 0; i < mod.length; i += 1) {
edits.push(str[i]===mod[i] ? undefined : mod[i]);
}
// Apply edits
var final = [];
for (i = 0; i < edits.length; i += 1) {
final[i] = edits[i] || str[i];
}
final = final.join(' ');
edits
would be: ["ModifiedString", "of", "text..."]
in result making the whole 'storing edits thing useless. And even worse if a word were to be added / removed. If str
were to become Original String of lots of text...
. The output would still be the same.
I can see that they're many flaws in the way I'm doing this, but I can't think of any other way.
Snippet:
document.getElementById('go').onclick = function() {
var str = document.getElementById('a').value.split(' ') || [],
mod = document.getElementById('b').value.split(' ') || [],
i, edits = [];
for (i = 0; i < mod.length; i += 1) {
edits.push(str[i] === mod[i] ? undefined : mod[i]);
}
// Apply edits
var final = [];
for (i = 0; i < edits.length; i += 1) {
final[i] = edits[i] || str[i];
}
final = final.join(' ');
alert(final);
};
document.getElementById('go2').onclick = function() {
var str = document.getElementById('a').value.split(' ') || [],
mod = document.getElementById('b').value.split(' ') || [],
i, edits = [];
for (i = 0; i < str.length; i += 1) {
edits.push(str[i] === mod[i] ? undefined : mod[i]);
}
for (i = 0; i < str.length; i += 1) {
str[i] = edits[i] || str[i];
}
alert(str.join(' ')); // "Modified String of text..."
};
Base String:
<input id="a">
<br/>Modified String:
<input id="b" />
<br/>
<button id="go">Second method</button>
<button id="go2">First Method</button>
TL;DR:
How would you find the changes between two strings?
I'm dealing with large pieces of text each could be about a megabyte hundred kilobytes. This is running on the browser