I've been working on this for a few days now, and it seems nowhere has the answer I need.
In fear of this being marked as a duplicate, I'll explain why the other questions don't work for me.
Any answer using difflib for Python will not meet my needs (I describe more below). It is far too slow; unless someone has a good optimization tip for me (for the unified_diff function), I won't be able to use it.
I've tried researching how to send large strings to commands that expect files, but none of the options worked for me. I wouldn't mind using this option if I could get it to work (also described more below).
I don't mind being marked as a duplicate so long as it is of a question that genuinely solves my problem, but I've scoured a few sites and haven't found a solution that works for me yet.
I want to merge two large strings in Python. The strings are about 1.5KB each. Assuming there are two strings, str1 and str2, I just want to return the merged string which is simply str1 with the added information of str2. I don't want anything to be removed.
For the most part, these strings will be largely the same. Most of the time, they will be about 90% identical. The difference is that there may be new information added to the second string, and I would like to capture that information in the original one.
For example:
str1 = """This is a very
Long string and
This is how it looks."""
str2 = """This is a very
This is my Example
This is how it looks."""
result = """This is a very
Long string and
This is my Example
This is how it looks."""  # "This is my Example" from str2 was added to str1 as its third line
The very first way that I solved this problem was with git diff. I'm on Windows, and I would write the strings to temporary files, run a git diff command on them, then delete the files immediately afterwards. The cmd function I wrote returns the output (a unified diff) as a string. I would then post-process the string to remove the header that diffs always add. I was able to remove the '+' and '-' on each line by changing the output indicators to spaces (I omitted all the options I used from my code for simplicity).
# The f1 and f2 text files are created here
# cmd is a function I wrote; it uses the os module to execute the command
output = cmd("git diff -U999999 -b --no-index f1.txt f2.txt")
# The f1 and f2 text files are deleted here
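The post-processing step described above could look roughly like the sketch below. It assumes the diff still carries the default '+', '-' and space markers in the first column and that the whole file sits in the hunks (as with -U999999). The function name merge_from_unified_diff is made up for illustration; this is not the original cmd-based code.

```python
def merge_from_unified_diff(diff_text):
    """Turn a full-context unified diff into a merged text.

    Hypothetical helper: keeps context, removed and added lines alike,
    so nothing from either input is dropped.
    """
    merged = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            # Hunk header: everything before the first one is file header.
            in_hunk = True
            continue
        if not in_hunk or line.startswith("\\"):
            # Skip the header and "\ No newline at end of file" notes.
            continue
        # Drop the one-character marker (' ', '+' or '-') in column 0.
        merged.append(line[1:])
    return "\n".join(merged)
```

Because removed lines are kept rather than dropped, the result is str1 plus whatever str2 added, in diff order.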
I've tried difflib, but that was far too slow. It took about 8-10 minutes to produce one diff. I used the unified_diff function and passed the arguments both as strings and as lists. I even tried modifying the source code, but my changes didn't make it much faster.
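For comparison, here is a sketch of a line-based merge that uses difflib.SequenceMatcher directly instead of unified_diff. It assumes the inputs can be compared line by line and that every line from both strings should be kept. The name merge_lines is hypothetical, and whether this is fast enough for the real data is untested.

```python
import difflib


def merge_lines(str1, str2):
    """Merge str2 into str1 line by line without removing anything."""
    a = str1.splitlines()
    b = str2.splitlines()
    # autojunk=False turns off the popularity heuristic for long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Always keep the lines from str1 (empty slice for "insert").
        merged.extend(a[i1:i2])
        if tag != "equal":
            # "replace" and "insert" also contribute the new lines from str2.
            merged.extend(b[j1:j2])
    return "\n".join(merged)
```

Comparing lists of lines rather than raw strings keeps the number of elements small, which is usually where SequenceMatcher spends its time.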
I've also tried passing the strings directly to git diff or plain diff. That produced errors, however, complaining "Argument list too long". I even tried sending the string to stdout and using that as a file argument, and that didn't work either.
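One way around the argument-length limit is to send the text through the child's standard input rather than on the command line. The sketch below shows only that mechanism with subprocess.run. The helper name run_with_stdin is hypothetical, and the child here is a Python one-liner standing in for a real tool. GNU diff accepts - in place of one file operand to mean standard input, so this would remove at most one of the two temporary files; whether the same works with git diff --no-index on Windows is an assumption I have not verified.

```python
import subprocess
import sys


def run_with_stdin(args, text):
    """Run a command, feed `text` to its stdin, and return its stdout."""
    completed = subprocess.run(
        args,
        input=text,           # sent over a pipe, not as an argument
        capture_output=True,
        text=True,            # str in, str out
        check=False,
    )
    return completed.stdout


# Stand-in child process that echoes its stdin back unchanged.
ECHO_CHILD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
```

A call such as run_with_stdin(ECHO_CHILD, big_string) passes the whole string through the pipe regardless of its size.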
I don't mind using any of these options if it can be tweaked to work for my goal. Clearly, my current solution (the block of code above) is very inefficient, and I don't want to keep creating and deleting text files if it can be avoided.