Is there a library or code snippet available that can take two strings and return the exact or approximate mid-point string between the two strings?
Preferably the code would be in Python.
Background:
This seems like a simple problem on the surface, but I'm kind of struggling with it:
- Clearly, the midpoint string between "A" and "C" would be "B".
- With base64 encoding, the midpoint string between "A" and "B" would probably be "Ag".
- With UTF-8 encoding, I'm not sure what the valid midpoint would be, because the middle character seems to be a control character:
  `U+0088  c2 88  <control>`
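Here is a minimal sketch of one possible approach. It assumes each string is read as the digits after a radix point in base N, where N is the size of an ordered alphabet. Both strings are converted to exact fractions, averaged, and converted back. The helper names (`to_fraction`, `from_fraction`, `midpoint`) and the `max_len` cutoff are made up for illustration. Under this reading it reproduces the "A"/"C" and "A"/"B" examples above.

```python
from fractions import Fraction

# Alphabets listed in digit order: a character's index in the string is its digit value.
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
HEX = "0123456789ABCDEF"


def to_fraction(s: str, alphabet: str) -> Fraction:
    """Read s as the digits after a radix point in base len(alphabet)."""
    base = len(alphabet)
    value = Fraction(0)
    for position, ch in enumerate(s, start=1):
        value += Fraction(alphabet.index(ch), base ** position)
    return value


def from_fraction(x: Fraction, alphabet: str, max_len: int = 64) -> str:
    """Write 0 <= x < 1 back out as digits, stopping when exact or at max_len digits."""
    base = len(alphabet)
    digits = []
    while x and len(digits) < max_len:
        x *= base
        d = int(x)  # floor, since x is non-negative
        digits.append(alphabet[d])
        x -= d
    return "".join(digits)


def midpoint(lo: str, hi: str, alphabet: str) -> str:
    """Average of lo and hi in the fraction reading (exact unless it exceeds max_len digits)."""
    mid = (to_fraction(lo, alphabet) + to_fraction(hi, alphabet)) / 2
    return from_fraction(mid, alphabet)
```

For example, `midpoint("A", "B", BASE64)` returns `"Ag"` and `midpoint("000", "FFF", HEX)` returns `"7FF8"`. There is one caveat. The result only follows the database's sort order if the alphabet lists characters in the same order the database compares them. The standard base64 alphabet (A-Z, a-z, 0-9, +, /) is not in ASCII order, so for real keys you would pass a sorted alphabet. Also, trailing "zero" digits carry no value under this reading, so "A" and "AA" count as the same point.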
Practical Application:
I'm asking because I was hoping to write a map-reduce-type algorithm that reads all of the entries out of our database and processes them. The primary keys in the database are UTF-8 encoded strings with a random distribution of characters. The database we are using is Cassandra.
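For UTF-8 keys specifically, one option is to skip characters entirely and take the midpoint of the encoded bytes, reading each key as a base-256 fraction. This is an assumption on my part, not something established above. UTF-8 was designed so that comparing encoded strings byte by byte gives the same order as comparing their code points, so a byte-level midpoint stays between the two keys in that order. `bytes_midpoint` is a hypothetical helper name.

```python
def bytes_midpoint(lo: bytes, hi: bytes) -> bytes:
    """Midpoint of two byte strings read as base-256 fractions 0.lo and 0.hi."""
    # One extra zero byte of padding makes a + b a multiple of 256, so halving is exact.
    n = max(len(lo), len(hi)) + 1
    a = int.from_bytes(lo.ljust(n, b"\x00"), "big")
    b = int.from_bytes(hi.ljust(n, b"\x00"), "big")
    mid = (a + b) // 2
    # Trailing zero bytes add no value in this reading; drop them for a shorter key.
    return mid.to_bytes(n, "big").rstrip(b"\x00")
```

The trade-off is that the midpoint need not be valid UTF-8. `bytes_midpoint(b"A", b"B")` gives `b"A\x80"`, and a lone `0x80` is a continuation byte that does not decode on its own. That is harmless only if the range query accepts raw bytes as boundaries, which is worth checking against your database driver.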
I was hoping to get the lowest and highest keys out of the database and split that range in two by finding its midpoint. Then I would split each of those two ranges in two by finding their midpoints, and so on until I had a few thousand sections. Then I could read each section asynchronously.
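The recursive splitting described above could look roughly like this sketch. It again assumes keys are compared as raw bytes (for example, UTF-8 encoded strings) and read as base-256 fractions. `split_key_range` and its `depth` parameter are hypothetical names, and `depth=12` yields 4096 sections, which is in the "few thousand" range. The sketch converts the two endpoints to integers once, pads them with enough zero bytes that every halving is exact, and then recurses on the integers.

```python
def split_key_range(lo: bytes, hi: bytes, depth: int) -> list[tuple[bytes, bytes]]:
    """Split [lo, hi] into 2**depth contiguous sub-ranges by repeated halving."""
    # Enough padding bytes (8 bits each) that `depth` halvings never lose a bit.
    n = max(len(lo), len(hi)) + (depth + 7) // 8
    a = int.from_bytes(lo.ljust(n, b"\x00"), "big")
    b = int.from_bytes(hi.ljust(n, b"\x00"), "big")

    def to_key(x: int) -> bytes:
        # Trailing zero bytes do not change the value; drop them for shorter keys.
        return x.to_bytes(n, "big").rstrip(b"\x00")

    def halve(x: int, y: int, level: int) -> list[tuple[int, int]]:
        if level == 0:
            return [(x, y)]
        mid = (x + y) // 2  # exact, thanks to the padding above
        return halve(x, mid, level - 1) + halve(mid, y, level - 1)

    return [(to_key(x), to_key(y)) for x, y in halve(a, b, depth)]


ranges = split_key_range("apple".encode("utf-8"), "zebra".encode("utf-8"), 12)
```

Adjacent sub-ranges share a boundary, so they can be read as half-open intervals without overlapping. How to express such a range in an actual Cassandra query is not covered by this sketch.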
Example, if the strings were base-16 encoded (some of the midpoints are approximate):

    Starting lowest and highest keys:   '000'                      'FFF'
                                        /   \                      /   \
                                   '000'     '8'              '8'       'FFF'
                                   /   \     /  \             /  \      /   \
    Result:                    '000'  '4'  '4'  '8'       '8'  'B8'  'B8'  'FFF'

    (After 3 levels of recursion)
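As a rough check on which midpoints in that tree are approximate, each hex key can be read as the digits after a radix point. This is the same reading under which "Ag" is the base64 midpoint of "A" and "B":

```latex
\begin{aligned}
\tfrac{1}{2}\left(0.000_{16} + 0.\mathrm{FFF}_{16}\right) &= \tfrac{1}{2}\cdot\tfrac{4095}{4096} = \tfrac{4095}{8192} = 0.7\mathrm{FF}8_{16} \\
\tfrac{1}{2}\left(0.000_{16} + 0.8_{16}\right) &= \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4} = 0.4_{16} \\
\tfrac{1}{2}\left(0.8_{16} + 0.\mathrm{FFF}_{16}\right) &= \tfrac{1}{2}\left(\tfrac{2048}{4096} + \tfrac{4095}{4096}\right) = \tfrac{6143}{8192} = 0.\mathrm{BFF}8_{16}
\end{aligned}
```

So '4' is exact, and '8' is within 1/8192 of the exact '7FF8'. 'B8' (0.71875) is a looser approximation of 'BFF8' (about 0.74988), and 'C' (0.75) would be a closer short key.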