0

I am using the following code form sorting:

letters = '세븐일레븐'
old = [('세븐', 8), ('븐', 2), ('일', 5), ('레', 4)]
new = sorted(old, key=lambda x: letters.index(x[0]))

For non-latin characters, the output is the same as the input:

[('세븐', 8), ('븐', 2), ('일', 5), ('레', 4)]

What I'm expecting is:

[('세븐', 8), ('일', 5), ('레', 4), ('븐', 2)]
Paul Sweatte
  • 24,148
  • 7
  • 127
  • 265
thomascrha
  • 193
  • 1
  • 1
  • 13

2 Answers2

1

There is no problem with the sorting. Notice that the letter '븐' appears twice in your letters string. Since index returns the first index of that letter, letters.index('븐') evaluates to 1, which gives it a high priority.

matanso
  • 1,284
  • 1
  • 10
  • 17
  • I think this would be a good addition to ShadowRanger's answer, instead of a standalone answer. But, well. I think it deserves a +1 anyway. – Debosmit Ray Mar 20 '16 at 20:50
1

Why do you expect '일' to sort before '븐'? '븐' is the second character in letters; index is going to return the first instance it finds.

If the goal is to treat specific sequences differently, you need to define letters as a list of the complete strings you care about, not a single flat str, e.g.:

letters = ['세븐', '일', '레', '븐']

Then the index call will treat '세븐' as separate from '븐', and you get the expected output ordering.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • what I need is for the two to be the same if you were to create a string from old list i.e. ''.join(new[0]) == letters. Also I can't create that list as an input as I do not know that the first two characters are a pair. My script determines the best combination of character sets determined by count(what that number represents). What I'm saying is that the input could [('세븐', 8), ('븐', 2), ('일레', 4)]. Or would it be possible the create the letters list you define using old and letters? – thomascrha Mar 15 '16 at 00:05
  • @frostware: Well, if you created a string from both, it would match; `''.join(ss for ss, _ in new) == ''.join(letters)`. If you have no way of knowing the first two characters are a pair, then how do you expect to distinguish `븐` as a substring in what turns out to be another subsequence vs `븐` on its own? You could perform heuristics or exhaustive exploration to try to create the `letters` `list` from `old` and a `letters` `str`, but it would be ugly, and the algorithm would be incredibly ugly. – ShadowRanger Mar 15 '16 at 01:00