
I'm trying to sort Hungarian dictionary words alphabetically. The expected letter order is aábcdeéfggyhiíjklmnoóöőpqrsttyuúüűvwxyz

I tried Intl.Collator() and localeCompare, but the output was never what I expected.

For example:

console.log(["baj", 'betűz', 'ä', "bácsi"].sort(new Intl.Collator('hu').compare));
// expected output: ["ä", "baj", "bácsi", "betűz"]

What I actually got is ["ä", "bácsi", "baj", "betűz"].

The word with á comes before the word with a, although á should come after a.

The same happens with é and í.

I also tried a plain comparison:

.sort(function(a, b) {
  let letterA = a.toUpperCase();
  let letterB = b.toUpperCase();
  if (letterA < letterB) {
    return -1;
  }
  if (letterA > letterB) {
    return 1;
  }
  return 0;
});

but words with special characters were put at the end of the array, which is not what I want.
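
This happens because `<` and `>` compare strings by UTF-16 code unit, and every accented letter has a higher code unit than any unaccented Latin letter. A minimal sketch of the effect (the sample words are illustrative, not taken from the real word list):

```javascript
// '<' on strings compares UTF-16 code units, not dictionary order.
const codeZ = 'Z'.charCodeAt(0);      // U+005A
const codeAAcute = 'Á'.charCodeAt(0); // U+00C1

// Same comparator shape as above: every word starting with 'Á'
// lands after every word starting with 'Z'.
const byCodeUnit = ['ZAB', 'ÁBRA', 'BAJ'].sort((a, b) =>
  a < b ? -1 : a > b ? 1 : 0
);

console.log(codeZ < codeAAcute, byCodeUnit);
```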

Any suggestions on how I can resolve this?

  • Have you looked into https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Collator? – PM 77-1 Jan 21 '21 at 17:16
  • Of course, I tried everything from there, every option and language almost, and nothing seems to work... – Daniel Wiśniewski Jan 21 '21 at 17:20
  • This simply demonstrates "Natural Sort Order" _of the **entire** string_. As soon as the sort comparator hits the third character it puts `baj` behind `bácsi` because the `j` has a higher code point than `c` and the `a` and `á` have the same unicode _base_. – Randy Casburn Jan 21 '21 at 18:47
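
The behaviour described in the last comment can be checked directly. The following is a minimal sketch, assuming a runtime that ships ICU locale data (a recent Node.js or browser). The variable names are illustrative, and it uses only the standard `sensitivity` option of `Intl.Collator`.

```javascript
// Sketch: for the collator, an accent is a secondary difference. It only
// breaks ties between words whose base letters are otherwise identical.
const hu = new Intl.Collator('hu');
const huBase = new Intl.Collator('hu', { sensitivity: 'base' });

// With base sensitivity, 'a' and 'á' compare as equal.
const accentIgnored = huBase.compare('a', 'á') === 0;

// With default sensitivity, the accent decides when nothing else differs.
const accentBreaksTie = hu.compare('a', 'á') < 0;

// 'bácsi' vs 'baj': the base letters b-a-c and b-a-j differ at the third
// letter, so 'c' < 'j' decides before the accent is considered.
const bacsiFirst = hu.compare('bácsi', 'baj') < 0;

console.log({ accentIgnored, accentBreaksTie, bacsiFirst });
```

So the collator never ranks á before a as a letter. It treats the two as the same base letter and lets the rest of the word decide first, which differs from the strict letter-by-letter order expected in the question.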

2 Answers


You could write the comparator by hand and get the desired order from a given alphabet.

const
    alphabet = 'aábcdeéfggyhiíjklmnoóöőpqrsttyuúüűvwxyz',
    // Map each character of the alphabet to its 1-based position.
    order = Object.fromEntries([...alphabet].map((c, i) => [c, i + 1])),
    compare = (a, b) => {
        let i = 0,
            l = Math.min(a.length, b.length),
            r = 0;
            
        while (!r && i < l) {
            r = a[i] in order && b[i] in order ? order[a[i]] - order[b[i]] : a[i].localeCompare(b[i]);
            i++;
        }
        return r || a.length - b.length;
    }

console.log(...["baj", 'betűz', 'ä', "bácsi"].sort(compare)); // ä baj bácsi betűz
Nina Scholz
  • Your algorithm is flawed in that it bails on the first non-zero match, thereby not truly sorting on the entire string (natural sort order). Change the `a` in `baj` to `á` and you will see that the comparison moves on to the third character and changes the order because 'c' < 'j'. I believe the OP would find all sorts of ways for you to modify the algorithm to make it work for custom output orders. Put this in a test with variable-length strings to compare. – Randy Casburn Jan 21 '21 at 18:44
  • @RandyCasburn, what is wrong with `'bácsi' < 'báj'`? – Nina Scholz Jan 21 '21 at 18:51
  • Nothing or everything based upon an unspecified locale and without regard to the second code point value. The char `ä` specified by the OP isn't even in the `alphabet` provided. At any rate `ä`, `a`, and `á` all have the same base code point. The comparator the OP used works correctly precisely because of the specified `hu` as the locale. – Randy Casburn Jan 21 '21 at 19:00

After some time, and with great help from my brother, we came up with a solution. It works with basically any alphabet, because the letter order is defined in a lookup table:

const wordList = [
  { id: 1, word_hu: 'búcsúajándék' },
  { id: 2, word_hu: 'Bőrönd' },
  { id: 3, word_hu: 'betűz' },
  { id: 4, word_hu: 'bácsi' },
  { id: 5, word_hu: 'bejelöl' },
  { id: 10, word_hu: 'áfjklsdfjk' },
  { id: 18, word_hu: 'aáfjklsdffvk' },
  { id: 11, word_hu: 'azjklsdfjk' },
  { id: 21, word_hu: 'ahjklsdfjk' },
  { id: 6, word_hu: 'büfé' },
  { id: 7, word_hu: 'búcsúajándék' },
  { id: 8, word_hu: 'ceruza' },
  { id: 9, word_hu: 'baj' },
];

const alphabetIndex = {
  a: 1,
  á: 2,
  b: 3,
  c: 4,
  d: 5,
  e: 6,
  é: 7,
  f: 8,
  g: 9,
  h: 10,
  i: 11,
  í: 12,
  j: 13,
  k: 14,
  l: 15,
  m: 16,
  n: 17,
  o: 18,
  ó: 19,
  ö: 20,
  ő: 21,
  p: 22,
  q: 23,
  r: 24,
  s: 25,
  t: 26,
  u: 27,
  ú: 28,
  ü: 29,
  ű: 30,
  v: 31,
  w: 32,
  x: 33,
  y: 34,
  z: 35,
};

const getWordsPair = (aWord, bWord) => {
  // Use a global regex so that every space is removed, not just the first one.
  const aWordArray = aWord.toLowerCase().replace(/ /g, "").split('');
  const bWordArray = bWord.toLowerCase().replace(/ /g, "").split('');

  let shouldReturn = false;

  return aWordArray.reduce(
    (acc, aWordletter, aWordIndex) => {
      if (shouldReturn) {
        return acc;
      }

      const bWordLetter = bWordArray[aWordIndex];
      const aWordLetterNumber = alphabetIndex[aWordletter];
      const bWordLetterNumber = alphabetIndex[bWordLetter];

      acc[0].push(aWordLetterNumber);
      acc[1].push(bWordLetterNumber);

      shouldReturn = aWordLetterNumber !== bWordLetterNumber;

      return acc;
    },
    [[], []]
  );
}

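const sortWords = (list) => {
  return list.sort((aWordObject, bWordObject) => {
    const aWord = aWordObject.word_hu;
    const bWord = bWordObject.word_hu;

    // getWordsPair stops at the first differing letter, so only the last
    // pair of numbers can differ. Comparing those directly avoids joining
    // the numbers into one long value, which loses precision on long words.
    const [aNumbers, bNumbers] = getWordsPair(aWord, bWord);
    const aLast = aNumbers[aNumbers.length - 1] ?? 0;
    const bLast = bNumbers[bNumbers.length - 1] ?? 0;

    // Letters missing from alphabetIndex count as 0. If one word is a
    // prefix of the other, the shorter word comes first.
    return aLast - bLast || aWord.length - bWord.length;
  });
};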

const sortedList = sortWords(wordList);
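
The "any alphabet" idea can also be packaged as a reusable comparator factory. The following is only a sketch under stated assumptions: `makeAlphabetComparator`, `huLetters` and `compareHu` are hypothetical names, unknown characters are assumed to sort before all known letters, and the digraphs `gy` and `ty` from the letter order in the question are treated as single letters by longest-match tokenization.

```javascript
// Hypothetical helper: builds a comparator from an ordered list of letters.
// Multi-character letters (e.g. 'gy', 'ty') are matched longest-first.
function makeAlphabetComparator(letters) {
  const rank = new Map(letters.map((letter, i) => [letter, i + 1]));
  const maxLen = Math.max(...letters.map((letter) => letter.length));

  // Split a word into letter ranks. Assumption: unknown characters get
  // rank 0, so they sort before every known letter.
  const toRanks = (word) => {
    const lower = word.toLowerCase();
    const ranks = [];
    let pos = 0;
    while (pos < lower.length) {
      let len = Math.min(maxLen, lower.length - pos);
      while (len > 1 && !rank.has(lower.slice(pos, pos + len))) len--;
      ranks.push(rank.get(lower.slice(pos, pos + len)) ?? 0);
      pos += len;
    }
    return ranks;
  };

  return (a, b) => {
    const ra = toRanks(a);
    const rb = toRanks(b);
    const n = Math.min(ra.length, rb.length);
    for (let i = 0; i < n; i++) {
      if (ra[i] !== rb[i]) return ra[i] - rb[i];
    }
    return ra.length - rb.length; // a prefix sorts before the longer word
  };
}

// Letter order taken from the question, with 'gy' and 'ty' as single letters.
const huLetters = [
  'a', 'á', 'b', 'c', 'd', 'e', 'é', 'f', 'g', 'gy', 'h', 'i', 'í', 'j',
  'k', 'l', 'm', 'n', 'o', 'ó', 'ö', 'ő', 'p', 'q', 'r', 's', 't', 'ty',
  'u', 'ú', 'ü', 'ű', 'v', 'w', 'x', 'y', 'z',
];
const compareHu = makeAlphabetComparator(huLetters);

const sortedDemo = ['gyár', 'baj', 'betűz', 'gúny', 'bácsi'].sort(compareHu);
console.log(sortedDemo);
```

To sort objects like the ones in `wordList`, the comparator can be wrapped, for example `list.sort((x, y) => compareHu(x.word_hu, y.word_hu))`.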