
I was practicing algorithms in Dart and, because I was following TDD, I realized that my code has some limitations.

I was trying to reverse strings as part of an interview problem, but I couldn't get the surrogate pairs correctly reversed.

const simple = 'abc';
const emoji = '';
const surrogate = '‍♂️‍';

String rev(String s) {
    return String.fromCharCodes(s.runes.toList().reversed);
}

void main() {
    print(simple);
    print(rev(simple));
    print(emoji);
    print(rev(emoji));
    print(surrogate);
    print(rev(surrogate));
}

The output:

abc
cba


‍♂️‍
‍️♂‍

You can see that the simple emojis are reversed correctly, because I'm using runes instead of simply calling s.split('').toList().reversed.join(''), but the surrogate pairs are reversed incorrectly.

How can I reverse a string that might contain surrogate pairs using the Dart programming language?

Vince Varga

3 Answers


When reversing strings, you must operate on grapheme clusters, not on code points or code units. Use the grapheme_splitter package.
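To make the distinction concrete, here is a minimal sketch using only dart:core. The emoji is written with Unicode escapes so its code points stay visible; the name reverseRunes is just an illustrative helper that mirrors the rev() function from the question.

```dart
// One user-perceived character built from four code points:
// U+1F926 (face palm), U+200D (zero-width joiner),
// U+2642 (male sign), U+FE0F (variation selector-16).
const facepalm = '\u{1F926}\u{200D}\u{2642}\u{FE0F}';

/// Illustrative helper: reverses by code point, like rev() in the question.
String reverseRunes(String s) =>
    String.fromCharCodes(s.runes.toList().reversed);

void main() {
  print(facepalm.length); // 5 UTF-16 code units
  print(facepalm.runes.length); // 4 code points
  // Reversing the code points reorders the pieces of the sequence,
  // so the result is no longer the same emoji.
  print(reverseRunes(facepalm) == facepalm); // false
}
```

Runes take care of the surrogate pair (U+1F926 stays intact), but they know nothing about the joiner and the variation selector, which is why the sequence falls apart.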

daxim
  • Oh, I really hoped it would be somewhat simpler than this. The source code is around 2300 LOC... – Vince Varga Oct 13 '19 at 14:31
  • Unfortunately, it looks like there's no way around this. FYI, also asked about it on GitHub https://github.com/dart-lang/sdk/issues/38854 – Vince Varga Oct 14 '19 at 14:24
  • There is no easy way to find grapheme cluster boundaries. You need to know which category, out of 15, each code point belongs to, and there are 0x10FFFF code points. Then you have to, effectively, simulate a finite state automaton over those categories, so you can correctly break between waving flag and rainbow, but not between waving flag, zero-width-joiner and rainbow. There is a right way to do this, and many wrong ways, but no way that is both *easy* and right. – lrn Oct 18 '19 at 09:26

Dart 2.7 introduced a new package, characters, that supports grapheme-cluster-aware operations. It treats a string as a sequence of characters represented as Unicode extended grapheme clusters.

Dart’s standard String class uses the UTF-16 encoding. This is a common choice in programming languages, especially those that offer support for running both natively on devices, and on the web.

UTF-16 strings usually work well, and the encoding is transparent to the developer. However, when manipulating strings, and especially when manipulating strings entered by users, you may experience a difference between what the user perceives as a character, and what is encoded as a code unit in UTF-16.

Source: "Announcing Dart 2.7: A safer, more expressive Dart" by Michael Thomsen, section "Safe substring handling"

The package also lets you reverse strings that contain emojis the way a reader would expect.

With plain String operations, you run into issues:

String hi = 'Hi ';
print('String.length: ${hi.length}');
// Prints 7; would expect 4

With the characters package:

String hi = 'Hi ';
print(hi.characters.length);
// Prints 4
print(hi.characters.last);
// Prints 

It's worth taking a look at the source code of the characters package. It's far from simple, but it looks easier to digest and is better documented than grapheme_splitter. The characters package is also maintained by the Dart team.
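For the reversal itself, a minimal sketch could look like the following. It assumes the characters package has been added to pubspec.yaml; reverseGraphemes is a hypothetical helper name, not part of the package. The emoji is written with Unicode escapes so that the example does not depend on how the page renders it.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: reverses a string one extended grapheme
/// cluster at a time instead of one code unit or code point at a time.
String reverseGraphemes(String s) => s.characters.toList().reversed.join();

void main() {
  // Face palm + zero-width joiner + male sign + variation selector-16:
  // four code points that should stay together as one character.
  const facepalm = '\u{1F926}\u{200D}\u{2642}\u{FE0F}';

  print(reverseGraphemes('abc')); // cba
  // The emoji sequence should survive the reversal unchanged.
  print(reverseGraphemes('a${facepalm}b') == 'b${facepalm}a');
}
```

Since Characters is an Iterable<String> of grapheme clusters, the usual toList().reversed.join() chain works unchanged; only the unit of iteration differs from the runes-based version.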

Vince Varga

Create an extension on String with a getter named reversed:

extension on String {
  /// Reverse the string

  String get reversed =>
      GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}

To get the GraphemeSplitter class, install the grapheme_splitter package:

dart pub add grapheme_splitter

Example Program:

import "package:grapheme_splitter/grapheme_splitter.dart";
import "dart:io";

Future<void> main(final List<String> args) async {
  // Await the test so that all writes finish before the program exits.
  await test();
}

Future<void> test() async {
  final Writer writer = Writer();

  const simple = 'abc';

  const emoji = '';

  const surrogate = '‍♂️‍';

  const hell = "Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘";

  const nightmare = "️‍";

  await writer.print(simple);
  await writer.print(emoji);
  await writer.print(surrogate);
  await writer.print(hell);
  await writer.print(nightmare);

  await writer.print(simple.reversed);
  await writer.print(emoji.reversed);
  await writer.print(surrogate.reversed);
  await writer.print(hell.reversed);
  await writer.print(nightmare.reversed);
}

class Writer {
  final String filePath;
  final File file;

  // Truncate synchronously so the reset cannot race with later appends.
  Writer({this.filePath = "./data.dat"})
      : file = File(filePath)..writeAsStringSync("");

  // Appends one line to the file.
  Future<void> print(final Object data) async {
    await file.writeAsString("$data\n", mode: FileMode.append);
  }
}

extension on String {
  /// Reverse the string

  String get reversed =>
      GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}

Output in the data.dat file:

abc

‍♂️‍
Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘
️‍
cba

‍‍♂️
Ǫ̵̹̻̝̳͂̌̌͘G̴̻͈͍͔̹̑͗̎̅͛́L̠ͨͧͩ͘A̴̵̜̰͔ͫ͗͢Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍
️‍

The output is written to a file instead of the terminal because most terminals will not render these characters properly.

Udesh