
How can I take a string and convert it to a list of words in SML?

For example: "one two three" to ["one", "two", "three"]

sshine
You can split the string using a space, " ". This example is very similar but will need a little bit of tweaking: https://stackoverflow.com/questions/43289155/sml-splitting-string-on-first-space – Michael Mar 15 '18 at 16:45

1 Answer


You can (and probably should) use String.tokens:

- String.tokens Char.isSpace "one two three";
> val it = ["one", "two", "three"] : string list
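To see roughly what String.tokens Char.isSpace is doing, here is a minimal hand-rolled sketch using explode and implode. The name splitWords is hypothetical, and this is not how the Basis Library actually implements String.tokens; it only illustrates the idea of accumulating characters until a separator appears and discarding empty words.

```sml
(* Hypothetical helper: a hand-rolled approximation of
   String.tokens Char.isSpace, not the Basis Library's own code. *)
fun splitWords (s : string) : string list =
  let
    (* acc:   characters of the current word, in reverse order
       words: completed words, in reverse order *)
    fun go ([], acc, words) = finish (acc, words)
      | go (c :: cs, acc, words) =
          if Char.isSpace c
          then go (cs, [], finish (acc, words))
          else go (cs, c :: acc, words)
    (* Close off the current word; an empty word is skipped,
       which is why repeated separators produce no "" entries *)
    and finish ([], words) = words
      | finish (acc, words) = implode (rev acc) :: words
  in
    rev (go (explode s, [], []))
  end
```

For example, splitWords "  one  two  three  " yields ["one", "two", "three"], the same as String.tokens Char.isSpace. In practice the library function is shorter and clearer, so this is only useful for understanding.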

There is also String.fields. The two differ in how they treat consecutive, leading, and trailing separators: tokens drops the empty strings between them, while fields keeps them:

- String.tokens Char.isSpace "  one  two  three  ";
> val it = ["one", "two", "three"] : string list
- String.fields Char.isSpace "  one  two  three  ";
> val it = ["", "", "one", "", "two", "", "three", "", ""] : string list
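The difference matters when empty fields carry meaning, for instance comma-separated data where ",," marks a missing value. A small illustration, where isComma is a hypothetical helper:

```sml
(* Hypothetical predicate: treat only commas as separators *)
fun isComma c = c = #","

(* fields keeps the empty field between ",," and after the final "," *)
val asFields = String.fields isComma "a,,b,"   (* ["a", "", "b", ""] *)

(* tokens discards every empty string *)
val asTokens = String.tokens isComma "a,,b,"   (* ["a", "b"] *)
```

So fields is the better fit for positional data like CSV columns, while tokens suits splitting text into words.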

If your string has multiple potential separators and you're just interested in the words:

fun isWordSep c = Char.isSpace c orelse
                ( Char.isPunct c andalso c <> #"-" andalso c <> #"'" )
val words = String.tokens isWordSep

This works for one definition of what a word is:

- words "I'm jolly-good.  Are you?";
> val it = ["I'm", "jolly-good", "Are", "you"] : string list
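If you'd rather list the separator characters explicitly than describe them with predicates, Char.contains turns a string into a predicate (Char.contains s c is true when c occurs in s). The separator set below is an arbitrary, illustrative choice, and wordsBySet is a hypothetical name:

```sml
(* Illustrative separator set: whitespace plus some common punctuation.
   Hyphens and apostrophes are deliberately not listed, so they stay
   inside words. *)
val separators = " \t\n.,;:!?"

(* Hypothetical alternative to words: split on any listed character *)
val wordsBySet = String.tokens (Char.contains separators)
```

On the same input, wordsBySet "I'm jolly-good.  Are you?" also gives ["I'm", "jolly-good", "Are", "you"]. Unlike isWordSep, though, any punctuation missing from the list (quotes, parentheses) stays attached to the words.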

Not all natural language abides by this definition: for example, "e.g." is an abbreviation rather than two words, "e" and "g". For any real accuracy you're headed into Natural Language Processing land.

sshine