How can I pass a string to a function and convert it to a list of words in SML?
For example, "one two three" should become ["one", "two", "three"].
You can (and probably should) use String.tokens:
- String.tokens Char.isSpace "one two three";
> val it = ["one", "two", "three"] : string list
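There is also String.fields. The two differ in how they treat consecutive or superfluous separators: String.tokens never returns empty strings, whereas String.fields returns an empty string for every gap between adjacent separators and at either end: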
- String.tokens Char.isSpace "  one  two  three  ";
> val it = ["one", "two", "three"] : string list
- String.fields Char.isSpace "  one  two  three  ";
> val it = ["", "", "one", "", "two", "", "three", "", ""] : string list
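The difference matters most when empty entries carry meaning, as in delimited records. The following is a minimal sketch of that case; `isComma`, `row`, `columns` and `nonEmpty` are names made up for illustration, and only String.fields and String.tokens come from the Basis Library.

```sml
(* Hypothetical separator predicate for comma-delimited data. *)
fun isComma c = (c = #",")

(* A made-up record whose middle column is empty. *)
val row = "alice,,42"

(* fields keeps the empty column, so column positions stay meaningful. *)
val columns = String.fields isComma row

(* tokens drops it, which shifts the later columns to the left. *)
val nonEmpty = String.tokens isComma row
```

Here `columns` has three elements and `nonEmpty` has two, so for splitting free text into words String.tokens is usually the better fit, while String.fields suits positional data.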
If your string has multiple potential separators and you're just interested in the words:
(* Whitespace and punctuation separate words, except hyphens and apostrophes. *)
fun isWordSep c =
  Char.isSpace c orelse
  (Char.isPunct c andalso c <> #"-" andalso c <> #"'")

val words = String.tokens isWordSep
This works for one definition of what a word is:
- words "I'm jolly-good. Are you?";
> val it = ["I'm", "jolly-good", "Are", "you"] : string list
Not all natural language fits this definition. For example, "e.g." is a single abbreviation, but this function splits it into two words, "e" and "g". For any real accuracy you are heading into Natural Language Processing territory.
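Short of real NLP, one possible workaround for the "e.g." case is to split on whitespace only and then strip punctuation from the ends of each token, unless the token appears in a list of known abbreviations. This is only a sketch under that assumption: `abbreviations`, `isAbbreviation`, `trimPunct` and `wordsWithAbbrevs` are hypothetical names, and the abbreviation list is a tiny placeholder.

```sml
(* Hypothetical list of abbreviations to keep intact; a real one would be longer. *)
val abbreviations = ["e.g.", "i.e.", "etc."]

fun isAbbreviation s = List.exists (fn a => a = s) abbreviations

(* Strip punctuation from both ends of a token, leaving interior characters alone. *)
fun trimPunct s =
  let
    val ss = Substring.full s
    val ss = Substring.dropl Char.isPunct ss
    val ss = Substring.dropr Char.isPunct ss
  in
    Substring.string ss
  end

(* Split on whitespace, then clean each token unless it is a known abbreviation. *)
fun wordsWithAbbrevs s =
  let
    val rough = String.tokens Char.isSpace s
    fun clean t = if isAbbreviation t then t else trimPunct t
  in
    (* A token made only of punctuation trims to "", so drop those. *)
    List.filter (fn t => t <> "") (List.map clean rough)
  end
```

Unlike a predicate passed straight to String.tokens, this version does not split on punctuation inside a token, so "a,b" stays one word. Whether that trade-off is acceptable depends on the input.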