12

I am using this to split strings:

let split = Str.split (Str.regexp_string " ") in
let tokens = split instr in
....

The problem shows up with input like this line, which I want to parse:

pop     esi

and after the split it turns to be (I use a helper function to print each item in the tokens list):

item: popitem: item: item: item: esi

See, there are three empty strings in the token list, one for each extra space.

I am wondering if there is a string.split like in Python which can parse instr this way:

item: popitem: esi

Is it possible?

Matthias Braun
lllllllllllll

4 Answers

21

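Don't use Str.regexp_string; it only matches fixed strings, so every single space is treated as its own delimiter.

Use Str.split (Str.regexp " +") instead. The pattern " +" matches one or more consecutive spaces, so a whole run of spaces counts as a single delimiter.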
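
As a rough sketch of the suggestion above, the regexp can be widened to cover tabs as well. The name split_on_blanks is made up for this example, and the program has to be linked against the str library (for instance with ocamlfind's -package str option), which is an assumption about your build setup:

```ocaml
(* Hypothetical helper: split on runs of spaces or tabs using Str.
   Requires linking against the str library. *)
let split_on_blanks (line : string) : string list =
  (* "[ \t]+" matches one or more spaces or tabs as a single delimiter. *)
  Str.split (Str.regexp "[ \t]+") line

let () =
  split_on_blanks "pop     esi"
  |> List.iter (fun tok -> Printf.printf "item: %s\n" tok)
```

Str.split also ignores delimiters at the very beginning and end of the string, so leading or trailing blanks do not produce empty tokens.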

Jeffrey Scofield
8

Since OCaml 4.04.0 there is also String.split_on_char, which you can combine with List.filter to remove empty strings:

# "pop     esi"
  |> String.split_on_char ' '
  |> List.filter (fun s -> s <> "");;
- : string list = ["pop"; "esi"]

No external libraries required.
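
If the input may also contain tabs or newlines, one possible way to get closer to Python's split() with only the standard library is to normalise those characters to spaces first. This is a minimal sketch; split_whitespace is a name invented for this example, and the set of characters treated as whitespace is an assumption:

```ocaml
(* Hypothetical helper: split on any run of spaces, tabs, or newlines
   using only the standard library (OCaml 4.04 or later). *)
let split_whitespace (s : string) : string list =
  s
  (* Turn tabs, newlines, and carriage returns into plain spaces. *)
  |> String.map (function '\t' | '\n' | '\r' -> ' ' | c -> c)
  (* Split on every single space; runs of spaces yield empty strings. *)
  |> String.split_on_char ' '
  (* Drop the empty strings so only real tokens remain. *)
  |> List.filter (fun tok -> tok <> "")

let () =
  split_whitespace "pop \t  esi\n"
  |> List.iter (fun tok -> Printf.printf "item: %s\n" tok)
```

The extra String.map pass keeps the code short at the cost of copying the string once, which should not matter for line-sized inputs.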

glennsl
7

Using Jane Street's Core library, you can do:

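open Core

let python_split x =
  String.split_on_chars ~on:[ ' '; '\t'; '\n'; '\r' ] x
  (* Recent Core versions restrict (<>) to ints, so test emptiness
     with String.is_empty instead of comparing against "". *)
  |> List.filter ~f:(fun s -> not (String.is_empty s))
;;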
Anthony Scemama
1

This is how I split my lines into words:

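open Core
(* Consecutive spaces produce empty strings, so filter them out.
   List.dedup would only collapse them into one empty string and
   would also drop repeated words. *)
let tokenize line =
  String.split line ~on:' ' |> List.filter ~f:(fun s -> not (String.is_empty s))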

Mind the single quotes around the space character.

Here's the documentation for String.split: link

Matthias Braun