190

Given an input string such as " word1 word2 word3 word4 ", what would be the best approach to split it into an array of strings in Go? Note that there can be any number of spaces or Unicode spacing characters between each word.

In Java I would just use someString.trim().split("\\s+").

(Note: the possible duplicate "Split string using regular expression in Go" doesn't have any good-quality answers. Please provide an actual example, not just a link to the regexp or strings package reference.)

ralfoide
  • If you ended up on this page, this may be what you are looking for: `strings.SplitN(s, sep string, n int) []string` – sny Sep 03 '22 at 09:57

4 Answers

381

The strings package has a Fields function.

someString := "one    two   three four "

words := strings.Fields(someString)

fmt.Println(words, len(words)) // [one two three four] 4

DEMO: http://play.golang.org/p/et97S90cIH

From the docs:

Fields splits the string s around each instance of one or more consecutive white space characters, as defined by unicode.IsSpace, returning a slice of substrings of s or an empty slice if s contains only white space.
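For completeness, here is a self-contained sketch of the same call as a full program. The helper name splitWords and the sample inputs are made up for illustration; the input mixes spaces, a tab, a newline and a no-break space (U+00A0) to exercise the unicode.IsSpace behaviour described in the docs.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper name; it simply wraps strings.Fields.
func splitWords(s string) []string {
	return strings.Fields(s)
}

func main() {
	// Leading/trailing spaces, a tab, a newline and a no-break space (U+00A0).
	words := splitWords("  word1\tword2\n word3\u00a0word4  ")
	fmt.Printf("%q\n", words) // ["word1" "word2" "word3" "word4"]

	// Whitespace-only input gives an empty slice.
	fmt.Println(len(splitWords(" \t\n"))) // 0
}
```

No trimming is needed beforehand, since leading and trailing whitespace never produce empty elements.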

blackgreen
I Hate Lazy
  • 5
    Unfortunately, `strings.Fields` doesn't ignore spaces in quoted parts. – chmike Dec 21 '18 at 13:26
  • 7
    @chmike True, but the moment quotes get involved, you're in the business of *decoding* or *parsing* some specific *encoding* or *format*. – mtraceur Jan 20 '20 at 21:45
  • 1
    @chmike you might need `shlex` for that https://godoc.org/github.com/google/shlex – akhy Jul 16 '20 at 16:39
12

If you're using tip (the development version of Go), there is regexp.Split:

func (re *Regexp) Split(s string, n int) []string

Split slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.

The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.

Example:

s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]

The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
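Applied to the question, a rough equivalent of Java's trim().split("\\s+") might look like the sketch below. The helper name splitOnWhitespace is hypothetical. One caveat: in Go's regexp syntax \s matches only ASCII whitespace, so this pattern does not treat Unicode spaces such as U+00A0 as separators.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespace matches runs of ASCII whitespace. In Go's RE2 syntax \s is
// ASCII-only, so Unicode spaces such as U+00A0 are not covered.
var whitespace = regexp.MustCompile(`\s+`)

// splitOnWhitespace is a hypothetical helper name used for illustration.
func splitOnWhitespace(s string) []string {
	// Trim first; otherwise leading or trailing whitespace produces
	// empty strings at the ends of the result.
	return whitespace.Split(strings.TrimSpace(s), -1)
}

func main() {
	fmt.Printf("%q\n", splitOnWhitespace(" word1   word2 word3 word4 "))
	// ["word1" "word2" "word3" "word4"]
}
```

Passing -1 as the count asks for all substrings, matching the "n < 0" case above.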
zzzz
  • 4
    this seems like an overkill – thwd Dec 06 '12 at 09:14
  • @Tom But it's still interesting even if it's not the best answer here. I upvoted this answer because I learned something. – Denys Séguret Dec 06 '12 at 18:24
  • You should note that `Fields()` won't return empty strings. So the number of fields returned will vary. If you're trying to parse something consistent, then it won't work for you. You might need to use regex if a `FieldsFunc()` also won't work. – Tom Nov 05 '14 at 19:54
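The point about empty strings in the last comment can be illustrated with a small sketch. The comma-separated record and the helper names (isComma, dropEmpty, keepEmpty) are assumptions chosen for the example; the same collapsing applies to strings.Fields with whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// isComma is a hypothetical separator predicate for strings.FieldsFunc.
func isComma(r rune) bool { return r == ',' }

// dropEmpty collapses adjacent separators, so empty fields vanish.
func dropEmpty(record string) []string {
	return strings.FieldsFunc(record, isComma)
}

// keepEmpty preserves empty fields, so the column count stays stable.
func keepEmpty(record string) []string {
	return strings.Split(record, ",")
}

func main() {
	record := "a,,c"
	fmt.Printf("%q\n", dropEmpty(record)) // ["a" "c"]
	fmt.Printf("%q\n", keepEmpty(record)) // ["a" "" "c"]
}
```

If the position of each field matters, a splitter that keeps empty fields (strings.Split, or regexp.Split for variable separators) is likely the safer choice.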
7

I came up with the following, but that seems a bit too verbose:

import "regexp"
r := regexp.MustCompile("[^\\s]+")
r.FindAllString("  word1   word2 word3   word4  ", -1)

which will evaluate to:

[]string{"word1", "word2", "word3", "word4"}

Is there a more compact or more idiomatic expression?

ralfoide
3

You can use the Split function from the strings package: strings.Split(someString, " ")

user2368285
  • The community encourages adding explanations to questions and not posting purely code answers (see [here](https://meta.stackoverflow.com/questions/300837/what-comment-should-i-add-to-code-only-answers)). Also, please have a read of [this](https://stackoverflow.com/editing-help) help page about how to format code properly. – costaparas Dec 26 '20 at 03:37
  • 1
    That's ok, but won't work for tabs, newlines and other whitespaces – user967710 Dec 06 '21 at 20:01
  • 1
    Also does not work for multiple spaces in a row, as stated in the question. – MattArmstrong Dec 05 '22 at 14:46
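A short sketch of what the comments describe, using a made-up input and a hypothetical helper name (splitOnSpace): splitting on a single space leaves empty strings for leading, trailing and repeated spaces, and does not split on a tab.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnSpace is a hypothetical helper wrapping strings.Split with " ".
func splitOnSpace(s string) []string {
	return strings.Split(s, " ")
}

func main() {
	input := " word1  word2\tword3 "

	fmt.Printf("%q\n", splitOnSpace(input))
	// ["" "word1" "" "word2\tword3" ""]

	// strings.Fields, for comparison, handles all of these cases.
	fmt.Printf("%q\n", strings.Fields(input))
	// ["word1" "word2" "word3"]
}
```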