I'd like to get a String array from a String that is delimited by spaces (" "), where words enclosed in double quotes (inverted commas) should stay together as a single element. Is there a clever way to do this?

For example, if the string was:

cat dog giraffe "big elephant" snake

I'd like the resulting array to contain strings

cat

dog

giraffe

big elephant

snake

I know I could use Split(str, " "), but that would also cut the quoted phrase, giving "big and elephant" as two separate elements instead of big elephant. I've never used regular expressions (RegEx), but I have a hunch that the solution might involve them.
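
To see why plain splitting is not enough, here is a minimal VB.NET sketch (the module name NaiveSplitDemo is just for illustration) that runs the question's input through Microsoft.VisualBasic's Split:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NaiveSplitDemo
    Sub Main()
        Dim s As String = "cat dog giraffe ""big elephant"" snake"
        ' Split cuts at every space; it knows nothing about quotes.
        Dim parts As String() = Split(s, " ")
        ' Prints 6 elements: cat, dog, giraffe, "big, elephant", snake
        For Each p As String In parts
            Console.WriteLine(p)
        Next
    End Sub
End Module
```

The quote characters stay attached to "big and elephant", and the phrase ends up in two elements.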

2 Answers


Treating the input as space-delimited CSV can greatly simplify the task:

Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module Program
    Sub Main()
        Dim s As String = "cat dog giraffe ""big elephant"" snake"
        ' Parse the string as space-delimited CSV; Using disposes the parser.
        Using afile As New TextFieldParser(New StringReader(s))
            afile.TextFieldType = FieldType.Delimited
            afile.Delimiters = New String() {" "}
            ' Keeps "big elephant" together as one field and strips the quotes.
            afile.HasFieldsEnclosedInQuotes = True
            Do While Not afile.EndOfData
                Try
                    Dim currentRecord As String() = afile.ReadFields()
                    Console.WriteLine(String.Join("; ", currentRecord))
                Catch ex As MalformedLineException
                    ' Report the malformed line (e.g. an unclosed quote) instead of halting.
                    Console.WriteLine("Malformed line: " & ex.Message)
                End Try
            Loop
        End Using
    End Sub
End Module

It prints cat; dog; giraffe; big elephant; snake.

The code is adapted from Parse Delimited CSV in .NET.
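
The loop above prints each record, while the question asks for a String array. A minimal sketch that wraps the same TextFieldParser settings in a function returning the fields of one line; the names QuotedSplit and SplitRespectingQuotes are hypothetical, and it assumes the input is a single line:

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedSplit
    ' Hypothetical helper: splits one line on spaces, keeping "quoted phrases" together.
    Function SplitRespectingQuotes(input As String) As String()
        Using parser As New TextFieldParser(New StringReader(input))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {" "}
            parser.HasFieldsEnclosedInQuotes = True
            ' ReadFields returns Nothing when there is no more data to read.
            Dim fields As String() = parser.ReadFields()
            If fields Is Nothing Then Return New String() {}
            Return fields
        End Using
    End Function

    Sub Main()
        Dim result As String() = SplitRespectingQuotes("cat dog giraffe ""big elephant"" snake")
        For Each item As String In result
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

Malformed input such as an unclosed quote is expected to raise MalformedLineException; this sketch lets it propagate so the caller can decide how to handle it.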

Wiktor Stribiżew

You can use a regex for this:

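Imports System
Imports System.Text.RegularExpressions

Module Program
    Sub Main()
        Const data As String = "åäöÄ åäöÄ ""åäöÄ åäöÄ"" åäöÄ"
        ' A run of letters, or quoted letter-words separated by single spaces.
        Dim matches As MatchCollection = Regex.Matches(data, "\p{L}+|""\p{L}+(?: \p{L}+)*""")
        For Each m As Match In matches
            ' Remove the surrounding quotes from quoted matches.
            Console.WriteLine(m.Value.Trim(""""c))
        Next
    End Sub
End Module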

The regex works as follows:

  • match either \p{L}+, which means one or more letters, as many as possible
  • or (denoted by the |) match "\p{L}+(?: \p{L}+)*", in detail:
    • " matches the opening quote
    • \p{L}+ matches one or more letters, as many as possible
    • (?: \p{L}+)* is a non-capturing group repeated zero or more times, as many times as possible;
      the group consists of a space followed by one or more letters
    • finally, " matches the closing quote

Then we just have to Trim each match to remove the starting/ending quote, if there is one.

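Note: \p{L} matches any character in the Unicode "Letter" category, so accented and non-Latin letters such as å, ä and ö are included; see here for more info about \p{L}.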
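
\p{L} covers letters only: digits (Unicode category Nd) and punctuation (category P) are not included, so an unquoted token such as big.0-eléphant comes back as the two pieces big and eléphant. If tokens may contain such characters, one option is a broader pattern that treats any run of characters other than whitespace and double quotes as a word. A minimal sketch, assuming that definition of a word (the names QuotedRegexSplit and Tokenize are hypothetical); capture groups return the text without its quotes, so no Trim is needed:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuotedRegexSplit
    ' Group 1: text inside a pair of double quotes.
    ' Group 2: a run of characters that are neither whitespace nor a double quote.
    Private ReadOnly TokenPattern As New Regex("""([^""]*)""|([^\s""]+)")

    Function Tokenize(input As String) As String()
        Dim tokens As New List(Of String)()
        For Each m As Match In TokenPattern.Matches(input)
            ' Exactly one of the two groups succeeds for each match.
            If m.Groups(1).Success Then
                tokens.Add(m.Groups(1).Value)
            Else
                tokens.Add(m.Groups(2).Value)
            End If
        Next
        Return tokens.ToArray()
    End Function

    Sub Main()
        For Each t As String In Tokenize("cat big.0-eléphant ""big elephant"" x2")
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

With this pattern an unclosed quote is simply skipped, so "big elephant without a closing quote yields big and elephant; validate the input first if that should be treated as an error.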

Sehnsucht
  • What about non-English text? Does it melt down when åäö is input? – Daniel Ahrari Dec 02 '16 at 12:37
  • It was melting down; that wasn't stated as a requirement, but I've edited the code to support them – Sehnsucht Dec 02 '16 at 12:46
  • It depends on where ö, ä and z are located in the alphabet. In our language, z comes right after s (...pqrszšžt...), so even t, u, etc. are ignored :) – Arvo Dec 02 '16 at 12:47
  • @Sehnsucht Interesting, thanks for pointing that out. Do Unicode letters include digits and punctuation marks (e.g. for the "big.0-eléphant" case)? – Arvo Dec 02 '16 at 12:51