The title of "How do I do what strtok() does in C, in Python?" suggests it should answer my question, but the specific strtok() behavior I'm looking for is breaking on any one of the characters in the delimiter string. That is, given:

const char *delim = ", ";
char str1[] = "123,456";   /* arrays, since strtok() modifies its input */
char str2[] = "234 567";
char str3[] = "345, 678";

strtok() finds the substrings of digits regardless of how many characters from delim are present. Python's split treats its argument as one delimiter string that must match in full, so I can't do:

delim = ', '
"123,456".split(delim)

because it doesn't find delim as a substring and returns a single-element list.

Chris Nelson

2 Answers

If you know that the tokens are going to be numbers, you should be able to use the split function from Python's re module:

import re
re.split(r"\D+", "123,456")  # raw string, so \D reaches the regex engine intact

More generally, you could match on any of the delimiter characters:

re.split("[ ,]+", "123,456")  # the + treats a run such as ", " as one separator

or:

re.split("[" + re.escape(delim) + "]+", "123,456")  # escape delim so characters like ] or ^ stay literal
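
To get closer to strtok(), which treats a run of delimiter characters as a single separator and never hands back an empty token, the character-class idea above could be wrapped up roughly as follows. This is only a sketch: `strtok_like` is a hypothetical helper name, not something the re module provides, and it returns all tokens at once rather than one per call as strtok() does.

```python
import re

def strtok_like(text, delims):
    """Split text on runs of any character in delims, dropping empty tokens."""
    if not delims:
        # No delimiters: the whole string is the only token (if non-empty).
        return [text] if text else []
    # re.escape keeps characters such as ']' or '^' from being read as
    # character-class syntax; '+' collapses adjacent delimiters.
    pattern = "[" + re.escape(delims) + "]+"
    # Leading or trailing delimiters yield empty strings; filter them out.
    return [tok for tok in re.split(pattern, text) if tok]

for s in ("123,456", "234 567", "345, 678"):
    print(strtok_like(s, ", "))
```

Each of the three strings from the question comes back as a two-element list of digit strings.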
Sam Mussmann

One way to deal with simpler cases is to use replace() to normalize all your delimiters to the same character and then split() on that character. For your examples, replace(',', ' ').split() should work: it converts the commas to spaces, then uses the special no-argument form of split to split on runs of whitespace.
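
As a rough illustration of that approach, here is a small sketch; `split_normalized` is a made-up helper name used only for this example. Note that because it relies on no-argument split(), tabs and newlines inside the input will also act as separators.

```python
def split_normalized(text, delims=","):
    # Turn every delimiter character into a space ...
    for ch in delims:
        text = text.replace(ch, " ")
    # ... then let no-argument split() break on runs of whitespace,
    # which also discards empty strings at the ends.
    return text.split()

for s in ("123,456", "234 567", "345, 678"):
    print(split_normalized(s))
```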

In Python, when things start getting too complex for split and replace, you generally turn to the re module; see Sam Mussmann's more general answer.

Russell Borogove