Returning the lowest index for the first non whitespace character in a string in Python

Question

What's the shortest way to do this in Python?

string = "   xyz"

must return index = 3

Almost duplicate of: https://stackoverflow.com/questions/2268532/grab-a-lines-whitespace-indention-with-python — 0 _, Sep 11 '17 at 16:06

score 49 · Accepted Answer · answered Mar 04 '10 at 11:51

>>> s = "   xyz"
>>> len(s) - len(s.lstrip())
3
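
A minimal sketch of the same idea wrapped in a function, for illustration only. The name `leading_ws_len` is hypothetical and not from the answer. Note that for an empty or all-whitespace string the result equals len(s), which is not a valid index, so callers may want to check for that case.

```python
def leading_ws_len(s):
    """Return the number of leading whitespace characters in s.

    Hypothetical helper. When s contains a non-whitespace character,
    this equals the index of the first such character.
    """
    return len(s) - len(s.lstrip())


print(leading_ws_len("   xyz"))  # 3
print(leading_ws_len("xyz"))     # 0
print(leading_ws_len("    "))    # 4, i.e. len(s): there is no non-whitespace character
```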

Frank


If s is long and the whitespace prefix is short, other solutions (ones that don't make a temp almost-copy of s, get its length, and then throw the temp object away) may be preferable. – John Machin Mar 04 '10 at 23:39
@JohnMachin strings are immutable in Python, so I very much doubt the interpreter makes a copy for `strip()`. The original string can simply be reused with a different start position. – bernie Sep 03 '20 at 19:58

score 6 · Answer 2 · answered Mar 04 '10 at 11:49

>>> next(i for i, j in enumerate('   xyz') if j.strip())
3

or

>>> import string
>>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
3

In versions of Python < 2.6, where the next() built-in is not available, you'll have to do:

(...).next()

SilentGhost


`blah.strip()` and `blah.isspace()` work OK with Unicode; string.whitespace is frozen in the last century. – John Machin Mar 04 '10 at 12:26
@John: says who? I see `string.whitespace` as the second most efficient approach after the accepted one. – SilentGhost Mar 04 '10 at 12:31
Re-read my comment. I'm talking about working with Unicode; no mention of efficiency. – John Machin Mar 04 '10 at 12:43
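
The Unicode point raised in the comments above can be checked directly. A small sketch, assuming Python 3 (where every str is Unicode) and using the no-break space U+00A0 as the test character:

```python
import string

nbsp = "\xa0"  # NO-BREAK SPACE, a whitespace character outside ASCII

print(nbsp.isspace())             # True
print(nbsp in string.whitespace)  # False: string.whitespace holds ASCII whitespace only

s = nbsp + " xyz"
by_isspace = next(i for i, c in enumerate(s) if not c.isspace())
by_table = next(i for i, c in enumerate(s) if c not in string.whitespace)
print(by_isspace, by_table)       # 2 0
```

The two approaches agree on ASCII input and differ only when the prefix contains non-ASCII whitespace.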

DevPlayer · Answer 3 · 2012-11-29T02:16:38.340

Many of the previous solutions iterate over the string more than once, and some make copies of the data (the string). re.match(), strip(), enumerate() and isspace() all duplicate work behind the scenes. The expressions

next(idx for idx, ch in enumerate(s) if not ch.isspace())
next(idx for idx, ch in enumerate(s) if ch not in string.whitespace)  # requires "import string"

are good choices for testing strings against various kinds of leading whitespace, such as vertical tabs, but that adds cost too.

However, if your string uses only space or tab characters, then the following more basic solution is clear and fast, and also uses less memory.

def get_indent(astr):

    """Return index of first non-space character of a sequence, or False if there are no leading spaces."""

    # Raises TypeError right away if astr is not iterable.
    iter(astr)

    # OR for not raising exceptions at all
    # if not hasattr(astr, '__getitem__'): return False

    idx = 0
    while idx < len(astr) and astr[idx] == ' ':
        idx += 1
    # idx is still 0 when astr is empty or does not start with a space.
    if idx == 0:
        return False
    return idx

Although this may not be the absolute fastest or visually simplest solution, one benefit is that you can easily transfer it to other languages and other versions of Python. It is also likely the easiest to debug, as there is little magic behavior. If you put the body of the function in-line with your code instead of in a function, you'd remove the function call overhead and would make this solution similar in byte code to the other solutions.

Additionally, this solution allows for more variations, such as adding a test for tabs to the loop condition:

or astr[idx] == '\t':

Or you can test the entire data as iterable once instead of checking if each line is iterable. Remember things like ""[0] raises an exception whereas ""[0:] does not.
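
Putting the tab variation together, a sketch might look like the following. The name `get_indent_ws` and the `chars` parameter are hypothetical. Unlike the function above, it returns a count in every case (0 when there is no indent) rather than False.

```python
def get_indent_ws(astr, chars=" \t"):
    """Return the index of the first character of astr that is not in chars.

    Returns len(astr) when astr is empty or consists only of such characters.
    """
    idx = 0
    while idx < len(astr) and astr[idx] in chars:
        idx += 1
    return idx


print(get_indent_ws("\t  xyz"))  # 3
print(get_indent_ws("xyz"))      # 0
print(get_indent_ws(""))         # 0, no IndexError: astr[idx] is never read
```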

If you wanted to push the solution to inline you could go the non-Pythonic route:

i = 0
while i < len(s) and s[i] == ' ': i += 1

print(i)
3

score 2 · Answer 4 · answered Mar 04 '10 at 12:38

Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:

>>> tests = [u'foo', u' foo', u'\xA0foo']
>>> import re
>>> for test in tests:
...     print(len(re.match(r"\s*", test, re.UNICODE).group(0)))
...
0
1
1
>>>

FWIW: time taken is O(the_answer), not O(len(input_string))
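
For Python 3, where str patterns match Unicode whitespace by default, a sketch of the same approach could use a precompiled pattern and the match object's end() method. The names `_LEADING_WS` and `leading_ws_regex` are hypothetical.

```python
import re

_LEADING_WS = re.compile(r"\s*")  # always matches, possibly the empty string


def leading_ws_regex(s):
    """Return the length of the whitespace prefix of s."""
    return _LEADING_WS.match(s).end()


for test in ["foo", " foo", "\xa0foo", "   "]:
    print(leading_ws_regex(test))
# prints 0, 1, 1, 3 (one per line)
```

Because the pattern uses `*`, the match never fails, so no None check is needed.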

score 1 · Answer 5 · answered Mar 04 '10 at 12:40

import re
def prefix_length(s):
   m = re.match(r'(\s+)', s)
   if m:
      return len(m.group(0))
   return 0

D.Shawley


"""Make sure your code "does nothing" gracefully.""" -- attributed to Jon Bentley IIRC. – John Machin Mar 04 '10 at 13:06
Forgive my ignorance, but who is he? – Pablo Mar 04 '10 at 13:28
Ignorance is forgivable; unwillingness to use a search engine is another matter ;-) http://en.wikipedia.org/wiki/Jon_Bentley – John Machin Mar 04 '10 at 14:23
@JohnMachin - D'oh... good point about `+` instead of `*`. My thinking cap wasn't fully on this morning. – D.Shawley Mar 04 '10 at 14:31
Also you have redundant parentheses. – John Machin Mar 04 '10 at 14:44

score -1 · Answer 6 · answered Mar 04 '10 at 11:54

>>> string = "   xyz"
>>> next(idx for idx, chr in enumerate(string) if not chr.isspace())
3

Adrien Plisson


-1 as it fails for any all-whitespace string ... **"StopIteration:"** error is output in that case – kmonsoor Feb 14 '14 at 09:25
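
One way to avoid the StopIteration is to give next() a default value, which it accepts as an optional second argument. A sketch, with the hypothetical name `first_non_ws` and None chosen arbitrarily as the "not found" value:

```python
def first_non_ws(s, default=None):
    """Return the index of the first non-whitespace character, or default."""
    return next((i for i, ch in enumerate(s) if not ch.isspace()), default)


print(first_non_ws("   xyz"))  # 3
print(first_non_ws("    "))    # None
print(first_non_ws(""))        # None
```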

score -1 · Answer 7 · answered Mar 04 '10 at 12:28

>>> string = "   xyz"
>>> list(map(str.isspace, string)).index(False)
3

ghostdog74


-1 as it fails for any all-whitespace string ... :( **"ValueError: False is not in list"** – kmonsoor Feb 14 '14 at 09:23
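
A different technique that sidesteps the ValueError, offered only as a sketch: count the leading whitespace lazily with itertools.takewhile instead of building a full list. The name `count_leading_ws` is hypothetical, and an all-whitespace string yields len(s) instead of raising.

```python
from itertools import takewhile


def count_leading_ws(s):
    """Count leading whitespace characters, stopping at the first non-whitespace one."""
    return sum(1 for _ in takewhile(str.isspace, s))


print(count_leading_ws("   xyz"))  # 3
print(count_leading_ws("    "))    # 4
```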
