16

What's the shortest way in Python to count the leading whitespace of a string, i.e. to find the index of its first non-whitespace character?

string = "   xyz"

should return index = 3

kmonsoor
Pablo
  • Almost duplicate of: https://stackoverflow.com/questions/2268532/grab-a-lines-whitespace-indention-with-python – 0 _ Sep 11 '17 at 16:06

7 Answers

49
>>> s = "   xyz"
>>> len(s) - len(s.lstrip())
3
Frank
  • If s is long and the whitespace prefix is short, other solutions (ones that don't make a temp almost-copy of s, get its length, and then throw the temp object away) may be preferable. – John Machin Mar 04 '10 at 23:39
  • @JohnMachin strings are immutable in Python, so I very much doubt the interpreter makes a copy for `strip()`. The original string can simply be reused with a different start position. – bernie Sep 03 '20 at 19:58
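If you want to avoid the temporary string John Machin describes (CPython strings have no slice views, so `s.lstrip()` generally has to copy the remaining characters whenever it strips anything), a minimal sketch with `itertools.takewhile` walks only the whitespace prefix. `leading_ws` is a hypothetical helper name:

```python
from itertools import takewhile

def leading_ws(s):
    """Count leading whitespace characters without building a stripped copy."""
    # takewhile stops at the first character for which str.isspace is False,
    # so only the whitespace prefix is ever examined.
    return sum(1 for _ in takewhile(str.isspace, s))

print(leading_ws("   xyz"))  # 3
print(leading_ws("xyz"))     # 0
print(leading_ws("   "))     # 3
```

Unlike the `next()`-based answers, this returns the full length for an all-whitespace string instead of raising.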
6
>>> next(i for i, j in enumerate('   xyz') if j.strip())
3

or

>>> import string
>>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
3

In Python versions before 2.6, which lack the built-in next(), call the generator's .next() method instead:

(...).next()
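Both generator expressions raise StopIteration when the string is empty or contains only whitespace. The built-in next() (Python 2.6+ and Python 3) accepts a second argument that it returns when the iterator is exhausted; using len(s) as that default keeps the result consistent with len(s) - len(s.lstrip()). A minimal sketch, with `first_non_ws` as a hypothetical name:

```python
def first_non_ws(s):
    # The generator must be parenthesized when next() gets a second argument.
    # When no non-whitespace character exists, next() returns len(s)
    # instead of raising StopIteration.
    return next((i for i, ch in enumerate(s) if not ch.isspace()), len(s))

print(first_non_ws("   xyz"))  # 3
print(first_non_ws("   "))     # 3
print(first_non_ws(""))        # 0
```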
SilentGhost
2

Many of the previous solutions iterate over the string more than once, and some make copies of the data (the string): re.match(), strip(), enumerate() and isspace() all do extra work behind the scenes. The generator expressions

next(idx for idx, ch in enumerate(s) if not ch.isspace())
next(idx for idx, ch in enumerate(s) if ch not in string.whitespace)  # requires: import string

are good choices for handling the various kinds of leading whitespace, such as vertical tabs, but that generality has a cost too.
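The two membership tests are not equivalent: `str.isspace()` follows Unicode, while `string.whitespace` lists only the six ASCII characters ' \t\n\r\x0b\x0c', so they disagree on a no-break space ('\xa0'). A minimal sketch comparing them (the function names are hypothetical, and next() is given a default so all-whitespace input does not raise):

```python
import string

def by_isspace(s):
    # str.isspace() recognizes Unicode whitespace, including '\xa0'.
    return next((i for i, ch in enumerate(s) if not ch.isspace()), len(s))

def by_string_whitespace(s):
    # string.whitespace covers only ASCII whitespace.
    return next((i for i, ch in enumerate(s) if ch not in string.whitespace), len(s))

for sample in ["   xyz", "\v\txyz", "\xa0xyz"]:
    print(repr(sample), by_isspace(sample), by_string_whitespace(sample))
```

Both functions agree on the first two samples (3 and 2), but for "\xa0xyz" by_isspace returns 1 while by_string_whitespace returns 0.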

However, if your strings use only space or tab characters, the following more basic solution is clear, fast, and uses less memory.

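def get_indent(astr):
    """Return the index of the first non-space character of astr.

    Returns 0 (which equals False) if astr has no leading spaces or is
    empty, and len(astr) if astr consists only of spaces.
    """
    # len() raises TypeError for objects that are not sequences; if you
    # would rather not raise at all, check first:
    # if not hasattr(astr, '__getitem__'): return False

    idx = 0
    # Test the bound first so astr[idx] is never read past the end.
    while idx < len(astr) and astr[idx] == ' ':
        idx += 1
    return idx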

Although this may not be the absolute fastest or visually simplest solution, it has some benefits: it transfers easily to other languages and versions of Python, and it is probably the easiest to debug, since there is little magic behavior. If you put the body of the function inline in your code instead of calling a function, you remove the function-call overhead, which makes this solution comparable in byte code to the other solutions.

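Additionally, this solution allows for more variations, such as also accepting tabs. Keep the bound check first and parenthesize the character test, since `and` binds more tightly than `or`:

while idx < len(astr) and (astr[idx] == ' ' or astr[idx] == '\t'):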

Or, when processing many lines, you can check once that the whole input is iterable instead of checking each line. Remember that ""[0] raises an IndexError whereas ""[0:] does not.
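A minimal sketch combining both variations: the loop accepts tabs as well as spaces, and the function is called once on a whole block of text rather than per line. `indent_widths` and its `chars` parameter are hypothetical names:

```python
def indent_widths(text, chars=" \t"):
    """Return the number of leading `chars` characters on each line of text."""
    widths = []
    for line in text.splitlines():
        idx = 0
        # Bound check first, so empty lines never index past the end.
        while idx < len(line) and line[idx] in chars:
            idx += 1
        widths.append(idx)
    return widths

source = "def f():\n    return 1\n\tpass\n\n"
print(indent_widths(source))  # [0, 4, 1, 0]
```

Note that a tab counts as one character here, not as a tab stop; call str.expandtabs() on each line first if you need column widths.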

If you want to go fully inline, you could take the non-Pythonic route:

s = "   xyz"
i = 0
while i < len(s) and s[i] == ' ': i += 1

print(i)
3


DevPlayer
2

Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:

>>> tests = [u'foo', u' foo', u'\xA0foo']
>>> import re
>>> for test in tests:
...     print(len(re.match(r"\s*", test, re.UNICODE).group(0)))
...
0
1
1
>>>

FWIW: time taken is O(the_answer), not O(len(input_string))
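A variant of the same regex that precompiles the pattern and reads the match's end() position instead of the length of group(0), so the matched substring is never built. In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is not needed. `LEADING_WS` and `leading_ws_len` are hypothetical names:

```python
import re

# \s* always matches at position 0 (possibly the empty string),
# so match() never returns None here.
LEADING_WS = re.compile(r"\s*")

def leading_ws_len(s):
    # end() is the index just past the match, i.e. the length of the prefix.
    return LEADING_WS.match(s).end()

for test in ["foo", " foo", "\xa0foo", ""]:
    print(leading_ws_len(test))  # 0, 1, 1, 0
```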

John Machin
1
import re

def prefix_length(s):
    # Raw string avoids the invalid-escape warning for '\s' in Python 3.
    m = re.match(r'\s+', s)
    if m:
        return len(m.group(0))
    return 0
D.Shawley
-1
>>> string = "   xyz"
>>> next(idx for idx, chr in enumerate(string) if not chr.isspace())
3
Adrien Plisson
-1
>>> string = "   xyz"
>>> list(map(str.isspace, string)).index(False)
3
ghostdog74