word = 'laugh'    
string = 'This is laughing laugh'
index = string.find(word)

index is 8, but it should be 17 (the start of the standalone word laugh). I searched hard but could not find an answer.

Psidom
Khan
  • New to Python, re is too complicated for me to solve this yet! – Khan Aug 15 '16 at 13:49
  • I found 194 questions on this site when I search for "how to find a word in a string". Are you saying _none_ of those answers helped? – Bryan Oakley Aug 15 '16 at 13:50
  • 8 is the right answer, [`find`](https://docs.python.org/2/library/string.html#string.find) returns the starting position of the first matching substring – miraculixx Aug 15 '16 at 13:51
  • Does this answer your question? [Finding the position of a word in a string](https://stackoverflow.com/questions/33053641/finding-the-position-of-a-word-in-a-string) – Abu Shoeb Jul 17 '20 at 16:42
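To see what the comments mean, here is a small sketch of str.find on the question's string. The optional start argument of str.find and the str.rfind method are standard str features; the positions 9 and 17 are specific to this example string:

```python
string = 'This is laughing laugh'

# find() returns the start of the FIRST matching substring, even inside a longer word
print(string.find('laugh'))     # 8, the 'laugh' inside 'laughing'

# the optional start argument begins the search at a later position
print(string.find('laugh', 9))  # 17

# rfind() returns the LAST match, which is right here only because
# the standalone word happens to be the last occurrence
print(string.rfind('laugh'))    # 17
```

Neither call knows about word boundaries, which is why the answers below either split the string into words or use a regex with \b.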

5 Answers


You should use a regex with word boundaries (\b), because str.find returns the first occurrence of the substring, even when it is part of a longer word such as laughing. Then use the start() method of the match object to get the starting index.

import re

string = 'This is laughing laugh'

a = re.search(r'\b(laugh)\b', string)  # \b matches only at word boundaries
print(a.start())  # 17

You can find more info on how it works here.

DeepSpace
  • Great! Could you let me know how to use a variable in the re expression, i.e. I want to use word instead of (laugh)? – Khan Aug 15 '16 at 13:58
  • @Khan Like you would with any Python string. You can concat or use `.format`, ie `word = 'laugh' ; re.search(r'\b({})\b'.format(word), string)` – DeepSpace Aug 15 '16 at 14:13
  • This worked: re.compile(r'\b%s\b' % word, re.I) not sure why re.search(r'\b({})\b'.format(word), string) didn't... – Khan Aug 15 '16 at 14:50
  • Many Thanks! Spent a lot of time on this to find out (newbie!). – Khan Aug 15 '16 at 14:52
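Picking up the question in the comments about using a variable instead of the literal laugh, here is a minimal sketch. It assumes you want str.find-like behaviour (return -1 when there is no whole-word match) and that the word starts and ends with a letter or digit, since \b needs a word character on one side. The helper name find_word is hypothetical; re.escape makes characters such as . or + in the word match literally:

```python
import re

def find_word(text, word):
    """Return the index of the first whole-word match of word in text, or -1."""
    # re.escape treats the word literally; \b anchors both ends at word boundaries
    pattern = r'\b{}\b'.format(re.escape(word))
    match = re.search(pattern, text)
    return match.start() if match else -1

word = 'laugh'
string = 'This is laughing laugh'
print(find_word(string, word))    # 17
print(find_word(string, 'laug'))  # -1, 'laug' only ever appears as part of a word
```

Returning -1 instead of calling .start() on None avoids an AttributeError when the word is absent.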

Try this:

word = 'laugh'
string = 'This is laughing laugh'.split(" ")
index = string.index(word)  # 3: position in the list of words, not in the text

This makes a list containing all the words and then searches it for the word, which gives the word's position in the list (3), not in the string. To get the character index, add up the lengths of all the words before index, plus one for each separating space:

position = 0
for i, w in enumerate(string):
    if i >= index:  # stop before adding the matched word itself
        break
    position += len(w) + 1  # word length plus one separating space

print(position)  # 17

Hope this helps.

Daniel Lee
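The summing step in the answer above can also be written by re-joining the words that come before the match. A sketch under the same assumptions (single spaces between words, and the word appears as its own list element); the variable names here are illustrative:

```python
text = 'This is laughing laugh'
words = text.split(" ")
index = words.index('laugh')  # 3: position in the list of words

# the words before the match, joined back with their spaces,
# plus the one space that separates them from the match
prefix = " ".join(words[:index])
position = len(prefix) + 1 if index > 0 else 0
print(position)  # 17
```

The index > 0 check covers a match on the very first word, where there is no preceding space to count.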

Here is one approach without regular expressions:

word = 'laugh'    
string = 'This is laughing laugh'
# we want to find this >>> -----
# index   0123456789012345678901     
words = string.split(' ')
word_index = words.index(word)
index = sum(len(x) + 1 for i, x in enumerate(words)
            if i < word_index)
print(index)  # 17

This splits the string into words, finds the list index of the matching word, and then sums up the lengths of all words before it, adding one for the blank that separates each word from the next.

Update: Another approach is the following one-liner:

index = string.center(len(string) + 2, ' ').find(word.center(len(word) + 2, ' '))

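Here both the string and the word are padded with one blank on the left and one on the right, so that the full word is matched at any position in the string, including the very start or end. Because the padded string gains exactly one leading blank, the position found in it equals the index of the word in the original string.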
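A quick check of the padding trick, wrapped in a hypothetical helper find_padded. It assumes words are separated by single blanks; punctuation attached to the word (for example laugh.) defeats it:

```python
def find_padded(text, word):
    # center(len + 2) adds exactly one blank on each side
    padded_text = text.center(len(text) + 2, ' ')
    padded_word = word.center(len(word) + 2, ' ')
    # the index of the leading blank in padded_text is the word's index in text
    return padded_text.find(padded_word)

print(find_padded('This is laughing laugh', 'laugh'))  # 17
print(find_padded('laugh is laughing', 'laugh'))       # 0, word at the start
print(find_padded('laughing', 'laugh'))                # -1, no whole word
```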

You should of course use regular expressions for performance and convenience. The equivalent using the re module is as follows:

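import re

# \b marks a word boundary; re.I makes the match case-insensitive
r = re.compile(r'\b%s\b' % word, re.I)
m = r.search(string)
index = m.start()  # 17; m is None if there is no match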

Here \b means word boundary; see the re documentation. Regex can be quite daunting. A great way to test and develop regular expressions is regex101.com.
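If the word can occur more than once as a whole word, re.finditer yields one match object per occurrence. A sketch assuming the same \b boundaries as above, with re.escape so the word is matched literally; the example sentence is made up for illustration:

```python
import re

string = 'laugh, This is laughing laugh'
word = 'laugh'

# one match object per whole-word occurrence; start() gives its index
starts = [m.start() for m in re.finditer(r'\b{}\b'.format(re.escape(word)), string)]
print(starts)  # [0, 24]
```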

miraculixx
  • downvote all you like but please add a comment so I can improve the answer. – miraculixx Aug 15 '16 at 14:15
  • r = re.compile(r'\b%s\b' % word, re.I) worked like a charm. Your complete solution also works! Thanks a lot! – Khan Aug 15 '16 at 14:53
  • The reason for the downvote is that this answer (both parts of it) already exist in very similar forms. – XtrmJosh Aug 15 '16 at 15:00
  • @XtrmJosh I came up with these solutions and the whole answer by myself. Also if you look carefully this exact solution was not posted by anybody else. – miraculixx Aug 15 '16 at 15:31
  • index = sum(len(x) + 1 for i, x in enumerate(words) if i < word_index) is not giving right char index. – Rashmi Jain Oct 04 '18 at 12:48
  • @RashmiJain what index would you expect? It returns 17 which is the starting index for the word 'laugh' and is correct as per the stated expectation in the original question. – miraculixx Oct 06 '18 at 15:32
  • index = sum(len(x) + 1 for i, x in enumerate(words) if i < word_index) does not work in general to get the character index from the word's index. But for the question above, yes, it works. – Rashmi Jain Oct 08 '18 at 08:22
  • @RashmiJain can you give an example where it does not work? It works under the assumption that the word boundaries are spaces, more specifically the same as the `sep` argument to the `split(sep)` method – miraculixx Oct 08 '18 at 12:54
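The assumption in the last comment can be checked directly. A sketch that wraps the split-and-sum approach in a hypothetical function split_sum_index and compares it with the \b regex; the example sentences are made up for illustration:

```python
import re

def split_sum_index(text, word):
    # the split-and-sum approach from the answer, wrapped in a function
    words = text.split(' ')
    word_index = words.index(word)  # ValueError if no element equals word exactly
    return sum(len(x) + 1 for i, x in enumerate(words) if i < word_index)

# works: a double space yields an empty element that still counts as 1
print(split_sum_index('This  is laughing laugh', 'laugh'))  # 18

# fails: punctuation is glued to the word, so 'laugh.' != 'laugh'
try:
    split_sum_index('This is laughing laugh.', 'laugh')
except ValueError:
    print('split-and-sum: no element equals "laugh"')

# the \b regex treats '.' as a boundary and still finds the word
print(re.search(r'\blaugh\b', 'This is laughing laugh.').start())  # 17
```

So the approach holds as long as the only thing around the word is the split separator.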

Strings in code are not split into words: find matches any substring, so the laugh inside laughing counts. If you want the match to include the spaces around the word, you must include them in the string you are searching for. It may be simpler to split the string into words and then iterate, e.g.:

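text = "This is a laughing laugh"  # avoid the name str, which shadows the built-in
words = text.split(" ")
for w in words:
    if w == "laugh":
        print("found it")  # placeholder: do whatever you need with the match here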

As you iterate you can add the length of the current word to an index and when you find the word, break from the loop. Don't forget to account for the spaces!

XtrmJosh
  • I can find that the word is in string, I want to know its index. – Khan Aug 15 '16 at 13:51
  • My bad, you can add the length of each word as you iterate. It's probably less efficient than the regex method listed, but I try to avoid regex in Python where possible - I see it as a scripting language and as something to be kept easy to read over performant. – XtrmJosh Aug 15 '16 at 13:53
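Following the suggestion to add the length of each word as you iterate, here is a sketch that keeps a running character index and adds one for each space. It assumes words are separated by single spaces, and it uses Python's for/else so the index becomes -1 if the loop never breaks:

```python
text = "This is a laughing laugh"

index = 0  # character index where the current word starts
for w in text.split(" "):
    if w == "laugh":
        break
    index += len(w) + 1  # skip past this word and the single space after it
else:
    index = -1  # the loop ran to the end without finding the word

print(index)  # 19
```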

I stumbled upon this. I hope you have figured it out by now; if you haven't, maybe this will help. I had the same dilemma as you: I was trying to print out a word using its index.

string = 'This is laughing laugh'
word = string.split(" ")
print(word[2])

This would print out laughing.

I hope this helps. This is my first time answering a question on this forum, so please pardon my syntax.

Thank you.

  • `print(word[02])` This will fail in Python 3: "SyntaxError: leading zeros in decimal integer literals are not permitted" – ShpielMeister Sep 28 '21 at 03:05