
I am trying to find the index of every occurrence of the word 'print' in a multi-line text, but I have run into two problems:

  1. When a line contains 'print' more than once, the code reports the index of the first 'print' once for each occurrence.
  2. It never reports the index of the second (or third) 'print' on that line.

My code is:
text = '''print is print as
it is the function an
print is print and not print
'''

text_list = []

for line in text.splitlines():

    #'line' represents each line in the multiline string
    text_list.append([])

    for letter in line:
        #Append each letter of the line to the newest list inside text_list
        text_list[len(text_list)-1].append(letter)

for line in text_list:
    for letter in line:

        #check if the letter after 'p' is 'r', then 'i', then 'n', and finally 't'
        if letter == "p":
            num = 1

            if text_list[text_list.index(line)][line.index(letter)+num] == 'r':
                num += 1
                
                if text_list[text_list.index(line)][line.index(letter)+num] == 'i':
                    num += 1

                    if text_list[text_list.index(line)][line.index(letter)+num] == 'n':
                        num += 1

                        if text_list[text_list.index(line)][line.index(letter)+num] == 't':
                            num += 1
                            print(f'index (start,end) = {text_list.index(line)}.{line.index(letter)}, {text_list.index(line)}.{line.index(letter)+num}')
                        

When I run it, it prints:

index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line
index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the third print

As you can see, the indexes in the result are repeated. This is the text_list:

>>> text_list
[['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 's'],
['i', 't', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', ' ', 'a', 'n'],
['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 'n', 'd', ' ', 'n', 'o', 't', ' ', 'p', 'r', 'i', 'n', 't']]
>>>

Each list inside text_list is one line of the text. There are three lines, so text_list contains three lists. How do I get the index of the second 'print' in the first line, and of the second and third 'print' in the third line? As shown above, the code only returns the index of the first 'print' in the first and third lines.

Parvat . R

4 Answers


Strings already have an index method for finding a substring, and you can pass an extra start argument to find the next occurrence of a given substring:

>>> text = '''print is print as
it is the function an
print is print and not print
'''
>>> text.index("print")
0
>>> text.index("print",1)
9
>>> text.index("print",10)
40
>>> text.index("print",41)
49
>>> text.index("print",50)
63
>>> text.index("print",64)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    text.index("print",64)
ValueError: substring not found
>>> 
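The manual calls above can be wrapped in a loop. Here is a minimal sketch of that idea, assuming non-overlapping matches are wanted; it uses str.find, which returns -1 instead of raising ValueError, and the helper name find_all is made up for this example:

```python
def find_all(haystack, needle):
    """Return the start index of every non-overlapping occurrence of needle."""
    if not needle:
        # An empty needle matches everywhere and would never advance the search
        raise ValueError("needle must not be empty")
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found
        start = haystack.find(needle, start + len(needle))
    return positions


text = '''print is print as
it is the function an
print is print and not print
'''

positions = find_all(text, "print")
print(positions)  # [0, 9, 40, 49, 63]
```

These are offsets into the whole string, the same numbers as in the session above, not per-line positions.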
Copperfield

You can use regular expressions:

import re

text = '''print is print as
it is the function an
print is print and not print
'''

for i in re.finditer("print", text):
    print(i.start())

# OR AS A LIST

[i.start() for i in re.finditer("print", text)]
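Note that the plain pattern "print" also matches inside longer words. If only the standalone word should count (an assumption, since the question does not say), a word-boundary pattern is one option. A small sketch, with a sample string invented for illustration:

```python
import re

# Hypothetical sample text, chosen so that 'print' also appears inside longer words
sample = "print the sprint to the printer, then print"

word = "print"

# Plain search: matches 'print' anywhere, including inside 'sprint' and 'printer'
anywhere = [m.start() for m in re.finditer(re.escape(word), sample)]

# \b asserts a word boundary on both sides, so only the standalone word matches
whole_words = [m.start() for m in re.finditer(r"\b" + re.escape(word) + r"\b", sample)]

print(anywhere)     # [0, 11, 24, 38]
print(whole_words)  # [0, 38]
```

re.escape is not strictly needed for 'print', but it keeps the pattern safe if the search word ever contains regex metacharacters.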
goalie1998

Search each line separately with a regular expression to get per-line character positions:

import re

text = '''print is print as
it is the function an
print is print and not print
'''

for line_number, line in enumerate(text.split('\n')):
    occurrences = [m.start() for m in re.finditer('print', line)]

    if occurrences:
        for occurrence in occurrences:
            print('Found `print` at character %d on line %d' % (occurrence, line_number + 1))

Output:

Found `print` at character 0 on line 1
Found `print` at character 9 on line 1
Found `print` at character 0 on line 3
Found `print` at character 9 on line 3
Found `print` at character 23 on line 3
Gab

You were on the right track initially. You split your text into lines. The next step is to split each line into words, not letters, using the split() method. You can then get the position of each 'print' among the words of its line.

The following code prints these word positions (not character offsets) as a list of lists, with each inner list corresponding to one line:

text = '''print is print as
it is the function an
print is print and not print
'''

index_list = []
for line in text.splitlines():
    index_list.append([])
    for idx, word in enumerate(line.split()):
        if word == 'print':
            index_list[-1].append(idx)

print(index_list)

#[[0, 2], [], [0, 2, 5]]
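
If character positions are needed in the question's line.column format, the letter-by-letter idea also works once the positions come from enumerate() and range() rather than list.index(), which always returns the first matching element and so explains the repeated output. A rough sketch, not the only way to do it; the names matches and target are chosen here for illustration:

```python
text = '''print is print as
it is the function an
print is print and not print
'''

target = "print"
matches = []

# enumerate() supplies the real line number, so no list.index() lookup is needed
for line_no, line in enumerate(text.splitlines()):
    for col in range(len(line)):
        # startswith(prefix, start) checks for the word at exactly this column
        if line.startswith(target, col):
            matches.append((line_no, col, col + len(target)))

for line_no, start, end in matches:
    print(f'index (start,end) = {line_no}.{start}, {line_no}.{end}')
```

For the sample text this reports columns 0 and 9 on line 0, and columns 0, 9 and 23 on line 2.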
pakpe