Let's break down your code line by line:
needle = 'sses'
haystack = 'assesses'
count = 0 # initialize the counter
Fine so far, this is only initialization.
index = haystack.index(needle) # get the first character in the substring
This line already is a problem, index
raises a ValueError
if it doesn't find the substring. Your program would crash in this case. You should instead use haystack.find(needle)
which does the same, but instead of raising a ValueError
it return -1
if the substring isn't found.
I do however not understand why you use this line at all. Your following loop will loop through the whole haystack
and will also find the first appearance of needle
.
string1 = haystack[index:len(needle) + index] # get the whole substring
This line is only valid if needle
was found in the previous line. Also guess what string1
will be after this line? You are extracting the part of haystack
where you previously found the substring needle
. So the result will be string1 == needle
, which won't help you in any way.
for position in range(0,len(haystack)): # loop through the string
ok, you loop through all positions in the string.
if haystack[position:len(needle) + index] == string1: # match the 1st substring
So here I don't get why you want to find the first occurence again, which you already found before. Don't you want to check whether there is a match at position
, no matter whether it is the first or second or third... one? So I would guess that haystack[position:len(needle) + index]
is supposed to extract the substring of haystack
that starts at position position
and has length of needle
. But why is there + index
then? What has the first occurence (saved in index
) to do with this? Don't you mean + position
here? Finally you are comparing to string1
which as I said will be (if your code makes it to this line) equal to needle
. So why not directly compare to needle
?
count += 1 # iterate the counter
This line is wrong indented in your posted code, it should be one deeper than the if-statement.
Finally you have to consider in your for-loop, that if position
gets to the end of haystack
there might not be a substring with length len(needle)
starting at position
anymore. So you probably want to stop iterating prior to that. (EDIT: I just notice that the code will run correctly anyway. It is not necessary to address this in python, because using indexes out of bounds of the string is allowed, but it would be in other languages.)
I guess this is an exercise, but if it wasn't there would be a much easier way to do this in python: count = haystack.count(needle)
. There is a small difference to your proposed algorithm though. string.count(substring)
will return the number of non-overlapping matches, while your current code would find the number of non-overlapping and overlapping matches. The exercise as posted by you is unclear which of both is meant. But if you are supposed to find only non-overlapping results you need to consider this in your for-loop, too.
There are also several improvements one could make to the style and performance of your code, but I will not get into this, as you seem to have trouble getting it to work at all.