Count occurrences of a substring in a list of strings

Question

I know that counting the simple occurrences of a list item is as easy as:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

But what I would like to know is how to count every time a string appears as a substring of the list entries.

For example, I want to see how many times foo appears in the list data:

data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]

Doing:

d_count = data.count('foo')
print("d_count:", d_count)

produces:

d_count: 0

but I expect to get:

d_count: 2

I also tried doing:

d_count = data.count(any('foo' in s for s in data))
print("d_count:", d_count)

but that also gives zero as a result.
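Both attempts return zero because list.count compares each whole element with ==; it never looks inside the strings. In the second attempt, any(...) is evaluated first and produces a single True, so the call becomes data.count(True), and no element of data equals True. A minimal sketch that makes both steps visible:

```python
data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]

# list.count compares whole elements with ==, so only an element that is
# exactly "foo" would be counted.
print(data.count("foo"))   # 0

# any(...) collapses the whole generator into a single boolean...
flag = any("foo" in s for s in data)
print(flag)                # True

# ...so the second attempt is really data.count(True), and no string equals True.
print(data.count(flag))    # 0
```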

In short, I would like to know how to count the occurrences of a substring across the strings in a list.

What result do you expect – 2 or 3 (since "foo" occurs twice in first string)? — Błotosmętek, Aug 17 '17 at 15:11
Related: https://stackoverflow.com/questions/45719958/how-to-count-numbers-in-a-list-via-certain-rules/45720028#45720028 — bendl, Aug 17 '17 at 15:13

Christian Dean · Accepted Answer · 2017-08-17T15:44:54.273

40

You can do this with the built-in sum function; there is no need to use list.count at all:

>>> data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]
>>> sum('foo' in s for s in data)
2

This code works because booleans can be treated as integers. For each string element, the expression 'foo' in s evaluates to True if 'foo' occurs in it and to False otherwise, and the integer values of True and False are 1 and 0. Summing these values therefore yields the number of elements that contain 'foo'.
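The claim that booleans behave like integers can be checked directly: in Python, bool is a subclass of int, with True equal to 1 and False equal to 0. A short sketch on the question's data:

```python
data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]

# bool is a subclass of int, so True and False act as 1 and 0 in arithmetic.
print(issubclass(bool, int))   # True
print(int(True), int(False))   # 1 0
print(True + True)             # 2

# Each membership test yields one boolean per element...
flags = ['foo' in s for s in data]
print(flags)                   # [True, False, True]

# ...and sum() adds them as integers: the number of elements containing 'foo'.
print(sum(flags))              # 2
```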

A perhaps more explicit but equivalent way to write the above code would be:

>>> sum(1 for s in data if 'foo' in s)
2

edited Aug 17 '17 at 15:44

answered Aug 17 '17 at 15:10

Christian Dean


That worked perfectly, the `sum` command that is, so thanks a ton. I figured there would be some simple command that would solve it that I was simply unaware of. – theprowler Aug 17 '17 at 15:19
Glad I could help out @theprowler ;-) – Christian Dean Aug 17 '17 at 15:19
The last method looks dubious. What if one of the strings ends with " fo" and the next one starts with "o "? It will also give different results to the other methods because it counts *all* occurrences (and it only finds whole words rather than just substrings). – ekhumoro Aug 17 '17 at 15:26
@ekhumoro I'm not following on the first part of your comment. If one string ends with "fo" and the next starts with "o" then the method would not count them as "foo", which is correct, no? Nevertheless, those are some valid points you bring up. I'll add a caveat about the behavior of the last method. – Christian Dean Aug 17 '17 at 15:33
@ChristianDean. It would count it as "foo" if one string ends with `" fo"` and the next with `"o "` (note the spaces). But in any case, the point is that the last method is not at all consistent with the other two - it is really doing a completely different thing, so I don't know why you've included it. – ekhumoro Aug 17 '17 at 15:39
@ekhumoro Ah, I get what you mean. After looking back I see you're right. The last method was really just the result of me fooling around without testing all of the cases first. But like you said, its behavior is inconsistent with the other two methods. I'll remove it from my answer. Sorry 'bout that. – Christian Dean Aug 17 '17 at 15:44

Wow - I just tried `True + True` and it returned `2`. This sounds like some nonsense I'd expect from JavaScript. The amount I love Python has been slightly decreased, I think... – ArtOfWarfare Mar 21 '21 at 21:14
@ArtOfWarfare Yeah, that's understandable. At the time I wrote this answer I thought this was pretty clever and convenient, but now this is really bad code. But note you can do the same thing in a lot of languages. C, for example, lets you add booleans, which conceptually doesn't make any sense. On the other hand, one of the things I like about languages like golang, for example, is that it doesn't let you do nonsense like adding booleans. And when it does, it forces you to be explicit. – Christian Dean Mar 22 '21 at 01:48
`sum(int('foo' in s) for s in data)` is a more explicit alternative. – Lutz Prechelt Sep 30 '22 at 16:38

score 1 · Answer 2 · answered Aug 17 '17 at 15:10


You can try this:

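from itertools import chain

data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]

# Split every string into words and flatten them into a single list.
words = list(chain.from_iterable(s.split() for s in data))

# list.count matches whole words only, so "fooed" is not counted.
print(words.count("foo"))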

Output:

2
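Because this approach splits on whitespace, it counts words that are exactly "foo" rather than substrings; on the question's data it happens to agree with the substring answers. The two diverge when "foo" occurs only inside longer words. A minimal comparison, using a hypothetical `sample` list chosen to show the difference:

```python
from itertools import chain

# Hypothetical sample where "foo" mostly appears inside longer words.
sample = ["foobar baz", "a foo here", "seafood"]

# Whole-word count: split on whitespace and count tokens equal to "foo".
tokens = list(chain.from_iterable(s.split() for s in sample))
print(tokens.count("foo"))               # 1

# Substring test: count the strings that contain "foo" anywhere.
print(sum("foo" in s for s in sample))   # 3
```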

answered Aug 17 '17 at 15:10

Ajax1234


score 1 · Answer 3 · answered Aug 18 '17 at 05:58

If data = ["abababa in foo", "abababa"] and you want to find every occurrence of "aba" in the list, including overlapping ones, you can use the code below:

>>> data = ["abababa in foo", "abababa"]
>>> sub = "aba"
>>> length = len(sub)
>>> sum(element[index:index+length] == sub for element in data for index in range(len(element)))
6
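The slicing approach compares a slice at every index, so it counts overlapping matches. The same count can be sketched with str.find, restarting the search one character past the start of each match; `count_overlapping` is a hypothetical helper name, not a built-in:

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the start of the last match so that
        # overlapping occurrences (e.g. "aba" twice in "ababa") are found.
        start = text.find(sub, start + 1)
    return count

data = ["abababa in foo", "abababa"]
print(sum(count_overlapping(s, "aba") for s in data))   # 6

# For comparison, str.count skips past each match, so matches cannot overlap.
print(sum(s.count("aba") for s in data))                # 4
```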

score 1 · Answer 4 · edited Sep 30 '22 at 16:33


Counting all the 'foo' occurrences (not only one per string) can be done with

sum(s.count('foo') for s in data)
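Applied to the question's data, this counts both "foo" occurrences in the first string (including the one inside "fooed"), so it answers the "3" interpretation raised in the comment under the question rather than "2". Note that str.count counts non-overlapping occurrences. A short sketch:

```python
data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]

# str.count gives the number of non-overlapping occurrences in one string;
# summing over the list gives the total across all elements.
per_string = [s.count('foo') for s in data]
print(per_string)        # [2, 0, 1]
print(sum(per_string))   # 3
```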

edited Sep 30 '22 at 16:33

Lutz Prechelt


answered Sep 26 '22 at 06:49

Adinath Gore


score 0 · Answer 5 · answered Mar 21 '20 at 23:29


@ChristianDean's answer was great, but it is hard to read (at least for me), so here is a more readable, easier to understand version:

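data = ["the foo is all fooed", "the bar is all barred", "foo is now a bar"]
count = 0
for s in data:
    if 'foo' in s:  # counts each string at most once
        count += 1
print(count)  # 2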

answered Mar 21 '20 at 23:29

Urban P.

