Counting occurrences of a word in chunks in python (list comprehension)

Question

I am very new to programming, so my apologies if this is a dumb question.

I am trying to count all the occurrences of a word in chunks, and then I need to plot those results. My text is Pride and Prejudice, and I am trying to find how frequently the name 'Mr. Darcy' appears in chunks of 3000 words. I've tried the following, unsuccessfully:

x = [chunk.count('Mr. Darcy') for chunk in partition(100000, text1_pride)]

Can anyone help? Thanks a lot.

If you think you are using string.partition(), I don't think that is what you want to do here. Or is partition() something else? — dustin-we, Dec 18 '20 at 11:11
If you say "by chunks of 3000 words": what is a word? 'Mr. Darcy' is separated by a space, so it should probably be treated as 2 words, right? — SyntaxError, Dec 18 '20 at 11:15
Please create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) expressing the problem — William Baker Morrison, Dec 18 '20 at 11:17
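For reference, a partition(n, seq) function with that argument order is not built into Python. The third-party toolz library has one (it drops a trailing incomplete chunk unless padded; toolz.partition_all keeps it). Below is a minimal standard-library sketch of such a helper, assuming text1_pride is a plain string and that a "word" is whatever str.split() returns. The helper name and the sample sentence are illustrative only.

```python
def partition(n, seq):
    """Split seq into consecutive lists of length n (the last one may be shorter).

    Hypothetical helper that mirrors the call partition(100000, text1_pride)
    from the question; it is not a built-in.
    """
    return [seq[i:i + n] for i in range(0, len(seq), n)]


text = "Mr. Darcy bowed and Elizabeth smiled at Mr. Darcy while Mr. Bingley laughed"
words = text.split()           # chunk by words; slicing the raw string would chunk characters
chunks = partition(3, words)   # small chunk size for the demo

# Join each chunk back into a string so the two-word name can be found
counts = [" ".join(chunk).count("Mr. Darcy") for chunk in chunks]
print(counts)  # [1, 0, 1, 0, 0]
```

Passing the raw string instead of words would cut the text every n characters, which is usually not what "chunks of 3000 words" means.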

score 0 · Answer 1 · answered Dec 18 '20 at 11:26

As noted in the comments, "Mr. Darcy" would be counted as 2 words if you split on spaces. If you only want to look for "Darcy", you could do something like this, assuming your string is called text1_pride:

words = text1_pride.split()
chunks = [words[x:x+3000] for x in range(0, len(words), 3000)]
darcy_counts = [chunk.count('Darcy') for chunk in chunks]
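Note that list.count only matches whole tokens, so after split() a word with punctuation attached, such as "Darcy," or "Darcy.", is not counted. A sketch that strips leading and trailing ASCII punctuation (the standard string.punctuation set) from each token before counting; the function name is hypothetical, and possessives like "Darcy's" still do not match:

```python
import string


def count_word_per_chunk(text, word, chunk_size):
    """Count whole-word occurrences of `word` in consecutive chunks of `chunk_size` words.

    Leading/trailing ASCII punctuation is stripped from each token first,
    so "Darcy," and "Darcy." both count as "Darcy"; "Darcy's" does not.
    """
    words = [w.strip(string.punctuation) for w in text.split()]
    chunks = [words[x:x + chunk_size] for x in range(0, len(words), chunk_size)]
    return [chunk.count(word) for chunk in chunks]


sample = "Darcy, I said. Mr. Darcy! Not Darcy's fault; ask Darcy."
print(count_word_per_chunk(sample, "Darcy", 5))  # [2, 1]
```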

This could all be done in one line, with nested list comprehensions.
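For example, the three lines above can be folded into one nested comprehension. The name text1_pride is kept from the question, but here it is only a short stand-in string:

```python
text1_pride = "Darcy walked in. Darcy sat down. Bingley left."  # short stand-in text

# The first `for` binds the word list once; the inner list comprehension builds
# the 3000-word chunks; the outer expression counts exact "Darcy" tokens per chunk.
darcy_counts = [chunk.count('Darcy')
                for words in [text1_pride.split()]
                for chunk in [words[x:x + 3000] for x in range(0, len(words), 3000)]]
print(darcy_counts)  # [2]
```

The `for words in [...]` clause is a common trick to avoid calling split() once per chunk; the three-line version is usually easier to read and debug.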

score 0 · Answer 2 · answered Dec 18 '20 at 11:30

A minimal version of what you want to do based on random data would be:

import random
import loremipsum


text = ' '.join(loremipsum.get_sentences(400)).split() # split into words

# where to replace part with Mr. Darcy
where = [random.randint(1, len(text) - 1) for _ in range(1000)]

for p in where:
    text[p] = "Mr. Darcy"

chunk_size = 100

# check for chunk_size list elements (some containing "Mr. Darcy" - most not)

# joins each chunk into a text then looks for Mr. Darcy    
x = [' '.join(chunk).count('Mr. Darcy') for chunk in (
    text[i: i + chunk_size] for i in range(0, len(text), chunk_size))]
    
print(x)

Output:

[34, 28, 28, 34, 35, 22, 25, 31, 26, 32, 23, 21, 37, 32, 29, 40, 30,
28, 40, 29, 35, 31, 25, 34, 28, 31, 32, 11]
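If the third-party loremipsum package is not installed, a similar experiment can be run with only the standard library. In this sketch the filler vocabulary is made up, a fixed seed makes the run repeatable, and random.sample picks distinct positions so exactly 1000 words are replaced (randint in a loop can pick the same position twice):

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Made-up filler vocabulary standing in for lorem ipsum text
vocab = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing"]
words = [random.choice(vocab) for _ in range(2800)]

# Replace 1000 distinct positions with the two-word name
for p in random.sample(range(len(words)), 1000):
    words[p] = "Mr. Darcy"

chunk_size = 100

# Keep `words` as a list so each slice is chunk_size words, then join and count
x = [' '.join(words[i:i + chunk_size]).count('Mr. Darcy')
     for i in range(0, len(words), chunk_size)]

print(x)
print(sum(x))  # 1000: each replaced element contributes exactly one match
```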

For your actual file, you would need to do something like:

with open("yourfile.txt") as f:
    text = f.read().split()

chunk_size = 3000
chunks = [' '.join(text[i: i + chunk_size]) for i in range(0, len(text), chunk_size)]

and then count "Mr. Darcy" in each chunk in chunks.
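Putting those steps together, a sketch that builds 3000-word chunks and counts "Mr. Darcy" in each might look like this. The function name darcy_per_chunk and the file name are placeholders. Because split() followed by ' '.join() collapses newlines and repeated spaces, a name broken across a line still matches; a "Mr." at the end of one chunk followed by "Darcy" at the start of the next is not counted.

```python
def darcy_per_chunk(text, phrase="Mr. Darcy", chunk_size=3000):
    """Return how often `phrase` occurs in each consecutive chunk of `chunk_size` words.

    Hypothetical helper. split() + ' '.join() normalises whitespace, so a phrase
    broken across a line break still matches; a phrase split across two chunks is missed.
    """
    words = text.split()
    chunks = [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return [chunk.count(phrase) for chunk in chunks]


# Usage with the real book (file name is a placeholder):
# with open("yourfile.txt", encoding="utf-8") as f:
#     counts = darcy_per_chunk(f.read())

demo = "Mr. Darcy spoke.\nLater Mr.\nDarcy left, and Mr. Bingley stayed."
print(darcy_per_chunk(demo, chunk_size=4))  # [1, 1, 0]
```

To plot the result, the list can be passed to matplotlib, e.g. plt.plot(counts) after import matplotlib.pyplot as plt, which draws each count against its chunk index.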
