
I want to count the number of times each word appears in a text file, and I am not sure what is wrong. I also had trouble finding a way to count a word's capitalized and non-capitalized occurrences together.

  • The script expects two command-line arguments: the name of an input file and a threshold (an integer).
  • The input file contains exactly one word per line, with no whitespace before or after the word. The script does not need to verify the contents of the input file.

    The letter case of words in the input file does not matter for counting. For example, the script should count “the”, “The”, and “THE” as the same word.

    After counting the words, the script prints a report (to a file, output.txt) that lists the words and their counts. Each word is printed only if its count is greater than or equal to the threshold given on the command line.
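The requirements above fit in a short script. This is a minimal sketch, not the asker's code: the function names `count_words` and `write_report` are hypothetical, and the tab-separated `word<TAB>count` layout (borrowed from the asker's `print`) and the most-frequent-first order are assumptions, because the desired layout was only shown in an image.

```python
import sys
from collections import Counter


def count_words(path):
    """Count the words in `path` (one word per line), ignoring letter case."""
    with open(path, "r", encoding="utf-8-sig") as f:
        # Strip the trailing newline, skip blank lines, fold case with lower()
        return Counter(line.strip().lower() for line in f if line.strip())


def write_report(counts, threshold, out_path="output.txt"):
    """Write one 'word<TAB>count' line per word whose count >= threshold."""
    with open(out_path, "w", encoding="utf-8") as out:
        # Assumed ordering: most frequent first (most_common() with no argument)
        for word, n in counts.most_common():
            if n >= threshold:
                out.write(f"{word}\t{n}\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        write_report(count_words(sys.argv[1]), int(sys.argv[2]))
    else:
        print(f"usage: {sys.argv[0]} INPUT_FILE THRESHOLD", file=sys.stderr)
```

Run it as, for example, `python count_words.py number.txt 2`. Because `lower()` is applied to every line before counting, "the", "The" and "THE" all land on the same key.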

Here is my code:

from collections import Counter

with open(r"E:\number.txt", "r", encoding="utf-8-sig") as file:
    word_counter = Counter(file.read().split())

for item in word_counter.items():
    print("{}\t{}".format(*item))

but I want the output in the following manner:

(Image of the desired output; not reproduced here.)

natn2323
  • After read(), put lower(). – LtWorf Oct 12 '18 at 22:49
  • Why do you want an image for output? – Jongware Oct 12 '18 at 23:08
  • I just don't know how to produce output like the image shows; I don't want an image as the output. – Tired tiger Oct 12 '18 at 23:20
  • You can map the lower function over the list of words, as shown here: https://stackoverflow.com/questions/35184306/how-to-ignore-case-while-doing-most-common-in-pythons-collections-counter Something like this: `word_counter = Counter(map(str.lower, file.read().split()))` – Santiago Bruno Oct 12 '18 at 23:22
  • "but I want the output in the following manner: (*image*)". So why not simply include the output as *text* in your question? I see no need for an image here. – Jongware Oct 13 '18 at 09:07
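The two case-folding suggestions in the comments (lower-casing the whole text after `read()`, or mapping `str.lower` over the split words) should give identical counts. A quick check, using a made-up string in place of the file contents:

```python
from collections import Counter

# Made-up sample standing in for file.read()
text = "The cat saw the dog\nTHE end\n"

# LtWorf's suggestion: lower-case the whole text right after read()
counts_lower_text = Counter(text.lower().split())

# Santiago Bruno's suggestion: map str.lower over the already-split words
counts_mapped = Counter(map(str.lower, text.split()))

print(counts_lower_text == counts_mapped)  # True
print(counts_mapped["the"])                # 3
```

Lower-casing once before splitting touches the text in a single pass; mapping after the split is handy when the words already arrive as a list.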

2 Answers

import re

frequency = {}
with open('s1.txt', 'r') as file1:  # assuming the words are stored in s1.txt
    text1 = file1.read().lower()
match_pattern = re.findall(r'[a-z]{1,189819}', text1)
# The longest word in English has 189,819 letters and would take you three and a half hours
# to pronounce correctly. Seriously. It's the chemical name of Titin (or connectin), a giant protein
# "that functions as a molecular spring which is responsible for the passive elasticity of muscle."

for word in match_pattern:
    count = frequency.get(word, 0)
    frequency[word] = count + 1

for word in frequency:
    print(word, frequency[word])

Read the file with all words converted to lowercase (or uppercase), then build a dict with the words in the file as keys and their frequencies as values.
See also: longest length of a word in English (link).
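One difference from the question's `split()` approach is worth checking: the regex keeps only runs of `a`–`z`, so a word containing an apostrophe is split into pieces. A small comparison on a made-up line, using `[a-z]+`, which matches the same words as `[a-z]{1,189819}` for any realistic word length:

```python
import re

line = "Don't stop, it's fine"

# str.split() breaks only on whitespace, so punctuation stays attached
print(line.lower().split())                 # ["don't", 'stop,', "it's", 'fine']

# The regex keeps only runs of a-z, so apostrophes split words apart
print(re.findall(r"[a-z]+", line.lower()))  # ['don', 't', 'stop', 'it', 's', 'fine']
```

With one word per line and no surrounding whitespace, as the question specifies, `split()` is already enough. The regex only helps when the input also contains punctuation that should be dropped.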

bipin_s

Or, with pandas:

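import pandas as pd                                  # Import pandas

# Raw string so "\n" in the path is not read as a newline; dtype=str and
# keep_default_na=False stop words such as "null" or "NA" from becoming NaN
text1 = pd.read_csv(r"E:\number.txt", header=None, dtype=str, keep_default_na=False)
s = text1[0].str.lower()                             # Convert to a lowercase Series
frequency_list = s.value_counts()                    # Get frequencies of unique values
print(frequency_list)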
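The pandas version stops at `value_counts()`. To also apply the threshold and write `output.txt`, filter the Series and save it. This is a sketch: `io.StringIO` stands in for the real input file, and the hard-coded `threshold = 2` is an assumed value that a real script would read from `sys.argv`.

```python
import io

import pandas as pd

# Made-up stand-in for the input file: one word per line
data = io.StringIO("the\nThe\nTHE\ncat\nCat\ndog\n")

words = pd.read_csv(data, header=None, dtype=str, keep_default_na=False)[0]
frequency_list = words.str.lower().value_counts()  # sorted by count, descending

threshold = 2  # assumed value; a real script would take it from sys.argv
report = frequency_list[frequency_list >= threshold]

# One "word<TAB>count" line per word, without a header row
report.to_csv("output.txt", sep="\t", header=False)
```

Boolean indexing keeps only the rows whose count meets the threshold. `to_csv` then writes the index (the word) and the value (the count) on each line.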
bart cubrich