I have a dataset in a .txt file. Each line is a separate transaction, and the items on each line are separated by spaces.

The dataset looks like this:

data.txt file

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
20 12 5 41 65
41 6 11 27 81 21
65 15 27 8 31 65 20 19 44 29 41

I created a dictionary whose keys are line numbers starting from 0 and whose values are each line's items separated by commas, like this:

{0: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', 1:'20,12,5,41,65', 2:'41,6,11,27,81,21', 3: '65,15,27,8,31,65,20,19,44,29,41'} 

But I am not able to iterate through the individual items in each value of the dict. Is there any way I can convert each key's value into a list?
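A minimal sketch, assuming the dictionary already looks like the one shown above: calling `str.split(',')` on each value turns it into a list that can be iterated over, and a `collections.Counter` can then tally every item across all transactions. The names `as_lists` and `frequency` are only illustrative.

```python
from collections import Counter

# The dictionary from the question: each value is one transaction
# stored as a comma-separated string.
my_dict = {
    0: '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15',
    1: '20,12,5,41,65',
    2: '41,6,11,27,81,21',
    3: '65,15,27,8,31,65,20,19,44,29,41',
}

# Split each string on commas and convert every piece to an int,
# so each value becomes a list of items.
as_lists = {key: [int(item) for item in value.split(',')]
            for key, value in my_dict.items()}
print(as_lists[1])  # [20, 12, 5, 41, 65]

# With lists as values, every item of every transaction can be visited.
frequency = Counter(item for items in as_lists.values() for item in items)
print(frequency[41])  # 3
```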

I want to find the frequency of each item across the whole dictionary and create a table like this:

item frequency
1    1
2    1
20   2
41   3

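my_dict = {}

with open('data.txt', 'r') as file:
    # enumerate() supplies each line's position; lines.index(line) returns the
    # first matching line, so duplicate transactions would share one key
    for i, line in enumerate(file):
        my_dict[i] = line.strip()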

This is the code I used to create the dictionary, but I am not sure what I should change. I also need to find the frequency of each value.

Any help would be appreciated. Thank you.

  • So, you don't even care what line a number appears on? The result would be the same if the numbers were all on a single line? Or could the same number appear twice on a single line and should it then only be counted once? I.e. does 'frequency' represent the total count of each number, or does it represent the number of lines that number appears on? – Grismar Nov 14 '22 at 03:04
  • I want the count of that value across the whole dictionary, in all lines, including the line it's in. If it appears twice in a line, then the count should be 2. – bookworm 1510 Nov 14 '22 at 03:30
  • Wait, you want the count to be 2 if it shows up twice in a line, but you also want the count to reflect how many times a number shows up in all the lines? You can't have both, generally speaking. – ddejohn Nov 14 '22 at 05:01
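To make the two readings discussed in these comments concrete, here is a sketch that computes both on the sample data: `total` counts every occurrence (the asker's stated requirement), while `line_count` counts each number at most once per line. It reads from an in-memory copy of data.txt via `io.StringIO` so it runs on its own; the variable names are illustrative.

```python
import io
from collections import Counter

# Stand-in for data.txt; with the real file, use open('data.txt') instead.
data = io.StringIO(
    "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\n"
    "20 12 5 41 65\n"
    "41 6 11 27 81 21\n"
    "65 15 27 8 31 65 20 19 44 29 41\n"
)

total = Counter()       # every occurrence counts, including repeats on one line
line_count = Counter()  # each number counts at most once per line

for line in data:
    items = [int(x) for x in line.split()]
    total.update(items)           # adds 1 per element in the list
    line_count.update(set(items)) # set() drops repeats within the line

print(total[65], line_count[65])  # 3 2
```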

2 Answers

Since you're really just counting numbers over the entire file, you can just:

my_dict = {}

with open('data.txt', 'r') as file:
    for number in file.read().split():
        my_dict[number] = my_dict.get(number, 0) + 1

print(my_dict)

Result:

{'1': 1, '2': 1, '3': 1, '4': 1, '5': 2, '6': 2, '7': 1, '8': 2, '9': 1, '10': 1, '11': 2, '12': 2, '13': 1, '14': 1, '15': 2, '20': 2, '41': 3, '65': 3, '27': 2, '81': 1, '21': 1, '31': 1, '19': 1, '44': 1, '29': 1}

That just counts the strings representing numbers; you can turn them into actual integers instead:

my_dict = {}  # start fresh so string keys from the first version are not mixed in
with open('data.txt', 'r') as file:
    for number in file.read().split():
        my_dict[int(number)] = my_dict.get(int(number), 0) + 1

print(my_dict)

Result:

{1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 1, 8: 2, 9: 1, 10: 1, 11: 2, 12: 2, 13: 1, 14: 1, 15: 2, 20: 2, 41: 3, 65: 3, 27: 2, 81: 1, 21: 1, 31: 1, 19: 1, 44: 1, 29: 1}

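Or, to call int() only once per number, use an assignment expression (Python 3.8+) as the line inside the loop: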

        my_dict[i] = my_dict.get(i := int(number), 0) + 1
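To print the item/frequency table the question asks for, the int-keyed dictionary can be formatted sorted by item. `format_table` is a hypothetical helper written for this sketch, and the four-character column width is an arbitrary choice.

```python
def format_table(counts):
    """Return an 'item frequency' table as text, one row per item, sorted by item."""
    rows = ["item frequency"]
    for item in sorted(counts):
        # Left-align the item in a 4-character column, then the count.
        rows.append(f"{item:<4} {counts[item]}")
    return "\n".join(rows)


# The int-keyed result shown above.
counts = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 1, 8: 2, 9: 1, 10: 1,
          11: 2, 12: 2, 13: 1, 14: 1, 15: 2, 20: 2, 41: 3, 65: 3, 27: 2,
          81: 1, 21: 1, 31: 1, 19: 1, 44: 1, 29: 1}
print(format_table(counts))
```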
Grismar
  • Could do `for number in map(int, file.read().split())` instead of casting to `int` twice per iteration. – ddejohn Nov 14 '22 at 05:05
  • That's what the last line was intended to do, but I didn't want to complicate the initial answer with that. Do note that using the suggestion with `map()` is probably a little bit more efficient than reusing the value with the walrus operator like I suggested. – Grismar Nov 14 '22 at 05:55
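Here is a sketch of the `map()` variant suggested in the comment above, wrapped in a hypothetical `count_numbers` helper that accepts any open text file. `io.StringIO` stands in for data.txt so the snippet runs on its own.

```python
import io


def count_numbers(file):
    """Count how often each whitespace-separated integer appears in a text file."""
    counts = {}
    # map(int, ...) converts each token once, so int() is not called twice per item.
    for number in map(int, file.read().split()):
        counts[number] = counts.get(number, 0) + 1
    return counts


# Stand-in for data.txt; with the real file:
#     with open('data.txt', 'r') as file:
#         my_dict = count_numbers(file)
sample = io.StringIO(
    "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\n"
    "20 12 5 41 65\n"
    "41 6 11 27 81 21\n"
    "65 15 27 8 31 65 20 19 44 29 41\n"
)
my_dict = count_numbers(sample)
print(my_dict[41], my_dict[65], my_dict[1])  # 3 3 1
```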

An alternative solution is to use collections.Counter, which is designed for counting:

from collections import Counter

with open("data.txt", "r") as file:
    # Count every whitespace-separated token (as strings)
    counts = Counter(file.read().split())

If you want to convert the values to integers:

from collections import Counter

with open("data.txt", "r") as file:
    # map(int, ...) converts each token before it is counted
    counts = Counter(map(int, file.read().split()))

This works by reading the entire file into a string at once, calling str.split() on the string since your data are all separated by whitespace, and passing the resulting list straight to Counter().
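A Counter behaves like a dict but adds helpers such as `most_common()`. A short sketch on the sample data, again using `io.StringIO` in place of data.txt:

```python
import io
from collections import Counter

# Stand-in for data.txt so the snippet runs on its own.
sample = io.StringIO(
    "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\n"
    "20 12 5 41 65\n"
    "41 6 11 27 81 21\n"
    "65 15 27 8 31 65 20 19 44 29 41\n"
)
counts = Counter(map(int, sample.read().split()))

# most_common(n) returns (item, count) pairs, highest count first;
# items with equal counts keep the order in which they were first seen.
print(counts.most_common(2))  # [(41, 3), (65, 3)]

# Unlike a plain dict, a Counter returns 0 for items it never saw.
print(counts[100])  # 0
```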

ddejohn