A beginner task in Python with txt data processing: sum of numbers in a column

Question

I have a big txt file with data in this format:

Name Points
Joe 1
Joe 5
Anna 6
Anna 1
Eva 9
Eva 6

(There are more lines and names, but each name has the same number of lines.)

I need the sum of the numbers for each name in a list; the main goal is to find the ten names with the greatest sums.

Something like this:

Best:
Eva 15
Anna 7
Joe 6

How would you solve it?

(I tried and got stuck: I can open the file, split it into lines, and split a line into words like this:

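# Read every line into a list; the with block closes the file afterwards.
with open('sum.txt') as f:
    lines = f.readlines()

# Index 0 is the "Name Points" header, so index 1 is the first data line.
lines[1].split()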

But all other kind of solutions are welcome.)

score 0 · Answer 1 · answered Apr 11 '21 at 18:22

This sounds a lot like an assignment, so I'm not going to give you a complete solution but I'll steer you in the right direction.

What you have so far looks good: you have opened the file and read the lines. You can then iterate over them in a for loop:

for line in f.readlines():
    parts = line.split()

You can convert strings to numbers using the int() or float() functions. Then it's just a matter of storing the sums somewhere. I'd suggest either a dict() or a collections.defaultdict with a factory that returns 0, such as defaultdict(int).
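As a rough sketch of that accumulation step only (the top-ten part is left out), assuming the header row is literally "Name Points" and every data row has exactly two whitespace-separated fields. The function name sum_points and the in-memory sample list are made up for illustration; with a real file you would pass the open file object instead:

```python
from collections import defaultdict


def sum_points(lines):
    """Add up the points per name; `lines` is any iterable of text lines."""
    totals = defaultdict(int)  # names not seen yet start at 0
    for line in lines:
        parts = line.split()
        # Assumption: skip blank lines and the "Name Points" header row.
        if len(parts) != 2 or parts == ["Name", "Points"]:
            continue
        name, points = parts
        totals[name] += int(points)  # int() turns the string "5" into 5
    return dict(totals)


# Sample rows copied from the question, standing in for the real file.
sample = ["Name Points", "Joe 1", "Joe 5", "Anna 6", "Anna 1", "Eva 9", "Eva 6"]
totals = sum_points(sample)
print(totals)  # {'Joe': 6, 'Anna': 7, 'Eva': 15}
```

Because a file object yields its lines when iterated, something like sum_points(f) inside the with block should work the same way.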

score 0 · Answer 2 · answered Apr 11 '21 at 18:26

This sounds like homework.

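# Naive implementation in Python 3.x.
from collections import defaultdict

with open("sum.txt") as f:
    content = f.read()
# Drop the "Name Points" header row shown in the question.
lines = content.strip("\n").split("\n")[1:]

item_count = defaultdict(int)
for line in lines:
    name, count = line.split()
    # The file yields strings, so convert before adding to the running total.
    item_count[name] += int(count)

# Sort (name, total) pairs by total, largest first, and keep the first ten.
sorted_tuples = sorted(item_count.items(), key=lambda item: item[1], reverse=True)
top_ten = dict(sorted_tuples[:10])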

score 0 · Answer 3 · answered Apr 11 '21 at 18:38

import pandas as pd

# Assumes tab-separated columns; for space-separated data use sep=r"\s+" instead.
data = pd.read_csv("your_path.txt", sep="\t")

# Give every row the total for its name, then keep one row per name.
data["Tot Points"] = data.groupby("Name")["Points"].transform("sum")
data.sort_values(by="Tot Points", ascending=False, inplace=True)
data.drop_duplicates(subset=["Name"], inplace=True)
data.drop(columns="Points", inplace=True)
data = data.head(n=10)

data.to_csv("output_path.txt", sep="\t", index=False)
