Calculating sum and export to a file , based on the random number of inputs of a column

Question

This is a repost of my previous request (Summing up the total based on the random number of inputs of a column). But in this question, I had requested to provide the solution without using pandas library.

The problem is same as before. I need to sum up the "value" column amount for each value of col1 of the File1 and export it to an output file. I'm new in python and need to do it for thousands of records.

File1

col1 col2              value
559 1   91987224    2400000000
559 0   91987224    100000000
558 0   91987224    100000000
557 2   87978332    500000000
557 1   59966218    2400000000
557 0   64064811    100000000

Desired Output:

col1      Sum 
559     2500000000
558     1000000000
557     3000000000

Thanks in advance.

P.S : I can't use the pandas library due to permission issues.I tried the following code. Sharing my effort till now:

import csv 
fin = open("File1.txt","r")
list_txid = {}
amount_tx = {}

for line in fin:
    line = line.rstrip()
    f = line.split("\t")
    txid = f[0]
    amount = int(f[3])

fin.close()
for txid in list_txid:
    amount_tx[txid] += amount
    print("{0}\t{1:d}\n".format(txid, amount_tx[txid]))

Possible duplicate of [Summing up the total based on the random number of inputs of a column](https://stackoverflow.com/questions/49132301/summing-up-the-total-based-on-the-random-number-of-inputs-of-a-column) (please check: https://meta.stackexchange.com/a/7054) — Matteo Ragni, Mar 12 '18 at 07:19

score 1 · Accepted Answer · answered Mar 12 '18 at 07:23

can you use numpy? If not, well the issue seems to be that during the iteration of the file you aren't updating the values

Now, for reading of the file:

with open('File1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')

is how I would suggest opening it. As a note, you don't need to specify 'r' as the mode (second variable for open) as it is assumed to be so by default. What the 'with open' command does, as opposed to 'fin = open', is that it automatically closes the file after the indentation. You save two lines of code, and more importantly, if you forget to type fin.close() - after all it wouldn't throw an error in the code - the file is closed anyways

reader = csv.reader(fin, delimiter='\t') basically automatically does the stripping the white space from the end, as well as splitting it by tab space

Here is how I would change your code overall

import csv
amount_tx = {}

with open('File1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')
    for f in reader:
        txid, amount = f[0], int(f[3])
        try:
            amount_tx[txid] += amount
        except KeyError:
            amount_tx[txid] = amount

with open('OutputFileName.txt','w') as w:
    for txid, amount in amount_tx.items():
        w.write('%s\t%d\n' % (txid, amount))

if you are using python 2.X instead of 3.X, amount_tx.items should be amount_tx.iteritems()

'OutputFileName.txt' should be replaced by the name of the file you want to save the results in open(FNAME,'w') specifies that you are writing to the file instead of reading it (this starts by deleting / recreating the file, if you want to keep the file and append to it, use 'a' instead)

score 1 · Answer 2 · answered Mar 12 '18 at 07:29

import csv 
fin = open("File1.txt","r")
list_txid = {}
for line in fin:
    line = line.rstrip()
    f = line.split()
    if('value' not in f):
      try:
        list_txid[f[0]]+=int(f[3])
      except:
        list_txid[f[0]]=int(f[3])
fin.close()
print("{0}\t{1}\n".format('col1', 'Sum'))
for k,v in list_txid.items():
    print("{0}\t{1:d}".format(k, v))

Output:

col1    Sum

559 2500000000
558 100000000
557 3000000000

score 1 · Answer 3 · answered Mar 12 '18 at 07:38

Like the other answers but using defaultdict to default to an integer that you can sum to if you don't have the key in your dictionary.

from collections import defaultdict
import csv

with open('file1.txt') as fin:
    reader = csv.reader(fin, delimiter='\t')

    amount_tx = defaultdict(int)
    # Skip headers
    next(reader)
    for line in reader:
        key = line[0]
        amount_tx[key] += int(line[3])

with open('OutputFile.txt','w') as w:
    # Write new headers
    w.write("Col1   Sum\n")
    for tx_id, tx_amount in amount_tx.items():
        w.write("{0}\t{1:d}\n".format(tx_id,tx_amount))

score 0 · Answer 4 · answered Mar 12 '18 at 07:18

Probably not the nicest way to do it but given that you cannot use pandas: this works.

import csv
fin = open("File1.txt","r")
# skip the first line
next(fin)

amount_tx = {}

for line in fin:
    # make the line into a list of the form ['x', 'y', 'z', 'a']
    line = line.rstrip()
    f = line.split("\t")
    g = f[0].split()

    # get the two variables necessary
    txid = g[0]
    amount = int(g[3])

    # add to dictionary if not yet present
    if txid not in amount_tx:
        amount_tx[txid] = 0
    amount_tx[txid] += amount

fin.close()

for txid in amount_tx:

    print("{0}\t{1:d}\n".format(txid, amount_tx[txid]))

score -1 · Answer 5 · edited Mar 12 '18 at 07:57

-1

You can use pandas library in python.

It has a functionality for grouping the rows and to sum the columns you want.

import pandas as pd
df = pd.read_excel("File1.txt")

print df.groupby(['col1'])[["value"]].sum()

edited Mar 12 '18 at 07:57

Nathan

3,558
1
18
38

answered Mar 12 '18 at 07:15

Tom Antony

79
2
9

Unfortunately in the question it says they can't use pandas. – Nathan Mar 12 '18 at 07:15
Thanks for your suggestion. but I can't use pandas library, mentioned in my question. – Rubz Mar 12 '18 at 07:17

Calculating sum and export to a file , based on the random number of inputs of a column

File1

Desired Output:

5 Answers5