Pandas Sum all rows in specific columne

Question

I have a CSV with an input like this:

Name     hours     Date
User1    2,5        01.01.2017
User1    5          02.01.2017
...
User1    3,5        31.01.2017

And now I want an output like this:

 Total:     
  User1   11        01-31.01.2017

But I get this 2,553,5
I cant say why I tested so many examples but always the same output.

print(df["hours"].sum())

is only one `User1` or multiple like `User2`, `User3` ... ? – jezrael Apr 10 '17 at 13:44 — jezrael, Apr 10 '17 at 13:44
Only one User. I filtered the other one out. – Kenny Apr 10 '17 at 13:45 — Kenny, Apr 10 '17 at 13:45

jezrael · Accepted Answer · 2017-04-10T14:07:28.403

2

You can use decimal="," parameter first in read_csv for remove , in floats.

cols = ['Name','Sum', 'Dates']
a = pd.DataFrame([[df["Name"].iat[0], 
                   df["hours"].sum(), 
                   df["Date"].min().strftime('%d.%m.%Y') + '-' + 
                   df["Date"].max().strftime('%d.%m.%Y')]], columns=cols)
print (a)
    Name   Sum                  Dates
0  User1  11.0  01.01.2017-01.02.2017

edited Apr 10 '17 at 14:07

answered Apr 10 '17 at 13:49

jezrael

822,522
95
1,334
1,252

Sadly its still gives me 2,553,5 and not a real sum. But Date works :) – Kenny Apr 10 '17 at 14:00
delimiter='\t', error_bad_lines=False, header=4, encoding='cp1252', comment='-', parse_dates=[3], converters={"hours": int}). – Kenny Apr 10 '17 at 14:01
add parameter `decimal` – jezrael Apr 10 '17 at 14:02
thats it! Thank you! But the date ist like this 2016-01-01 00:00:00-2016-12-31 00:00:00 how can i cut off the time 00:00:00 ? – Kenny Apr 10 '17 at 14:05

score 0 · Answer 2 · answered Apr 10 '17 at 14:08

The reason why is your decimal is using comma ','.

hours
2,5  
5  
3,5

When I try I got the same answer like you. After change it to decimal which is dot '.'. Like this

hours
2.5  
5  
3.5

Voila! You get the answer.

Below is the working example.

import pandas as pd

info = {'User':['User1','User1','User1'],
        'Value':[3.5,2.5,5],
        'Date':['01.01.2017','03.01.2017','02.01.2017']}
df = pd.DataFrame(info)
print(df['Value'].sum())

score 0 · Answer 3 · answered Apr 10 '17 at 14:17

First convert your hours column to a dtype of float64. This can be done with:

import pandas as pd
import numpy as np

df['hours'] = df.hours.apply(lambda x: x.replace(',', '.')).astype(np.float64)

Next you can use a function to return the desired df row.

def totals_by_user(df, username):
    # Sort by username
    df = df.loc[df.Name == username]

    # Get the total hours
    tot_hours = df.hours.sum()

    # Finds the date range in your format
    df = df.sort('Date')
    date_range = df.Date.min().split('.')[0] + '-' + df.Date.max()

    return pd.DataFrame([username, tot_hours, date_range], columns = ['Name', 'Sum', 'Date'])

score -1 · Answer 4 · answered Apr 10 '17 at 13:50

-1

try with this: so you can sum for example the second row with hours

   with open("nameofyourfile.csv") as fin:
        total = 0
        for row in csv.reader(fin):
            total += int(row[0]) # this will sum the first row                 
   print(total)

answered Apr 10 '17 at 13:50

kpl33

13
6

the problem is that the floats are using commas instead of points for the decimal, so python is interpreting them as strings and concating them rather than adding - I don't see how this would solve that. – Stael Apr 10 '17 at 13:55
Yes . you are right .maybe with regex it is possible to change the commas to points ? – kpl33 Apr 10 '17 at 14:15

Pandas Sum all rows in specific columne

4 Answers4