2

I have a CSV with an input like this:

Name     hours     Date
User1    2,5        01.01.2017
User1    5          02.01.2017
...
User1    3,5        31.01.2017

And now I want an output like this:

 Total:     
  User1   11        01-31.01.2017  

But I get this 2,553,5
I cant say why I tested so many examples but always the same output.

print(df["hours"].sum())
Kenny
  • 88
  • 3
  • 14

4 Answers4

2

You can use decimal="," parameter first in read_csv for remove , in floats.

cols = ['Name','Sum', 'Dates']
a = pd.DataFrame([[df["Name"].iat[0], 
                   df["hours"].sum(), 
                   df["Date"].min().strftime('%d.%m.%Y') + '-' + 
                   df["Date"].max().strftime('%d.%m.%Y')]], columns=cols)
print (a)
    Name   Sum                  Dates
0  User1  11.0  01.01.2017-01.02.2017
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

The reason why is your decimal is using comma ','.

hours
2,5  
5  
3,5 

When I try I got the same answer like you. After change it to decimal which is dot '.'. Like this

hours
2.5  
5  
3.5 

Voila! You get the answer.

Below is the working example.

import pandas as pd

info = {'User':['User1','User1','User1'],
        'Value':[3.5,2.5,5],
        'Date':['01.01.2017','03.01.2017','02.01.2017']}
df = pd.DataFrame(info)
print(df['Value'].sum())
Ariff Yasri
  • 174
  • 1
  • 11
0

First convert your hours column to a dtype of float64. This can be done with:

import pandas as pd
import numpy as np

df['hours'] = df.hours.apply(lambda x: x.replace(',', '.')).astype(np.float64)

Next you can use a function to return the desired df row.

def totals_by_user(df, username):
    # Sort by username
    df = df.loc[df.Name == username]

    # Get the total hours
    tot_hours = df.hours.sum()

    # Finds the date range in your format
    df = df.sort('Date')
    date_range = df.Date.min().split('.')[0] + '-' + df.Date.max()

    return pd.DataFrame([username, tot_hours, date_range], columns = ['Name', 'Sum', 'Date'])
Mike C
  • 21
  • 1
-1

try with this: so you can sum for example the second row with hours

   with open("nameofyourfile.csv") as fin:
        total = 0
        for row in csv.reader(fin):
            total += int(row[0]) # this will sum the first row                 
   print(total)
kpl33
  • 13
  • 6
  • the problem is that the floats are using commas instead of points for the decimal, so python is interpreting them as strings and concating them rather than adding - I don't see how this would solve that. – Stael Apr 10 '17 at 13:55
  • Yes . you are right .maybe with regex it is possible to change the commas to points ? – kpl33 Apr 10 '17 at 14:15