I have a Koalas / Pandas-on-Spark dataframe named df.

When I run the expression below, I get a TypeError: str object is not callable

df[~(df.time.eq('00:00:00').groupby(df.vehicle_id).transform('sum')>=2)]

When I check the datatypes of both columns I get:

print(df.time.dtype)
<U0
print(df.vehicle_id.dtype)
<U0

Could these dtypes have something to do with the error?

sampeterson
  • Please provide a [mcve], including a small sample dataframe and the full traceback. – timgeb Jun 22 '22 at 14:03
  • There's a function called `sum` -- are you sure you want `transform('sum')` and not `transform(sum)`? – Samwise Jun 22 '22 at 14:04
  • @Samwise that should be fine, `transform` accepts a number of functions in string form, such as `'sum'`, `'mean'`, etc. – timgeb Jun 22 '22 at 14:07
  • I previously used `transform('sum')` (with single quotes) on the same data as a pandas df without any problems. Using `transform('sum')` on a Koalas df gives me a `TypeError: str object is not callable`. However, if I use `transform(sum)` (without single quotes) on a Koalas df, I get a different error: `TypeError: Transform function invalid for data types`. Any ideas? – sampeterson Jun 22 '22 at 14:13
  • How do you sum strings? – ifly6 Jun 22 '22 at 14:16
  • I'm not summing the strings themselves. `eq('00:00:00')` gives booleans, and summing those per `vehicle_id` counts how often '00:00:00' appears in the `time` column for each group. If that value appears more than 2 times within a `vehicle_id` group, I want to delete the whole group. [See this question](https://stackoverflow.com/questions/69991015/remove-group-from-the-pandas-dataframe-when-a-specific-value-within-the-group-oc) – sampeterson Jun 22 '22 at 14:18
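
For reference, the intended group filter can be sketched in plain Python, independent of pandas or Koalas. This is only an illustration of the logic, not a fix for the TypeError. The sample rows are hypothetical and not taken from the question, and the threshold mirrors the `>= 2` comparison in the expression above.

```python
from collections import Counter

# Hypothetical sample rows as (vehicle_id, time) pairs; not from the question.
rows = [
    ("A", "00:00:00"),
    ("A", "00:00:00"),
    ("A", "08:15:00"),
    ("B", "00:00:00"),
    ("B", "09:30:00"),
    ("C", "10:45:00"),
]

# Count how often '00:00:00' appears for each vehicle_id.
midnight_counts = Counter(vid for vid, t in rows if t == "00:00:00")

# Keep only rows whose group has fewer than 2 such values,
# mirroring the `>= 2` threshold in the original expression.
# A Counter returns 0 for vehicle_ids that never had '00:00:00'.
kept = [(vid, t) for vid, t in rows if midnight_counts[vid] < 2]

print(kept)
```

Here vehicle "A" has two '00:00:00' entries, so all of its rows are dropped, while "B" (one entry) and "C" (none) are kept in full.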

0 Answers