9

I am working on a project where I imported data from SQL into a pandas DataFrame. This seems to go swimmingly, but when I take the pandas.mean() it throws a TypeError saying that a concatenated list of the values cannot be converted to numeric (see below):

Example Dataframe:

df =
  ProductSKU OverallHeight-ToptoBottom
0   AAI2185                      74.5
1   AAI2275                        47
2   AAI2686                      56.5
3  AASA1002                     73.23

Function Call:

avgValue = df["OverallHeight-ToptoBottom"].dropna().mean()             <--- Breaks here

Console Output:

    Traceback (most recent call last):

  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 5310, in stat_func
    numeric_only=numeric_only)

  ... 

  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 293, in nanmean
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))

  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 743, in _ensure_numeric
    raise TypeError('Could not convert %s to numeric' % str(x))

TypeError: Could not convert 74.54756.573.23 to numeric

The strangest thing (and what I cannot figure out), is that it works perfectly fine when I import the same data through a CSV. It only breaks when I load it through SQL, could there be something I did incorrectly there?

cottontail
  • 10,268
  • 18
  • 50
  • 51
Tom Mori
  • 131
  • 1
  • 1
  • 3
  • Can you put a a code that reproduce the error ? Your code is working...ouput {'Average': 61.763999999999996, 'LowerBound': 16.28734181143033, 'MissingData': 0, 'OutsideRange': 0, 'StDev': 15.158886062856554, 'UpperBound': 107.24065818856965} – Tbaki Jun 13 '17 at 13:24
  • 2
    Check `df['OverallHeight-ToptoBottom'].dtype`. I suspect that column contains strings (which might show up as type `object`), not floating point values. – Warren Weckesser Jun 13 '17 at 13:34
  • 1
    @WarrenWeckesser - Bingo. You hit the nail on the head with the dtype. I fixed it using pd.to_numeric() to fix it. Thanks! – Tom Mori Jun 13 '17 at 14:25

3 Answers3

4

If you got a similar TypeError after a groupby operation (e.g. TypeError: Could not convert ace to numeric), then you probably have pandas>=2.0.

groupby.mean() has numeric_only= argument whose default value was True in the past but since pandas 2.0, its default value is False. An implication is that string columns are not dropped when a statistical method such as mean or std is called on the groupby object (as was done in the past). To solve the issue, pass numeric_only=True.

An example that shows the problem and the solution.

import pandas as pd
df = pd.DataFrame({
    "Grouper": ["A", "B", "A", "B", "A"],
    "Name": ["a", "b", "c", "d", "e"],
    "Value": [0.95, 0.25, 0.25, 0.10, 1.00]
})

grouped = df.groupby("Grouper").mean()                    # <---- TypeError: Could not convert ace to numeric

grouped = df.groupby("Grouper").mean(numeric_only=True)   # <---- OK
cottontail
  • 10,268
  • 18
  • 50
  • 51
1

As the console output reveals, there's a problem with the dataframe column 'OverallHeight-ToptoBottom'.

My guess, as @Warren Weckesser commented, is that the columns contains strings. In order to check the data types of that column run

print(df['OverallHeight-ToptoBottom'].dtype) 

Assuming that the above is true, converting the column data type to float should solve the problem. For that use pandas.to_numeric

df["OverallHeight-ToptoBottom"] = pd.to_numeric(df["OverallHeight-ToptoBottom"], downcast="float")
Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83
-1

Chances are ToptoBottom's contents are being read as a string. If you are opening a file with pythons read mode, and splitting the string based on new lines and commas the numbers are converted to string, this is why panda cant read the data or use it as a mean. Thus you get the error:

TypeError: Could not convert 74.54756.573.23 to numeric

to remove the conversion issue, you want to convert it to numeric values, you can use pandas to_numeric() function