import numpy as np
import pandas as pd
a = np.array([["M", 86],
              ["M", 76],
              ["M", 56],
              ["M", 66],
              ["B", 16],
              ["B", 13],
              ["B", 16],
              ["B", 18],
              ["B", 14], ])
df = pd.DataFrame(data=a, columns=["Case", "radius"])
print(df)

print(df.columns)
a = df[(df["radius"] >= 57) & (df["Case"] == "M")]["radius"].tolist()
print(a)

I get an error: `TypeError: '>=' not supported between instances of 'str' and 'int'`. But I am putting the condition on a column that contains integers. What is the problem here? I want a list of the `radius` values where `radius` is greater than or equal to 57 and `"Case" == "M"`.
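One way to see where the strings come from is to inspect the dtype at each step. This is a minimal sketch using a shortened copy of the question's data; the variable `radius_is_numeric` is only introduced for illustration.

```python
import numpy as np
import pandas as pd

# Shortened version of the question's data: one label and one number per row
a = np.array([["M", 86], ["M", 76], ["B", 16]])

# A NumPy array has a single dtype, so the numbers are converted to strings
print(a.dtype)          # a Unicode string dtype, e.g. <U21
print(repr(a[0, 1]))    # the value 86 is now the string '86'

df = pd.DataFrame(data=a, columns=["Case", "radius"])
print(df.dtypes)        # radius is not a numeric column
radius_is_numeric = pd.api.types.is_numeric_dtype(df["radius"])
print(radius_is_numeric)

try:
    # With an object-dtype column of strings, this comparison raises the
    # same TypeError as in the question
    df["radius"] >= 57
except TypeError as exc:
    print("TypeError:", exc)
```

So the column only looks like integers when printed; its values are strings.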

  • Column type for `radius` is object. It will work when you convert it: `df["radius"].astype(int) >= 57` – Arkadiusz May 18 '21 at 12:05
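  • NumPy arrays have a single dtype. Because you start with an np.array, every value (including the numbers) is stored as a string, so both DataFrame columns end up with dtype object. If you had used a list, the dtypes would have been correct when turning it into a DataFrame. Note that `df.convert_dtypes()` will not fix this on its own: it turns a column of numeric strings into the `string` dtype rather than parsing the numbers, so convert the column explicitly, e.g. with `df["radius"].astype(int)` or `pd.to_numeric(df["radius"])`. – Henry Ecker May 18 '21 at 12:06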
  • Start with `a` as a list rather than an array, and check `df.dtypes` before doing the comparison. – hpaulj May 18 '21 at 15:12
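The list-based construction suggested in the comments can be sketched as follows, using the question's data; `rows` and `result` are illustrative names, not from the original code.

```python
import pandas as pd

# A plain list of lists lets pandas infer a separate dtype for each column
rows = [["M", 86], ["M", 76], ["M", 56], ["M", 66],
        ["B", 16], ["B", 13], ["B", 16], ["B", 18], ["B", 14]]
df = pd.DataFrame(data=rows, columns=["Case", "radius"])

# Check the dtypes first: radius should now be an integer column
print(df.dtypes)

# Filter rows with .loc and take the radius column as a Python list
result = df.loc[(df["radius"] >= 57) & (df["Case"] == "M"), "radius"].tolist()
print(result)  # [86, 76, 66]
```

Using `.loc` with a boolean mask and a column label selects in one step, instead of filtering the frame and then indexing the result.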

1 Answer


Cast the `radius` column to `int` after creating the DataFrame, and the comparison will work:

df.radius = df.radius.astype(int)
Shivam Roy
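For reference, here is a minimal sketch of the answer's cast applied to the full example from the question. It keeps the NumPy array as the input and uses `df["radius"]` indexing, which behaves the same as `df.radius` for an existing column.

```python
import numpy as np
import pandas as pd

# Same input as in the question: every value becomes a string in the array
a = np.array([["M", 86], ["M", 76], ["M", 56], ["M", 66],
              ["B", 16], ["B", 13], ["B", 16], ["B", 18], ["B", 14]])
df = pd.DataFrame(data=a, columns=["Case", "radius"])

# Convert the string column to integers before comparing it with a number
df["radius"] = df["radius"].astype(int)

result = df.loc[(df["radius"] >= 57) & (df["Case"] == "M"), "radius"].tolist()
print(result)  # [86, 76, 66]
```

`pd.to_numeric(df["radius"])` is an alternative to `astype(int)`; it also accepts decimal strings, and `errors="coerce"` turns unparseable values into NaN instead of raising.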