I have a CSV file that I imported with the following script:

import pandas as pd
data = pd.read_csv("filen__new.csv")
data.head()

I want to know the type of each column (i.e. whether it is numerical, boolean, or some other type). When I run data.info() I get:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200009 entries, 0 to 200008
Columns: 151 entries, iterator to parse168
dtypes: float64(149), int64(1), object(1)

Is there a better way to get the type of each column along with its name? For example, something that tells me that column 1 is named iterator and is numeric/float, and that column 2 is named parse1 and is boolean.

I also want to get the unique values and the maximum value of each column. At the moment I use the following code for each column name:

# Unique values of one column
uniqueValues = data['iterator'].unique()
print('Unique elements in column "iterator":')
print(uniqueValues)

# Maximum value of the same column
column1 = data["iterator"]
max_value = column1.max()
print(max_value)

With 150 columns I would have to repeat this 150 times. Is there a better way to do this?

user2359877

2 Answers

For the data types, use df.dtypes (data.dtypes in your case). It returns the dtype of every column, indexed by column name.
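If you want a coarser label per column (numeric, boolean, other), as the question describes, one option is the helper functions in pandas.api.types. This is only a minimal sketch: the small frame stands in for the real CSV, and the classify helper is a made-up name for illustration.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Hypothetical stand-in for the data read from the CSV file
df = pd.DataFrame({
    "iterator": [1.0, 2.0, 3.0],
    "parse1": [True, False, True],
    "label": ["a", "b", "c"],
})


def classify(series):
    """Return a coarse type label for one column (illustrative helper)."""
    # Test for bool first, because bool columns also count as numeric
    if is_bool_dtype(series):
        return "boolean"
    if is_numeric_dtype(series):
        return "numeric"
    return "other"


# Map every column name to its coarse label
kinds = {col: classify(df[col]) for col in df.columns}

for col, kind in kinds.items():
    print(col, df[col].dtype, kind)
```

Keep in mind that the dtype is whatever read_csv inferred, so a column that looks boolean in the file may not end up with a bool dtype.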

For the unique values, df.nunique() tells you how many distinct values each column has. To print the values themselves, loop over the columns:

for col in df:
    print(col, df[col].unique())

To print the maximum value of each column, loop over the columns in the same way:

for col in df:
    print(col, df[col].max())
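The pieces above can also be collected into a single summary table with one row per column. This is a sketch under assumptions: the example frame and the name summary are invented for illustration, and numeric_only=True is used so that non-numeric columns get a missing max instead of being compared as text.

```python
import pandas as pd

# Hypothetical stand-in for the data read from the CSV file
df = pd.DataFrame({
    "iterator": [1, 2, 2],
    "parse1": [0.5, 0.5, 1.5],
    "label": ["a", "b", "a"],
})

# One row per original column: its dtype, number of distinct values, and max
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_unique": df.nunique(),
    "max": df.max(numeric_only=True),  # non-numeric columns become NaN
})

# Restore the original column order in case the alignment reordered the rows
summary = summary.reindex(df.columns)

print(summary)
```

With 150 columns this gives 150 rows, which you can sort or filter like any other DataFrame.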

Renee

As stated in the docs, pd.DataFrame.info() truncates its per-column output when the frame has more columns than pd.options.display.max_info_columns, which is why you only see a summary for your 151 columns. You can override this with the max_cols argument. The default limit is 100, as you can check with:

print(pd.options.display.max_info_columns)
100

Here is a Minimal Reproducible Example (MRE):

import pandas as pd


n_columns = 150
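df1 = pd.DataFrame(
    columns=list(range(n_columns)),
    data=[
        list(range(n_columns)),               # first row: 0, 1, 2, ...
        [x * 100 for x in range(n_columns)],  # second row: 0, 100, 200, ...
    ],
)
# Pass max_cols so that info() lists every column instead of truncating
df1.info(max_cols=n_columns)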

# You could access the types through
df1.dtypes

# You could access the column names through
df1.columns

# You could access the column max values as
df1.max()

unique is a method of pd.Series, not of pd.DataFrame. To avoid writing an explicit for loop, you can apply it to every column of the DataFrame:

df1.apply(lambda x: x.unique())
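Columns usually have different numbers of unique values, so the shape of what apply returns can depend on the data. If you prefer a predictable structure, a plain dict keyed by column name is one alternative. A minimal sketch, with a small made-up frame and the name uniques chosen for illustration:

```python
import pandas as pd

# Hypothetical small frame; column "a" has two distinct values, "b" has one
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [5, 5, 5]})

# Map each column name to the array of its distinct values
uniques = {col: df1[col].unique() for col in df1.columns}

for col, values in uniques.items():
    print(col, values)
```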

For future questions, consider following the good-practice guides for writing an MRE, both in general and for pandas specifically.

Finally, the pandas documentation is well written. You can consult it for more details and for further examples.

viniciusrf1992