71

I read a CSV file into a pandas dataframe and would like to convert the columns with binary answers from the strings yes/no to the integers 1/0. Below is one such column ("sampleDF" is the pandas dataframe).

In [13]: sampleDF.housing[0:10]
Out[13]:
0     no
1     no
2    yes
3     no
4     no
5     no
6     no
7     no
8    yes
9    yes
Name: housing, dtype: object

Help is much appreciated!

jpp
  • 159,742
  • 34
  • 281
  • 339
Mushu909
  • 1,194
  • 2
  • 11
  • 16

19 Answers

123

method 1

sample.housing.eq('yes').mul(1)

method 2

pd.Series(np.where(sample.housing.values == 'yes', 1, 0),
          sample.index)

method 3

sample.housing.map(dict(yes=1, no=0))

method 4

pd.Series(map(lambda x: dict(yes=1, no=0)[x],
              sample.housing.values.tolist()), sample.index)

method 5

pd.Series(np.searchsorted(['no', 'yes'], sample.housing.values), sample.index)

All yield

0    0
1    0
2    1
3    0
4    0
5    0
6    0
7    0
8    1
9    1

timing
given sample

[Image: timing results for the given sample]

timing
long sample
sample = pd.DataFrame(dict(housing=np.random.choice(('yes', 'no'), size=100000)))

[Image: timing results for the long sample]

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 7
    This is a great in depth answer. I wouldn't have even thought of some of these. – gold_cy Dec 01 '16 at 04:58
  • 1
    Can you do an entire dataframe full of yes and nos? I'm looking at this congressional vote dataset: http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records – mLstudent33 Mar 13 '20 at 00:46
40

Try this:

sampleDF['housing'] = sampleDF['housing'].map({'yes': 1, 'no': 0})
gold_cy
  • 13,648
  • 3
  • 23
  • 45
16
# produces True/False
sampleDF['housing'] = sampleDF['housing'] == 'yes'

The above returns True/False values which are essentially 1/0, respectively. Booleans support sum functions, etc. If you really need it to be 1/0 values, you can use the following.

housing_map = {'yes': 1, 'no': 0}
sampleDF['housing'] = sampleDF['housing'].map(housing_map)
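A minimal sketch of how the boolean and integer forms compare, assuming a small made-up `housing` column (the data here is illustrative, not taken from the question):

```python
import pandas as pd

# Illustrative data, assumed for this sketch only
sampleDF = pd.DataFrame({"housing": ["no", "no", "yes", "no", "yes"]})

# Boolean form: True where the answer is 'yes'
as_bool = sampleDF["housing"] == "yes"

# Integer form: the same information stored as 1/0
as_int = as_bool.astype(int)

# Sums treat True as 1, so both forms give the same count of 'yes'
yes_count_bool = int(as_bool.sum())
yes_count_int = int(as_int.sum())
print(yes_count_bool, yes_count_int)
```

If downstream code only aggregates or filters, the boolean column is usually enough; the integer form matters mostly when a library expects numeric input.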
3novak
  • 2,506
  • 1
  • 17
  • 28
8
%%timeit
sampleDF['housing'] = sampleDF['housing'].apply(lambda x: 0 if x == 'no' else 1)

1.84 ms ± 56.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This maps 'no' to 0 and every other value, including 'yes', to 1 in the specified column.
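One thing worth checking with the lambda above is how it treats values other than 'yes' and 'no'. The sketch below uses a made-up series (an assumption for illustration) containing a missing value and a differently capitalized answer, and compares the lambda with a dictionary `map`:

```python
import numpy as np
import pandas as pd

# Illustrative data with a missing value and an unexpected spelling
s = pd.Series(["yes", "no", np.nan, "Yes"])

# Lambda from the answer: anything that is not exactly 'no' becomes 1
via_apply = s.apply(lambda x: 0 if x == "no" else 1)

# Dictionary map: values that are not keys of the dict become NaN
via_map = s.map({"yes": 1, "no": 0})

print(via_apply.tolist())
print(via_map.isna().tolist())
```

The lambda silently encodes the missing value and 'Yes' as 1, while `map` leaves them as NaN so they can be inspected or handled explicitly.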

SriramKRaju
  • 81
  • 1
  • 1
5

Use sklearn's LabelEncoder

from sklearn.preprocessing import LabelEncoder

lb = LabelEncoder() 
sampleDF['housing'] = lb.fit_transform(sampleDF['housing'])

Source

Freek Nortier
  • 780
  • 1
  • 13
  • 27
4

Yes, you can change the yes/no values in your column to 1/0 with the following snippet:

sampleDF = sampleDF.replace(to_replace=['yes', 'no'], value=[1, 0])
sampleDF

The first line replaces the values with the integers 1/0; the second displays the dataframe so you can see the change.

3

Generic way:

import pandas as pd
string_data = string_data.astype('category')
numbers_data = string_data.cat.codes

reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html
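A short sketch of what the category codes look like, assuming a made-up `string_data` series. For string data like this, pandas orders the inferred categories lexically, so 'no' gets 0 and 'yes' gets 1, and a missing value gets the code -1:

```python
import numpy as np
import pandas as pd

# Illustrative data, including one missing value
string_data = pd.Series(["no", "yes", np.nan, "no"])

as_category = string_data.astype("category")

# Categories are inferred from the data and sorted
categories = as_category.cat.categories.tolist()

# Each value is replaced by the position of its category; NaN maps to -1
codes = as_category.cat.codes.tolist()

print(categories, codes)
```

Because the codes depend on the sorted category order, it is worth checking `cat.categories` before relying on a particular value meaning 1.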

Siddaram H
  • 1,126
  • 12
  • 17
3

For a dataset named data and a column named Paid:

data = data.replace({'Paid': {'yes': 1, 'no': 0}})

Every yes in that column is replaced by 1 and every no by 0.

Neuron
  • 5,141
  • 5
  • 38
  • 59
muli
  • 47
  • 1
  • 3
1

You can convert a series from Boolean to integer explicitly:

sampleDF['housing'] = sampleDF['housing'].eq('yes').astype(int)
jpp
  • 159,742
  • 34
  • 281
  • 339
1

An easy way to do this with pandas is as follows:

housing = pd.get_dummies(sampleDF['housing'],drop_first=True)

After that, drop the original column from the main df:

sampleDF.drop('housing',axis=1,inplace=True)

Now merge the new column into your df:

sampleDF= pd.concat([sampleDF,housing ],axis=1)
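The steps above can be sketched end to end as follows. The dataframe, the extra `age` column, the `dtype=int` argument, and the rename step are assumptions added for illustration; without `dtype=int`, recent pandas versions return boolean indicator columns.

```python
import pandas as pd

# Illustrative data; the 'age' column is assumed for this sketch
sampleDF = pd.DataFrame({"housing": ["no", "yes", "no"], "age": [30, 41, 25]})

# drop_first=True drops the 'no' indicator and keeps the 'yes' one
housing = pd.get_dummies(sampleDF["housing"], drop_first=True, dtype=int)

# The surviving column is named after its category, so rename it
housing = housing.rename(columns={"yes": "housing"})

# Swap the string column for the encoded one
result = pd.concat([sampleDF.drop(columns="housing"), housing], axis=1)
print(result)
```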
Eslamspot
  • 77
  • 5
1

A simple and intuitive way to convert the whole dataframe to 0's and 1's might be:

sampleDF = sampleDF.replace(to_replace = "yes", value = 1)
sampleDF = sampleDF.replace(to_replace = "no", value = 0)
Josmy
  • 408
  • 3
  • 12
  • 1
    You can do this in a single line as well : `sampleDF = sampleDF.replace( to_replace = {"no" : 0, "yes" : 1})` – Shivam Shah Oct 18 '20 at 07:24
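A whole-dataframe `replace` also touches any other column that happens to contain "yes" or "no". A hedged alternative is to pick out the columns whose values are only yes/no and convert just those. The dataframe and column names below are made up for illustration:

```python
import pandas as pd

# Illustrative data; column names are assumptions for this sketch
df = pd.DataFrame({
    "vote_a": ["yes", "no", "yes"],
    "vote_b": ["no", "no", "yes"],
    "party": ["x", "y", "x"],
})

# Columns whose non-missing values are all 'yes' or 'no'
binary_cols = [
    c for c in df.columns
    if set(df[c].dropna().unique()) <= {"yes", "no"}
]

# Convert only those columns; everything else is left untouched
converted = df.copy()
converted[binary_cols] = df[binary_cols].eq("yes").astype(int)
print(converted)
```

Note that `eq("yes")` turns missing values into 0, so a column with gaps may need a dictionary `map` instead.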
1
sampleDF['housing'] = sampleDF['housing'].map(lambda x: 1 if x == 'yes' else 0)
sampleDF['housing'] = sampleDF['housing'].astype(int)

This will work.

Benny06
  • 31
  • 4
1

Try this, it will work.

sampleDF['housing'] = sampleDF['housing'].replace(['no', 'yes'], [0, 1])
Nija I Pillai
  • 1,046
  • 11
  • 13
0

Try the following:

sampleDF['housing'] = sampleDF['housing'].str.lower().replace({'yes': 1, 'no': 0})
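Lowercasing first helps when the CSV mixes spellings such as "Yes" and "yes". A small sketch of that idea, assuming made-up raw values and adding whitespace stripping (an extra step not in the answer above) before a dictionary `map`:

```python
import pandas as pd

# Illustrative raw values with inconsistent case and stray spaces
raw = pd.Series(["Yes", " no", "YES ", "no"])

# Normalize: trim whitespace, then lowercase
cleaned = raw.str.strip().str.lower()

# Encode the normalized strings as integers
encoded = cleaned.map({"yes": 1, "no": 0})
print(encoded.tolist())
```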
Sazzad
  • 21
  • 3
0

I used the preprocessing module from sklearn (from sklearn import preprocessing). First you create an encoder.

le = preprocessing.LabelEncoder()

Then for each attribute or characteristic in the data use the label encoder to transform it into an integer value

size = le.fit_transform(list(data["size"]))
color = le.fit_transform(list(data["color"]))

It's converting a list of all the "size" or "color" attributes, and converting that into a list of their corresponding integer values. To put all of this into one list, use the zip function.

It is not going to be in the same format as the csv file; it will be a giant list of everything.

data = list(zip(size, color))

Hopefully I explained that somewhat clearly.

Turkey
  • 1
  • 1
  • Isn't this a bit risky, like many uses of `string.replace()`, since it will replace the values **everywhere**? – AMC Feb 15 '20 at 00:51
  • Yeah, I found a better solution so I put that. I don't think I explained it very well though. – Turkey Feb 16 '20 at 16:26
0

You can also try :

sampleDF["housing"] = (sampleDF["housing"] == "yes") * 1
0

This is just a bool to int.

Try this.

sampleDF.housing = (sampleDF.housing == 'yes').astype(int)

DeepBlue
  • 415
  • 4
  • 9
0

Use pandas.Series.map on the column:

sampleDF['housing'] = sampleDF['housing'].map({'yes': 1, 'no': 0})

0

Using a list comprehension:

sampleDF['housing'] = [int(v == 'yes') for v in sampleDF['housing']]