4

I am looking to select the first two characters of each string in column a and column b.

Here is an example:

import pandas as pd

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'], 'b': ['A156', 'A266666', 'A35555']})

>>> df
      a        b
0  A123     A156
1  A567  A266666
2  A100   A35555

Desired output:

>>> df
      a      b
0     A1     A1
1     A5     A2
2     A1     A3

I have been trying to use df.loc but have not been successful.

ayhan
SBad
  • 2
    Possible duplicate of [Select One Element in Each Row of a Numpy Array by Column Indices](https://stackoverflow.com/questions/17074422/select-one-element-in-each-row-of-a-numpy-array-by-column-indices) – Pirate X Mar 28 '18 at 09:07

2 Answers

6

Use the `.str` accessor on each column:

In [905]: df.apply(lambda x: x.str[:2])
Out[905]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3

Or, element-wise:

In [908]: df.applymap(lambda x: x[:2])
Out[908]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3
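
A note for newer pandas versions: as far as I know, `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`. Below is a minimal sketch that should run on either version, assuming the same `df` as in the question. The helper name `first_two` is hypothetical, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'],
                   'b': ['A156', 'A266666', 'A35555']})

def first_two(value):
    # Hypothetical helper: keep the first two characters of a string.
    return value[:2]

# DataFrame.map exists in pandas >= 2.1; fall back to applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
result = elementwise(first_two)
print(result)
```

The `df.apply(lambda x: x.str[:2])` form works column by column and is usually the better choice for string data; the element-wise form is mainly useful when the function is not available through the `.str` accessor.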
Zero
  • Thanks for that, it works well for me. What if I want to apply the same thing to only one column (say column a)? I have tried df.apply(lambda x: x['a'].str[:2]) and df['a'].apply(lambda x: x.str[:2]), but neither works. – SBad Mar 28 '18 at 09:11
  • 1
    use `df['a'].apply(lambda x: x[:2])` – Sociopath Mar 28 '18 at 09:15
  • @Akshay Thank you for your answer, which worked fine for my data until I got this error: TypeError: 'float' object has no attribute '__getitem__'. After investigating, I found that the error is due to missing values in column a (some rows are empty because the dataset is imperfect). How can I handle that error and tell Python to skip those rows and carry on? – SBad Mar 28 '18 at 11:13
  • 1
    You can write your own function to handle NaN and pass it to apply. But as the error suggests, I think one of your columns is float; try converting it to string and then apply. – Sociopath Mar 28 '18 at 11:56
  • The error does suggest that my column is not a string, but it definitely is one (I have just double-checked). The issue comes from the missing values, because when I run the code on only the rows without null/missing values I get no error. – SBad Mar 28 '18 at 12:09
  • Try this to handle nulls: `lambda x: np.nan if pd.isnull(x) else x[:2]` – Sociopath Mar 28 '18 at 12:16
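
To pull the comment thread together, here is a minimal sketch for the single-column case with missing values. It assumes the question's `df` with one `NaN` added to column a for illustration. The two attempts in the comments fail for different reasons: `df['a'].apply(...)` passes each scalar string to the lambda, and a plain `str` has no `.str` attribute, while `df.apply(...)` passes each column as a Series, so `x['a']` looks for a row label `'a'`. Calling the `.str` accessor directly on the column avoids both problems, and it leaves `NaN` as `NaN` instead of raising.

```python
import numpy as np
import pandas as pd

# Same data as the question, with one missing value added for illustration.
df = pd.DataFrame({'a': ['A123', np.nan, 'A100'],
                   'b': ['A156', 'A266666', 'A35555']})

# The .str accessor is vectorized and propagates missing values,
# so no lambda or explicit NaN check is needed.
df['a'] = df['a'].str[:2]
print(df)
```

Column b is left untouched here because only `df['a']` is reassigned.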
1
In [107]: df.apply(lambda c: c.str.slice(stop=2))
Out[107]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3
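
As a quick check, `str.slice(stop=2)` should give the same result as `str[:2]`. The sketch below assumes the question's `df` and also shows the `start` and `stop` parameters on a single column; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'],
                   'b': ['A156', 'A266666', 'A35555']})

# Both spellings take the first two characters of every string in each column.
by_slice = df.apply(lambda c: c.str.slice(stop=2))
by_index = df.apply(lambda c: c.str[:2])
print(by_slice.equals(by_index))

# str.slice also accepts start (and step), mirroring Python's slice syntax.
middle = df['b'].str.slice(start=1, stop=3)
print(middle.tolist())
```

Using keyword arguments makes the intent a little more explicit than the bracket form, which can help when the bounds come from variables.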
MaxU - stand with Ukraine