select the first N elements of each row in a column

Question

I am looking to select the first two characters of each string in columns a and b.

Here is an example:

import pandas as pd

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'], 'b': ['A156', 'A266666', 'A35555']})

>>> df
      a        b
0  A123     A156
1  A567  A266666
2  A100   A35555

Desired output:

>>> df
      a      b
0     A1     A1
1     A5     A2
2     A1     A3

I have been trying to use df.loc but have not been successful.

Possible duplicate of [Select One Element in Each Row of a Numpy Array by Column Indices](https://stackoverflow.com/questions/17074422/select-one-element-in-each-row-of-a-numpy-array-by-column-indices) — Pirate X, Mar 28 '18 at 09:07

score 6 · Accepted Answer · answered Mar 28 '18 at 09:06

Use the vectorized `.str` accessor on each column:

In [905]: df.apply(lambda x: x.str[:2])
Out[905]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3

Or, element-wise with `applymap`:

In [908]: df.applymap(lambda x: x[:2])
Out[908]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3
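In pandas 2.1 and later, `DataFrame.applymap` is deprecated in favour of `DataFrame.map`, which behaves the same way element-wise, so the second form emits a FutureWarning on recent versions. A minimal sketch, assuming you need code that runs on both older and newer pandas, picks whichever method exists:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'], 'b': ['A156', 'A266666', 'A35555']})

# DataFrame.map is the element-wise method in pandas >= 2.1;
# older versions have no DataFrame.map, so fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
result = elementwise(lambda x: x[:2])
print(result)
```

On pandas 2.1+ you can simply write `df.map(lambda x: x[:2])`; the `.str` version in the first snippet needs no such check.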

answered Mar 28 '18 at 09:06 by Zero

Thanks, that works well for me. If I want to apply the same thing to only one column (say column a), how can I do that? I have tried `df.apply(lambda x: x['a'].str[:2])` and `df['a'].apply(lambda x: x.str[:2])`, but neither works. – SBad Mar 28 '18 at 09:11

Use `df['a'].apply(lambda x: x[:2])` – Sociopath Mar 28 '18 at 09:15
@Akshay Thank you for your answer, which worked fine for my data until I got this error: TypeError: 'float' object has no attribute '__getitem__'. After investigating, I found that the error is caused by missing values in column a (some rows are empty) because the dataset is imperfect. How can I handle that error and tell Python to skip those values and carry on? – SBad Mar 28 '18 at 11:13

You can write your own function to handle NaN and pass it to apply. But, as the error suggests, I think one of your columns is float; try converting it to string and then apply. – Sociopath Mar 28 '18 at 11:56
The error does suggest that my column is not a string, but it definitely is one (I have just double-checked). The issue comes from the missing values: when I run the code and skip the rows with null/missing values, I get no error. – SBad Mar 28 '18 at 12:09
Try this to handle nulls: `lambda x: np.nan if pd.isna(x) else x[:2]` (`np.isnan` raises a TypeError on strings, so use `pd.isna`) – Sociopath Mar 28 '18 at 12:16
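Missing values in an object column are stored as `NaN`, which is a float, which is why slicing them fails (in Python 3 the message reads `'float' object is not subscriptable`). A minimal sketch for the single-column case with a missing value, assuming the column holds strings plus `NaN`: the `.str` accessor passes `NaN` through untouched, and an element-wise lambda can be guarded with `pd.isna`, which accepts strings as well as nulls.

```python
import numpy as np
import pandas as pd

# Column 'a' contains a missing value, as in the comment thread.
df = pd.DataFrame({'a': ['A123', np.nan, 'A100'], 'b': ['A156', 'A266666', 'A35555']})

# Option 1: vectorized .str accessor; NaN stays NaN, no error.
sliced_str = df['a'].str[:2]

# Option 2: element-wise lambda, skipping nulls explicitly.
# pd.isna works on strings too (it returns False for 'A123').
sliced_lambda = df['a'].apply(lambda x: np.nan if pd.isna(x) else x[:2])

print(sliced_str.tolist())
print(sliced_lambda.tolist())
```

To overwrite the column in place, assign the result back, e.g. `df['a'] = df['a'].str[:2]`. Converting with `astype(str)` first is riskier: it turns `NaN` into the string `'nan'`, which would then be sliced to `'na'`.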

score 1 · Answer 2 · answered Mar 28 '18 at 09:17

In [107]: df.apply(lambda c: c.str.slice(stop=2))
Out[107]:
    a   b
0  A1  A1
1  A5  A2
2  A1  A3
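Since the title asks for the first N elements, the same `str.slice(stop=...)` call generalizes directly. A small sketch, where `first_n_chars` is a hypothetical helper name (not a pandas function) that truncates every string in the chosen columns, all columns by default, and leaves the original frame untouched:

```python
import pandas as pd

def first_n_chars(frame, n, columns=None):
    """Hypothetical helper: return a copy of `frame` in which each string in
    `columns` (all columns if None) is cut to its first `n` characters."""
    columns = list(frame.columns) if columns is None else list(columns)
    out = frame.copy()
    # str.slice(stop=n) is the method form of .str[:n]; NaN stays NaN.
    out[columns] = out[columns].apply(lambda c: c.str.slice(stop=n))
    return out

df = pd.DataFrame({'a': ['A123', 'A567', 'A100'], 'b': ['A156', 'A266666', 'A35555']})
print(first_n_chars(df, 2))                  # both columns, first 2 characters
print(first_n_chars(df, 3, columns=['b']))   # only column b, first 3 characters
```

Restricting `columns` also covers the single-column case from the comments on the accepted answer.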

answered Mar 28 '18 at 09:17 by MaxU - stand with Ukraine
