Remove leading zeros when reading a csv file

Question

I have a CSV file that looks something like this -

    Location ID      Location Name
        3543459      A
         20541       B
          C320       C
           ...       ..

When I read the file using pd.read_csv, I get something like this -

    Location ID      Location Name
       03543459      A
       0020541       B
       000C320       C
           ...       ..

How can I avoid the leading zeros? I did some research, but all the questions I could find were about adding leading zeros to the DataFrame.

If you make `Location ID` an integer column, they'll disappear automatically. Is `Location ID` supposed to be integer or string (or Categorical)? — smci, Aug 03 '18 at 07:35

score 10 · Accepted Answer · answered Aug 03 '18 at 07:30

Post-process the column after reading with `str.lstrip`:

df['Location ID'] = df['Location ID'].str.lstrip('0')
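A minimal end-to-end sketch of this approach. The inline CSV text is made up to mirror the question (the real file is not shown), and `dtype=str` is an assumption added here so that the `.str` accessor is guaranteed to work on the column:

```python
import io
import pandas as pd

# Hypothetical CSV content mirroring the question; the real file is not shown.
csv_text = "Location ID,Location Name\n03543459,A\n0020541,B\n000C320,C\n"

# Read the ID column as text so that every value is a string.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Location ID": str})

# Strip any run of '0' characters from the left of each value.
df["Location ID"] = df["Location ID"].str.lstrip("0")

print(df["Location ID"].tolist())  # ['3543459', '20541', 'C320']
```

Note that a value consisting only of zeros (for example `'000'`) becomes an empty string with this approach.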

jezrael

score 2 · Answer 2 · answered Aug 30 '20 at 04:01

My column had mixed types, and the line below worked:

df['col'] = df['col'].apply(lambda x: x.lstrip('0') if isinstance(x, str) else x)
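A small sketch of why the type check matters, using a made-up column that mixes strings, an integer, and a missing value. The column contents and the helper name `strip_zeros` are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical mixed-type column: strings, an int, and a missing value.
df = pd.DataFrame({"col": ["007", 42, "000C320", None]})

def strip_zeros(value):
    # Only strings can carry leading zeros; leave every other type untouched.
    return value.lstrip("0") if isinstance(value, str) else value

df["col"] = df["col"].apply(strip_zeros)

print(df["col"].tolist()[:3])  # ['7', 42, 'C320']
```

Without the check, calling `.lstrip` on the integer or the missing value would raise an `AttributeError`.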

Hietsh Kumar

score 1 · Answer 3 · answered Aug 03 '18 at 07:34

df['Location ID'] = df['Location ID'].apply(lambda x: x.lstrip('0'))

U13-Forward

score 0 · Answer 4 · answered Jan 21 '22 at 01:29

For anyone with more complex strings (e.g., 'AB00003423'), you can use Series.str.extract() and a regular expression:

extractedNumbers = df.ID_col.str.extract(r'^[A-Z]+0+([0-9]+)$')

This will return a column of whatever is inside the parentheses (or "capture group(s)") of the regular expression.

Normally a DataFrame is returned with one column per capture group; when the pattern has a single capture group, pass expand=False to get a Series instead.
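A short sketch of the extraction on made-up IDs. The sample values are hypothetical, and the column name `ID_col` follows the snippet in this answer:

```python
import pandas as pd

# Hypothetical IDs: an uppercase prefix, zero padding, then digits.
df = pd.DataFrame({"ID_col": ["AB00003423", "XY0150", "C320"]})

# The single capture group holds the digits after the zero padding.
# expand=False makes the result a Series rather than a one-column DataFrame.
numbers = df["ID_col"].str.extract(r"^[A-Z]+0+([0-9]+)$", expand=False)

print(numbers.tolist()[:2])  # ['3423', '150']
```

Because the pattern requires at least one `0` after the prefix, a value such as `'C320'` does not match and comes back as a missing value, so this pattern suits columns where every ID is zero-padded.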

edited Jan 21 '22 at 01:34

answered Jan 21 '22 at 01:29

johnDanger
