
I have a CSV file that I read with pandas:

import pandas as pd
df = pd.read_csv('data.csv')

Table:

Column A  Column B  Column C
4068744   -1472525  2596219
198366    -         -

The file uses '-' for null values.

I tried converting the column to int without handling the '-':

df['Column B'] = df['Column B'].astype(int)

ValueError: invalid literal for int() with base 10: '-'

My question is: how do I strip the string '-' without changing the negative values?

marc_s
TrueLies

1 Answer


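Newer versions of pandas have a nullable integer dtype, 'Int64', that can hold missing values (pd.NA, used below, needs pandas 1.0 or later). The plain int dtype can't represent nulls, which is why astype(int) fails on '-'.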

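# replace the '-' placeholder with a missing value; only cells that are
# exactly '-' match, so negative numbers such as -1472525 are untouched
df.replace('-', pd.NA, inplace=True)
# cast to the nullable integer dtype: 'Int64' with a capital I, passed as a string
df['Column B'] = df['Column B'].astype('Int64')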

output:

> df

  Column A  Column B Column C
0  4068744  -1472525  2596219
1   198366      <NA>     <NA>

> df['Column B'].info

Name: Column B, dtype: Int64>
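
Another option, as a rough sketch rather than a definitive fix: tell read_csv to treat '-' as missing while parsing, so no replace step is needed. The inline CSV text below is a stand-in for data.csv and assumes the file is comma-separated, which the question doesn't actually say.

```python
import io
import pandas as pd

# Stand-in for data.csv built from the rows in the question
# (comma-separated is an assumption).
csv_text = (
    "Column A,Column B,Column C\n"
    "4068744,-1472525,2596219\n"
    "198366,-,-\n"
)

# na_values='-' marks only cells that are exactly '-' as missing;
# '-1472525' is still parsed as a negative number.
df = pd.read_csv(io.StringIO(csv_text), na_values='-')

# Numeric columns containing missing values are read as float64,
# so cast them to the nullable Int64 dtype afterwards.
df = df.astype({'Column B': 'Int64', 'Column C': 'Int64'})

print(df)
print(df.dtypes)
```

With a real file you would pass the path instead of the StringIO object, e.g. pd.read_csv('data.csv', na_values='-').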
JayPeerachai