
I am reading in a CSV file as a DataFrame while defining each column's data type. This code gives an error if the CSV file has a blank row in it. How do I read the CSV without blank rows?

dtype = {'material_id': object, 'location_id': object, 'time_period_id': int, 'demand': int, 'sales_branch': object, 'demand_type': object}

df = pd.read_csv('./demand.csv', dtype=dtype)

I thought of one workaround, doing something like this, but I'm not sure whether it's efficient:

df = pd.read_csv('demand.csv')
df = df.dropna()

and then redefining the column data types in the df.
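
A minimal sketch of that workaround, assuming the blank rows are the only source of NaN values (otherwise the cast back to int would still fail):

import pandas as pd

dtype = {'material_id': object, 'location_id': object, 'time_period_id': int,
         'demand': int, 'sales_branch': object, 'demand_type': object}

df = pd.read_csv('./demand.csv')   # read without dtypes first
df = df.dropna(how='all')          # drop only rows where every column is NaN
df = df.astype(dtype)              # the int columns now contain no NA, so the cast works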

Edit: Code -

import pandas as pd
dtype1 = {'material_id': object, 'location_id': object, 'time_period_id': int, 'demand': int, 'sales_branch': object, 'demand_type': object}
df = pd.read_csv('./demand.csv', dtype=dtype1)
df

Error - ValueError: Integer column has NA values in column 2

My CSV file's snapshot: (image not reproduced here)

– Karvy1
  • Could you provide the error? An example that I have tried: with a dataset `owner,car` / `Pedro,Honda` / `Antonio,Hyunday`, the code `cars_df = pd.read_csv('cars.csv')` works even though there is a blank row, giving: `owner car` / `0 Pedro Honda` / `1 Antonio Hyunday`. – Antonio Andrés Sep 02 '18 at 09:20
  • Please include a snippet of your `demand.csv` with some blank lines, to reproduce the issue and make this a [MCVE (Minimum Complete Verifiable Example)](https://stackoverflow.com/help/mcve). Otherwise, no one can reproduce your issue. You might like to use multiline strings: `"""A B C\n1 2 3.4\n..."""` and read using StringIO. – smci Sep 02 '18 at 23:37
  • Please check your _whole file_ that every row containing any data also has an integer value in time_period (column 2). The error states that you have an NA value in column 2, so there are probably some non-integer characters in column 2 _somewhere_. Also: open the CSV in Notepad++ or something similar, NOT Excel/OpenOffice/LibreOffice, so that you can see the delimiters as well. There might be rows that consist only of `,,,,,`, which would be a filled row but with all empty values. – Patrick Artner Sep 03 '18 at 07:54
  • `pandas` by default populates the empty row as `NaN`. Could that be causing the problem? – Karvy1 Sep 03 '18 at 07:59
  • Yes, I see the empty row as `,,,,,` – Karvy1 Sep 03 '18 at 08:01
  • Is there a workaround to handle `,,,,,`? (apart from manually deleting the row from the CSV file) – Karvy1 Sep 03 '18 at 08:03
  • None of the posted solutions worked for me, while this worked: https://www.kite.com/python/answers/how-to-drop-empty-rows-from-a-pandas-dataframe-in-python – DavideL Dec 21 '21 at 21:53
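
For reference, a minimal sketch reproducing the `,,,,,` case discussed in these comments (hypothetical data using the question's six columns):

import pandas as pd
from io import StringIO

csv_text = ("material_id,location_id,time_period_id,demand,sales_branch,demand_type\n"
            "A1,L1,1,10,B1,D1\n"
            ",,,,,\n"
            "A2,L2,2,20,B2,D2\n")

df = pd.read_csv(StringIO(csv_text))   # the ,,,,, line parses as a row of NaN
print(len(df))                         # 3 - skip_blank_lines does not remove it
print(len(df.dropna(how='all')))       # 2 - the all-NaN row is gone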

6 Answers


This worked for me.

import pandas as pd

def delete_empty_rows(file_path, new_file_path):
    # dropna(how='all') removes rows parsed as all-NaN, e.g. ",,,,,"
    data = pd.read_csv(file_path, skip_blank_lines=True)
    data.dropna(how="all", inplace=True)
    data.to_csv(new_file_path, header=True, index=False)  # index=False: no extra index column
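
A hypothetical usage of the function above (the file names are assumptions):

delete_empty_rows('./demand.csv', './demand_clean.csv')
df = pd.read_csv('./demand_clean.csv', dtype=dtype1)   # dtype1 as defined in the question; the cast now succeeds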
– NadavWeisler
  • This works. The key aspect here is `how="all"` in the `dropna()` call, which eliminates only those rows that are fully empty (not even a single column has a value). – nonbeing May 12 '21 at 05:43

Try something like this:

data = pd.read_table(filenames, skip_blank_lines=True, na_filter=True)
– fuwiak
df = pd.read_csv('./demand.csv').dropna(how='all').astype(dtype)

worked fine. Note that the dtype cast has to happen after the empty rows are dropped; passing the int dtypes straight to read_csv would hit the same ValueError.

– Chirag
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the [help center](https://stackoverflow.com/help/how-to-answer). – moken Apr 07 '23 at 01:37

Here you have to specify `.dropna(how='all')`.

With `how='all'`, a row is eliminated only if every one of its values is NaN. Without it, any row that has a NaN in any single column is also eliminated, and you may be left with a much smaller DataFrame.
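
A small sketch of the difference on a toy DataFrame (the column names are made up):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, np.nan], 'b': [2, 5, np.nan]})

print(len(df.dropna()))           # 1 - any NaN disqualifies a row
print(len(df.dropna(how='all')))  # 2 - only the all-NaN last row is dropped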

– Chirag

A solution could be:

data = pd.read_table(filenames, skip_blank_lines=True, na_filter=True)
– asmatrk

I am not sure whether it's efficient, but it works. This drops the NaN rows right after reading the CSV:

df = pd.read_csv('demand.csv').dropna()

Here too you should specify `.dropna(how='all')`; as explained in the answer above, a bare `.dropna()` also removes rows that have a NaN in any single column, not just the fully empty ones.

– sajid