
I am reading in a CSV file as a DataFrame while defining each column's data type. This code gives an error if the CSV file has a blank row in it. How do I read the CSV without blank rows?

dtype = {'material_id': object, 'location_id': object, 'time_period_id': int, 'demand': int, 'sales_branch': object, 'demand_type': object}

df = pd.read_csv('./demand.csv', dtype=dtype)

I thought of one workaround, doing something like this, but I'm not sure whether it's efficient:

df = pd.read_csv('demand.csv')
df = df.dropna()

and then redefining the column data types in the df.
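
A minimal sketch of that workaround, assuming the blank rows are the only source of NaN values (otherwise the cast back to int would still fail):

import pandas as pd

dtype = {'material_id': object, 'location_id': object, 'time_period_id': int,
         'demand': int, 'sales_branch': object, 'demand_type': object}

df = pd.read_csv('./demand.csv')   # read without dtypes first
df = df.dropna(how='all')          # drop only rows where every column is NaN
df = df.astype(dtype)              # the int columns now contain no NA, so the cast works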

Edit: Code -

import pandas as pd
dtype1 = {'material_id': object, 'location_id': object, 'time_period_id': int, 'demand': int, 'sales_branch': object, 'demand_type': object}
df = pd.read_csv('./demand.csv', dtype=dtype1)
df

Error - ValueError: Integer column has NA values in column 2

My CSV file's snapshot: (image not reproduced here)

– Karvy1
  • Could you provide the error? An example that I have tried: with a dataset `owner,car` / `Pedro,Honda` / `Antonio,Hyunday`, the code `cars_df = pd.read_csv('cars.csv')` works even though there is a blank row, giving: `owner car` / `0 Pedro Honda` / `1 Antonio Hyunday`. – Antonio Andrés Sep 02 '18 at 09:20
  • Please include a snippet of your `demand.csv` with some blank lines, to reproduce the issue and make this a [MCVE (Minimum Complete Verifiable Example)](https://stackoverflow.com/help/mcve). Otherwise, no one can reproduce your issue. You might like to use multiline strings: `"""A B C\n1 2 3.4\n..."""` and read using StringIO. – smci Sep 02 '18 at 23:37
  • Please check your _whole file_ that every row containing any data also has an integer value in time_period (column 2). The error states that you have an NA value in column 2, so there are probably some non-integer characters in column 2 _somewhere_. Also: open the CSV in Notepad++ or something similar, NOT Excel/OpenOffice/LibreOffice, so that you can see the delimiters as well. There might be rows that consist only of `,,,,,`, which would be a filled row but with all empty values. – Patrick Artner Sep 03 '18 at 07:54
  • `pandas` by default populates the empty row as `NaN`. Could that be causing the problem? – Karvy1 Sep 03 '18 at 07:59
  • Yes, I see the empty row as `,,,,,` – Karvy1 Sep 03 '18 at 08:01
  • Is there a workaround to handle `,,,,,`? (apart from manually deleting the row from the CSV file) – Karvy1 Sep 03 '18 at 08:03
  • None of the posted solutions worked for me, while this worked: https://www.kite.com/python/answers/how-to-drop-empty-rows-from-a-pandas-dataframe-in-python – DavideL Dec 21 '21 at 21:53
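
For reference, a minimal sketch reproducing the `,,,,,` case discussed in these comments (hypothetical data using the question's six columns):

import pandas as pd
from io import StringIO

csv_text = ("material_id,location_id,time_period_id,demand,sales_branch,demand_type\n"
            "A1,L1,1,10,B1,D1\n"
            ",,,,,\n"
            "A2,L2,2,20,B2,D2\n")

df = pd.read_csv(StringIO(csv_text))   # the ,,,,, line parses as a row of NaN
print(len(df))                         # 3 - skip_blank_lines does not remove it
print(len(df.dropna(how='all')))       # 2 - the all-NaN row is gone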

6 Answers


This worked for me.

import pandas as pd

def delete_empty_rows(file_path, new_file_path):
    # dropna(how='all') removes rows parsed as all-NaN, e.g. ",,,,,"
    data = pd.read_csv(file_path, skip_blank_lines=True)
    data.dropna(how="all", inplace=True)
    data.to_csv(new_file_path, header=True, index=False)  # index=False: no extra index column
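
A hypothetical usage of the function above (the file names are assumptions):

delete_empty_rows('./demand.csv', './demand_clean.csv')
df = pd.read_csv('./demand_clean.csv', dtype=dtype1)   # dtype1 as defined in the question; the cast now succeeds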
– NadavWeisler
  • This works. The key aspect here is `how="all"` in the `dropna()` call, which eliminates only those rows that are fully empty (not even a single column has a value). – nonbeing May 12 '21 at 05:43

Try something like this:

data = pd.read_table(filenames, skip_blank_lines=True, na_filter=True)
– fuwiak
df = pd.read_csv('./demand.csv').dropna(how='all').astype(dtype)

worked fine. Note that the dtype cast has to happen after the empty rows are dropped; passing the int dtypes straight to read_csv would hit the same ValueError.

– Chirag
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the [help center](https://stackoverflow.com/help/how-to-answer). – moken Apr 07 '23 at 01:37

Here you have to specify `.dropna(how='all')`.

With `how='all'`, a row is eliminated only if every one of its values is NaN. Without it, any row that has a NaN in any single column is also eliminated, and you may be left with a much smaller DataFrame.
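
A small sketch of the difference on a toy DataFrame (the column names are made up):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, np.nan], 'b': [2, 5, np.nan]})

print(len(df.dropna()))           # 1 - any NaN disqualifies a row
print(len(df.dropna(how='all')))  # 2 - only the all-NaN last row is dropped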

– Chirag

A solution could be:

data = pd.read_table(filenames, skip_blank_lines=True, na_filter=True)
– asmatrk

I am not sure whether it's efficient, but it works. This drops the NaN rows right after reading the CSV:

df = pd.read_csv('demand.csv').dropna()

Here too you should specify `.dropna(how='all')`; as explained in the answer above, a bare `.dropna()` also removes rows that have a NaN in any single column, not just the fully empty ones.

– sajid