
I've been importing CSVs using pandas, but an extra line keeps appearing every time I import the file, and it causes errors in my code. How do I completely remove this line?

The code I used to import it was:

import itertools
import copy
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import csv

df3=pd.read_csv(r"U:\\user\edge_list_4.csv")
print(df3)

df4=pd.read_csv(r"U:\\user\nodes_fixed_2.csv")
df4.dropna() 
print(df4)


g=nx.Graph()

for i,elrow in df3.iterrows():
    g.add_edge(elrow[0], elrow[1], **elrow[2:].to_dict())


# Add node attributes
for i, nlrow in df4.iterrows():
    # g.node[nlrow['id']] = nlrow[1:].to_dict()  # deprecated after NX 1.11
    nx.set_node_attributes(g, {nlrow['ID']: nlrow[1:].to_dict()})

# Node list example
print(nlrow)

# Preview first 5 edges

list(g.edges(data=True))[0:5] 

# Preview first 10 nodes

list(g.nodes(data=True))[0:10] 

print('# of edges: {}'.format(g.number_of_edges()))
print('# of nodes: {}'.format(g.number_of_nodes()))

# Define node positions data structure (dict) for plotting
for node in g.nodes(data=True):
    print(node)
    print("")

node_positions = {node[0]: (node[1]['X'], -node[1]['Y']) for node in g.nodes(data=True)}

My table is a simple ID, X, Y table. I've tried using the:

dropna()

method, but it couldn't seem to remove the line. I've tried editing the file in Notepad++ and importing it as a .txt file, but the line still keeps appearing. Is there some specific way I should edit the CSV file in Excel, or is there code I can use?

('rep1', {'X': 1, 'Y': 1811})

('rep2', {'X': 2, 'Y': 1811})

('rep3', {'X': 3, 'Y': 1135})

('rep4', {'X': 4, 'Y': 420})

('rep5', {'X': 5, 'Y': 885})

('rep6', {'X': 6, 'Y': 1010})

('rep7', {'X': 7, 'Y': 1010})

('rep8', {'X': 8, 'Y': 1135})

('rep9', {'X': 9, 'Y': 1135})

('rep10', {'X': 10, 'Y': 885})

('rep1 ', {})

The data is only meant to go up to rep10.

KeyError: 'X'
Kaori21

4 Answers


Try using the error_bad_lines option while reading the CSV file. Hopefully it will work.

df_csv = pd.read_csv("FILENAME.csv", error_bad_lines=False)
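In newer pandas versions (1.3 and later) error_bad_lines is deprecated in favor of on_bad_lines; a minimal equivalent sketch, keeping the placeholder file name:

df_csv = pd.read_csv("FILENAME.csv", on_bad_lines="skip")  # skip rows that can't be parsed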

If you always want to ignore the last line, try skipfooter:

df_csv = pd.read_csv("FILENAME.csv", skipfooter=1)

Number of lines at bottom of file to skip (Unsupported with engine=’c’). Documentation
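Since skipfooter is unsupported with the C engine, a small sketch that selects the Python engine explicitly (file name is a placeholder):

df_csv = pd.read_csv("FILENAME.csv", skipfooter=1, engine="python")  # the Python parser supports skipfooter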

LOrD_ARaGOrN

Basically, you receive the error because some CSV lines have data missing.

Generally, the best way to address this problem is to read the file while tolerating missing values. To do this, your code should filter out lines with missing values.

if 'X' not in line:
    # skip the line

Skipping one line is not a perfect solution; it is data-format knowledge that shouldn't be hard-coded. Instead of reading an arbitrary .csv file, your code would only read one particular kind of file.
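As a concrete sketch of that filtering with pandas, assuming the columns are named ID, X and Y as in the question (note that dropna returns a new DataFrame, so the result must be assigned back):

df4 = pd.read_csv(r"U:\\user\nodes_fixed_2.csv")
# drop rows where either coordinate column is missing
df4 = df4.dropna(subset=["X", "Y"])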

Sergei Voitovich

You could try to select the valid elements of the column this way: drop[bool(drop.<column_name>[1]) == True]. I use the bool cast on the second element of the tuple, because an empty dict cast to bool is False.

However, it would be better, as akhetos said, to show us more of your code and also your source CSV file.
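Applied to the graph in the question, the same idea (an empty attribute dict is falsy) could be used to skip the stray node when building node_positions; a minimal sketch using the variable names from the question's code:

# keep only nodes that actually carry X/Y attributes; an empty dict evaluates to False
node_positions = {n: (data['X'], -data['Y']) for n, data in g.nodes(data=True) if data}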

JLD

Please read about skipfooter: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv

df_csv = pd.read_csv("FILENAME.csv", skipfooter=1)
Madhur Yadav
  • Hi, I've used skipfooter but it also deleted the rep10 row, which is something I need as well – Kaori21 Jul 30 '19 at 09:08
  • @ElizabethDC - I'm running the code and it's working fine; please add header=None as a parameter or add your own column names. Maybe you are confused because the first value is becoming the header. – Madhur Yadav Jul 30 '19 at 09:16
  • this is what I get when I enter @MadhurYadav df4=pd.read_csv(r"U:\\user\test_node.csv",skipfooter = 1): ID X Y 0 rep1 1 1811 1 rep2 2 1811 2 rep3 3 1135 3 rep4 4 420 4 rep5 5 885 5 rep6 6 1010 6 rep7 7 1010 7 rep8 8 1135 8 rep9 9 1135 – Kaori21 Jul 30 '19 at 09:24