CSV Data is Being Read Out of Order

Question

I am reading a CSV file exported from Excel 16 with Python 2.7, using:

import csv

with open("C:\Users\RJ\FG\Line\Line List.csv") as csv_input:
    reader = csv.DictReader(csv_input)
    for row in reader:
        print(row)

This produces all the correct data, but 'RoutingFrom', which is the first column in the Excel file, comes out last:

{'RoutingTo': 'AMINE DRAIN ', 'Item': '1', 'LineSectTag': 'AD-12-1-011-0', 'LDTDocNo': 'M6D-1P12-00009', 'RoutingFrom': '1AM-12010-0 '}

When I open the CSV file with a text editor, the header line is:

Item,RoutingFrom,RoutingTo,LDTDocNo,LineSectTag

where 'RoutingFrom' is the second column.

The text editor view shows the original column order of the file. I will be using this data to add edges in NetworkX, so I reordered the columns in the spreadsheet so that the 'from' and 'to' were the first two columns. Excel displays them as the first two columns, Python reads them as the first and last, and the text editor shows them as the second and third.

I then took the rearranged (from, to, item, ...) csv file and copied as text onto a new spreadsheet which is where all of the above comes from.

Any suggestions on how to get a consistently ordered dataset?

BTW, I am working with a small subset of the actual data which is 10 times as wide and 50 times as long.

I appreciate all input. Thanks, Ray

If you use a dictionary, you will not preserve the order of the columns, because the keys of a dict are not kept in any particular order. See http://stackoverflow.com/a/1885353. – Bonlenfum, Mar 13 '17 at 11:34
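As a rough illustration of the comment above, the column order can be taken from the reader's `fieldnames` instead of from each row dict. This is only a sketch: `SAMPLE` and `read_rows_in_file_order` are made-up names, the sample text stands in for the real file, and it is written for Python 3 (where `io.StringIO` takes an ordinary string).

```python
import csv
import io

# Stand-in for the real file; the header and values are taken from the question.
SAMPLE = (
    "Item,RoutingFrom,RoutingTo,LDTDocNo,LineSectTag\n"
    "1,1AM-12010-0 ,AMINE DRAIN ,M6D-1P12-00009,AD-12-1-011-0\n"
)


def read_rows_in_file_order(csv_file):
    """Return (header, rows); each row is a list of (column, value) pairs in file order."""
    reader = csv.DictReader(csv_file)
    # fieldnames is read from the first line of the file and keeps its order,
    # regardless of how the row dicts themselves are ordered.
    header = list(reader.fieldnames)
    rows = [[(name, row[name]) for name in header] for row in reader]
    return header, rows


header, rows = read_rows_in_file_order(io.StringIO(SAMPLE))
for name, value in rows[0]:
    print(name, repr(value))
```

Plain `csv.reader` is another option when only positional access is needed, since it returns each row as a list in file order.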

score 0 · Answer 1 · answered Mar 13 '17 at 08:21

Pandas and its DataFrame object are great for this. The CSV can be easily loaded and columns can be sliced from the DF to build a NetworkX Graph.

import pandas as pd
import networkx as nx

# Raw string so the backslashes in the Windows path are never treated as escapes
df = pd.read_csv(r"C:\Users\RJ\FG\Line\Line List.csv")
edges = zip(df.RoutingFrom, df.RoutingTo)

G = nx.Graph()
G.add_edges_from(edges)

answered Mar 13 '17 at 08:21

harryscholes

Thanks, easy to implement. The ordering is lost, though. The rows are all mixed up and half of the to's / from's are interchanged. How can these be controlled? – Ray Joseph Mar 13 '17 at 11:21
This is a directed graph, so I need to maintain pair ordering, and I also have a lot of edge data to bring in; so either all the data needs to come in at once or the row sequencing must be maintained. If I change the graph constructor to DiGraph, there is no change. df has the data structured correctly, as do df.RoutingFrom and df.RoutingTo. How can I get the From/To ordering maintained and pull in the remaining edge data? – Ray Joseph Mar 13 '17 at 11:42
If you want a directed graph, change to `nx.DiGraph()`. When adding edges to `DiGraph`, the first node is the source node ('from') and the second node is the sink node ('to'). Can you clarify what you mean by "The rows are all mixed up and half of the to's / from's are interchanged"? – harryscholes Mar 13 '17 at 12:16
I have changed the graph to DiGraph. The from's and to's match. The rows are out of order: – Ray Joseph Mar 14 '17 at 00:52
Output from df                             Output from G.edges()                  Notes
   RoutingFrom    RoutingTo     Item
0  1AM-12010-0    AMINE DRAIN   1          [('1AM-12254-0 ', '1AD-12822-0 '),     row 108
1  1AM-12052-0    AMINE DRAIN   2          ('1AD-12115-0 ', 'AMINE DRAIN '),      row 27
2  1PSV-120206A   1AD-12070-1   3          ('1AD-12082-0 ', '1AD-12693-0 '),      row 82
3  1AD-12070-0    AMINE DRAIN   4          ('1AM-12002-0 ', '1AD-12121-0 '),      row 29
– Ray Joseph Mar 14 '17 at 00:56
OK, that did not work. That was supposed to show that the df object held the rows in the original order. But G.edges() produced the pairs out of order; the sequence of G.edges was 108, 27, 82, 29 ... I think this may be a problem mostly because it is out of order. Additionally, I don't see how to bring in node data which is currently in the original csv file. – Ray Joseph Mar 14 '17 at 01:01
I see. `G.edges()` does not retain ordering. To get an alphabetically/numerically ordered list of edges, call `sorted(G.edges())`. As for bringing in node data, I often create a dictionary mapping nodes to attributes then add these to the graph using `nx.set_node_attributes(G, 'attribute_name', {k:v for (k,v) in zip(sorted(G.nodes()), attribute_list)})`. If you would like a more detailed answer, please edit your question to include every point you need answered. – harryscholes Mar 14 '17 at 07:24
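One possible way to keep both the direction and the row sequence, sketched here with the standard library only, is to build the edge list in file order and carry the row number along as an edge attribute. The names `SAMPLE`, `edges_in_row_order`, and the `row` attribute are illustrative assumptions; the sample rows are the first four shown in the df output above, with whitespace trimmed.

```python
import csv
import io

# Stand-in for the real file, using the four rows shown in the df output.
SAMPLE = (
    "Item,RoutingFrom,RoutingTo\n"
    "1,1AM-12010-0,AMINE DRAIN\n"
    "2,1AM-12052-0,AMINE DRAIN\n"
    "3,1PSV-120206A,1AD-12070-1\n"
    "4,1AD-12070-0,AMINE DRAIN\n"
)


def edges_in_row_order(csv_file, source="RoutingFrom", target="RoutingTo"):
    """Build (from, to, attributes) tuples in the order the rows appear in the file."""
    edges = []
    for row_number, row in enumerate(csv.DictReader(csv_file)):
        # Every column other than the two endpoints becomes edge data.
        attrs = {k: v for k, v in row.items() if k not in (source, target)}
        # Hypothetical 'row' attribute so the file order can be recovered later.
        attrs["row"] = row_number
        edges.append((row[source], row[target], attrs))
    return edges


edges = edges_in_row_order(io.StringIO(SAMPLE))
for u, v, data in edges:
    print(u, "->", v, data)
```

A list in this shape can be passed to `G.add_edges_from(edges)` on an `nx.DiGraph()`, which accepts `(u, v, dict)` tuples. The original sequence should then be recoverable with `sorted(G.edges(data=True), key=lambda e: e[2]["row"])`. Note that a `DiGraph` keeps only one edge per (from, to) pair, so repeated pairs in the file would overwrite each other's attributes.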
