
I primarily work in R, but I am trying to switch over to Python to use some of the new tools in the osmnx package. I have gotten stuck on getting a .graphml file written by igraph in R to load properly with osmnx in Python.

tl;dr: the main question is whether it is possible to manually specify the "node id" values when writing a graph with write_graph() from the R igraph package.

Longer version: I am taking an existing .graphml file "accra-1910.graphml" (downloadable from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KA5HJ3, within the "ghana-GHA_graphml.zip" file).

I'm reading this .graphml file into R using read_graph() from the igraph package, subsetting to a subgraph of interest, saving a new .graphml file of the subgraph using write_graph() from the igraph package, and then trying to read the resulting new .graphml file back into osmnx in python using osmnx.load_graphml().

So in R, this looks like:

    library(igraph)
    gml <- read_graph("accra-1910.graphml", format = "graphml")
    sub.t1 <- subgraph(gml, V(gml)[id %in% osmids.t1])
    # note that osmids.t1 is a vector of vertex/node ids to subset to, created earlier in the R script
    write_graph(sub.t1, "accra_pre1991.graphml", format = "graphml")

Then in Python, when I try to read "accra_pre1991.graphml" using:

    import osmnx as ox
    G = ox.load_graphml(filepath="accra_pre1991.graphml")

I always get this error message:

    node_id = self.node_type(node_xml.get("id"))
    ValueError: invalid literal for int() with base 10: 'n0'

Inspecting the raw .graphml files shows why this happens. The original file "accra-1910.graphml" has nodes indexed in this format:

    <node id="30729912">
      <data key="d4">5.5784766</data>
      <data key="d5">-0.1648661</data>
      <data key="d6">3</data>
      <data key="d7">51</data>
      <data key="d8">51</data>
      <data key="d9">51</data>
    </node>
    <node id="30729918">
      <data key="d4">5.5821678</data>
      <data key="d5">-0.1666711</data>
      <data key="d6">3</data>
      <data key="d7">40</data>
      <data key="d8">40</data>
      <data key="d9">46</data>
    </node>

Note that the node ids are purely numeric strings such as "30729912". However, the .graphml file produced by write_graph() from igraph in R always takes this format instead:

    <node id="n0">
      <data key="v_highway"></data>
      <data key="v_elevation_srtm">51</data>
      <data key="v_elevation_aster">51</data>
      <data key="v_elevation">51</data>
      <data key="v_street_count">3</data>
      <data key="v_x">-0.1648661</data>
      <data key="v_y">5.5784766</data>
      <data key="v_id">30729912</data>
    </node>
    <node id="n1">
      <data key="v_highway"></data>
      <data key="v_elevation_srtm">46</data>
      <data key="v_elevation_aster">40</data>
      <data key="v_elevation">40</data>
      <data key="v_street_count">3</data>
      <data key="v_x">-0.1666711</data>
      <data key="v_y">5.5821678</data>
      <data key="v_id">30729918</data>
    </node>

Here, igraph has generated a new sequence of node ids ("n0", "n1", "n2", and so on) and stored the old ids as a node data attribute, as in <data key="v_id">30729918</data>. When osmnx in Python reads the file, it tries to convert every node id to an integer, so it fails on the very first one. That is the error "invalid literal for int() with base 10: 'n0'": the id contains a non-numeric character.
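The renaming described above could be reversed after export. Below is a minimal post-processing sketch in Python, using only the standard library. It assumes the igraph export declares the original id as a node `<key>` with `attr.name="id"`. The `v_id` key in the excerpt suggests this, but check the `<key>` lines at the top of your file. The sketch copies that value back into each `<node id="...">` and rewrites edge `source`/`target` to match. The function name `relabel_igraph_graphml` is hypothetical. Whether osmnx then accepts every other attribute igraph wrote (for example, the empty `v_highway` values) is not verified here.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
NS = {"g": GRAPHML_NS}


def relabel_igraph_graphml(src_path, dst_path, id_attr="id"):
    """Hypothetical helper: replace igraph's generated node ids ("n0", "n1", ...)
    with the value of the node attribute named id_attr, and update edge
    endpoints to match. Returns the {old_id: new_id} mapping.
    """
    # Serialize with a default namespace rather than "ns0:" prefixes.
    ET.register_namespace("", GRAPHML_NS)
    tree = ET.parse(src_path)
    root = tree.getroot()

    # Look up the <key> id igraph used for this attribute (assumed e.g. "v_id").
    key_id = next(
        (k.get("id") for k in root.findall("g:key", NS)
         if k.get("for") == "node" and k.get("attr.name") == id_attr),
        None,
    )
    if key_id is None:
        raise ValueError(f"no node attribute named {id_attr!r} in {src_path}")

    graph = root.find("g:graph", NS)
    mapping = {}
    for node in graph.findall("g:node", NS):
        values = [(d.text or "").strip() for d in node.findall("g:data", NS)
                  if d.get("key") == key_id]
        if not values or not values[0]:
            raise ValueError(f"node {node.get('id')!r} has no {id_attr!r} value")
        mapping[node.get("id")] = values[0]

    # GraphML node ids must be unique, so refuse to write duplicates.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError(f"values of {id_attr!r} are not unique")

    for node in graph.findall("g:node", NS):
        node.set("id", mapping[node.get("id")])
    for edge in graph.findall("g:edge", NS):
        edge.set("source", mapping[edge.get("source")])
        edge.set("target", mapping[edge.get("target")])

    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
    return mapping
```

For example, you could run `relabel_igraph_graphml("accra_pre1991.graphml", "accra_pre1991_osmids.graphml")` and then pass the second path to `ox.load_graphml`. The original `id` value is also left in place as node data.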

Is there any way to get igraph (via write_graph() in R) to export a .graphml file in the same formatting as the original .graphml file instead?

Apologies if this question isn't clear -- this is my first time posting on Stack Overflow and I'm not sure what all the rules are. Thanks for any help you can provide!

  • Note that this is a problem with osmnx, not igraph. The GraphML specification does allow `id`s to be any "NMTOKEN". http://graphml.graphdrawing.org/specification/xsd.html#element-node There is no indication that this is expected to be an integer. – Szabolcs Feb 10 '22 at 08:29
  • Unfortunately, there is no way to set specific IDs to be used when exporting GraphML from igraph. The IDs in GraphML files are not meant to carry information. Their purpose is solely to be able to distinguish nodes. Therefore, it wouldn't make sense to allow users to set specific IDs, even if this were technically easy (it is not): what if they set non-unique IDs? My suggestion is to report this problem to the osmnx project. osmnx's requirement for IDs to be integers appears to be a bug. – Szabolcs Feb 10 '22 at 09:04
  • Thanks -- this is helpful! In the meantime, I've been able to figure out a workaround by creating the subgraph directly in python without having to export the .graphml in a way that changes its original formatting. – noahnathan5 Feb 10 '22 at 15:09
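One way to do the subsetting in Python without changing the file's formatting is to filter the original GraphML at the XML level. Below is a standard-library sketch of that idea. It is not necessarily the approach used in the comment above. It keeps the nodes whose `id` is in a given set and drops every edge that touches a removed node, which gives the induced subgraph, as igraph's `subgraph()` does. Node ids, `<key>` declarations and `<data>` elements are left untouched. The names `subset_graphml` and `osmids_t1` are hypothetical. Alternatively, after `G = ox.load_graphml(...)`, networkx's `G.subgraph(nodes).copy()` builds the same induced subgraph in memory.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
GRAPH = f"{{{GRAPHML_NS}}}graph"
NODE = f"{{{GRAPHML_NS}}}node"
EDGE = f"{{{GRAPHML_NS}}}edge"


def subset_graphml(src_path, dst_path, keep_ids):
    """Hypothetical helper: write the subgraph induced by keep_ids to dst_path.

    Node ids, <key> declarations and <data> elements are left as they are,
    so numeric OSM ids stay numeric. Returns (n_nodes, n_edges) kept.
    """
    # Serialize with a default namespace rather than "ns0:" prefixes.
    ET.register_namespace("", GRAPHML_NS)
    keep = {str(i) for i in keep_ids}  # ids are compared as XML strings
    tree = ET.parse(src_path)
    graph = tree.getroot().find(GRAPH)

    n_nodes = n_edges = 0
    for child in list(graph):  # iterate over a copy because we remove children
        if child.tag == NODE:
            if child.get("id") in keep:
                n_nodes += 1
            else:
                graph.remove(child)
        elif child.tag == EDGE:
            # Induced subgraph: keep an edge only if both endpoints are kept.
            if child.get("source") in keep and child.get("target") in keep:
                n_edges += 1
            else:
                graph.remove(child)

    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
    return n_nodes, n_edges
```

Usage would look like `subset_graphml("accra-1910.graphml", "accra_pre1991.graphml", osmids_t1)`, followed by `ox.load_graphml(filepath="accra_pre1991.graphml")`.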
