Data Example:
<?xml version='1.0' encoding='UTF-8'?><osm version="0.6" generator="osmconvert 0.7P" timestamp="2013-07-20T19:00:02Z">
.
<way id="128725988" version="1" timestamp="2011-09-03T08:06:56Z" changeset="9198624" uid="42429" user="42429">
<nd ref="1421727256"/>
<nd ref="1421727264"/>
<nd ref="1421727238"/>
<nd ref="1421727237"/>
<nd ref="1421727256"/>
<tag k="addr:housenumber" v="43"/>
<tag k="addr:street" v="Wilhelm-Ahrens-Straße"/>
<tag k="building" v="yes"/>
</way>
.
.
<node id="1964468590" lat="53.068416" lon="8.779039" version="1" timestamp="2012-10-14T12:29:02Z" changeset="13491909" uid="715371" user="cracklinrain"/>
<node id="1964468593" lat="53.0684177" lon="8.7798644" version="1" timestamp="2012-10-14T12:29:02Z" changeset="13491909" uid="715371" user="cracklinrain">
<tag k="natural" v="tree"/>
</node>
.
.
.
<way id="128725989" version="1" timestamp="2011-09-03T08:06:57Z" changeset="9198624" uid="42429" user="42429">
<nd ref="1421728028"/>
<nd ref="1421728023"/>
<nd ref="1421728016"/>
<nd ref="1421728024"/>
<nd ref="1421728028"/>
<tag k="addr:housenumber" v="44"/>
<tag k="addr:street" v="Alma-Rogge-Straße"/>
<tag k="building" v="yes"/>
</way>
.
.
This is an excerpt from an XML file that contains about 30 GB of data.
What I want to do is extract only the <tag> elements whose k attribute is one of the keys I am interested in, for example addr:housenumber.
Each extracted tag must stay associated with the id of its parent element.
My main problem is how to handle a 30 GB document. If it were only a few hundred MB, I could solve it myself.
What I already tried:
XmlReader
Works very well for getting specific attributes but the connection to the parent id is lost.
Things like XDocument, XmlDocument...
The problem is the amount of data (30 GB).
After loading about 1 GB into memory I get an OutOfMemoryException.
I understand that loading 30 GB into memory is not realistic.
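To make the XmlReader attempt concrete, here is a minimal sketch of how the parent id might be kept while streaming: the id of the most recently opened node or way is held in a local variable and attached to each wanted <tag>. The names OsmTagStream, ExtractTags and wantedKeys are made up for illustration, and handling relation elements the same way is an assumption, since the sample above does not show any.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

public static class OsmTagStream
{
    // Streams the XML forward-only and yields (parent id, key, value)
    // for every <tag> whose k attribute is in wantedKeys.
    // The document is never loaded as a whole.
    public static IEnumerable<(long ParentId, string Key, string Value)> ExtractTags(
        TextReader input, ISet<string> wantedKeys)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            long parentId = 0;
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                string name = reader.Name;
                if (name == "node" || name == "way" || name == "relation")
                {
                    // Remember the id of the element whose children follow.
                    // Assumes every such element carries an id attribute.
                    parentId = long.Parse(reader.GetAttribute("id"),
                                          CultureInfo.InvariantCulture);
                }
                else if (name == "tag")
                {
                    string key = reader.GetAttribute("k");
                    if (key != null && wantedKeys.Contains(key))
                        yield return (parentId, key, reader.GetAttribute("v"));
                }
            }
        }
    }
}
```

For a real file, a StreamReader over the file could be passed in as input, and the results consumed in a foreach loop so that only one item is alive at a time.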
I already have a separate working solution that uses an open-source library for PBF files (but I want to process the clean data). It extracts the needed data by iterating through every node and uses LINQ to SQL to add it to the database.
Final result:
I want to import every street, house number, postal code and city into a SQL Server database where StreetTable is linked to CityTable.
(My first solution works well, but after about 10 000 processed items it becomes very slow.)
I hope it is clear what I want to do.