0

Trying to optimize a procedure that parses GML strings to Oracle sdo_geometry.

<gml:Polygon >
   <gml:exterior>
        <gml:LinearRing>
            <gml:posList > [coordinates] </gml:posList>
        </gml:LinearRing>
    </gml:exterior>
   <gml:interior>
        <gml:Ring>
             <gml:curveMember>
                <gml:Curve>
                    <gml:segments>
                        <gml:LineStringSegment>
                            <gml:posList > [coordinates] </gml:posList>
                        </gml:LineStringSegment>
                        <gml:Arc>
                            <gml:posList > [coordinates] </gml:posList>
                        </gml:Arc>
                        <gml:LineStringSegment>
                            <gml:posList > ... </gml:posList>
                        </gml:LineStringSegment>
                    </gml:segments>
                </gml:Curve>
            </gml:curveMember>
        </gml:Ring>
    </gml:interior>
   <gml:interior>
        <gml:LinearRing>
            <gml:posList > [coordinates] </gml:posList>
        </gml:LinearRing>
    </gml:interior> 
</gml:Polygon>

The polygon consists of 1 exterior and 0-many interior elements.

Our current solution works as follows.

First it extracts all exterior and interior elements and processes those separately

 cursor c_exterior(p_xml xmltype) is
         select t.*
           from xmltable('//exterior' passing p_xml columns exterior xmltype path '/') as t;

      cursor c_interior(p_xml xmltype) is
         select t.*
           from xmltable('//interior' passing p_xml columns interior xmltype path '/') as t;

   begin

      -- process each exterior/interior ring.   
      for r_exterior in c_exterior(p_xml)
      loop
         process(r_exterior.exterior, [other params]);
      end loop;

      for r_interior in c_interior(p_xml)
      loop
         process(r_interior.interior, [other params]);
      end loop;

The processing of the exterior and interior element is as follows:

   select t.*
   from xmltable('for $d in //node() where exists($d/posList) return $d' passing p_xml columns
                            poslist clob path './posList'
                            ,parent varchar2(100) path 'name()') as t;

This gets the list of coordinates and the name of the immediate parent of the poslist element.

Question: Since this is a two step process it may not be optimized for speed. I'm looking for ways to get the information I need from the GML in 1 query. But I can't figure out how to do that. Mainly because of the varying level into the XML of the poslist element and generating the number of the exterior/interior ring.

Information I need:

  1. exterior or interior.
  2. Number of ring. So I know the coordinates are part of the same exterior/interior ring.
  3. The [coordinates]
  4. Name of immediate parent of the poslist element.

So from above example:

exterior, 1, gml:LinearRing, [coordinates]
interior, 2, gml:LineStringSegment, [coordinates]
interior, 2, gml:Arc, [coordinates]
interior, 2, gml:LineStringSegment, [coordinates]
interior, 3, gml:LinearRing, [coordinates]
Rene
  • 10,391
  • 5
  • 33
  • 46

1 Answers1

2

You can use chained XMLTable calls:

select x1.type, x1.ring_num, x2.parent, x2.coordinates
from xmltable(
  xmlnamespaces(default 'http://www.opengis.net/gml/'),
  '/Polygon/*'
  passing p_xml
  columns type varchar2(8) path './local-name()',
    ring_num for ordinality,
    nodes xmltype path '//posList/..'
) x1
cross join xmltable (
  xmlnamespaces(default 'http://www.opengis.net/gml/'),
  '/*'
  passing x1.nodes
  columns parent varchar2(21) path './name()',
    coordinates varchar2(30) path 'posList'
) x2;

(adjusting output data types and sizes as needed for your real data, of course)

The x1 table gets the exterior/interior nodes, adds an ordinal column for the 'ring number', and includes an XMLType column that has all the nodes that are parents of a posList node. That is then passed into x2 which extracts the multiple posList coordinates and parents.

Demo using a CTE to provide the XML document instead of PL/SQL:

with t (p_xml) as (
  select xmltype('<gml:Polygon xmlns:gml="http://www.opengis.net/gml/">
   <gml:exterior>
        <gml:LinearRing>
            <gml:posList > [coordinates] </gml:posList>
        </gml:LinearRing>
    </gml:exterior>
   <gml:interior>
        <gml:Ring>
             <gml:curveMember>
                <gml:Curve>
                    <gml:segments>
                        <gml:LineStringSegment>
                            <gml:posList > [coordinates] </gml:posList>
                        </gml:LineStringSegment>
                        <gml:Arc>
                            <gml:posList > [coordinates] </gml:posList>
                        </gml:Arc>
                        <gml:LineStringSegment>
                            <gml:posList > ... </gml:posList>
                        </gml:LineStringSegment>
                    </gml:segments>
                </gml:Curve>
            </gml:curveMember>
        </gml:Ring>
    </gml:interior>
   <gml:interior>
        <gml:LinearRing>
            <gml:posList > [coordinates] </gml:posList>
        </gml:LinearRing>
    </gml:interior> 
</gml:Polygon>') from dual
)
select x1.type, x1.ring_num, x2.parent, x2.coordinates
from t
cross join xmltable(
  xmlnamespaces(default 'http://www.opengis.net/gml/'),
  '/Polygon/*'
  passing t.p_xml
  columns type varchar2(8) path './local-name()',
    ring_num for ordinality,
    nodes xmltype path '//posList/..'
) x1
cross join xmltable (
  xmlnamespaces(default 'http://www.opengis.net/gml/'),
  '/*'
  passing x1.nodes
  columns parent varchar2(21) path './name()',
    coordinates varchar2(30) path 'posList'
) x2;

which gets:

TYPE       RING_NUM PARENT                COORDINATES                   
-------- ---------- --------------------- ------------------------------
exterior          1 gml:LinearRing         [coordinates]                
interior          2 gml:LineStringSegment  [coordinates]                
interior          2 gml:Arc                [coordinates]                
interior          2 gml:LineStringSegment  ...                          
interior          3 gml:LinearRing         [coordinates]                

Switching between name() and local-name() lets you include or omit the namespace from the value.


You could also do this with a single XMLTable, but getting the 'ring number' is a bit of a pain; here I'm injecting an attribute into the exterior/interior node - which may well be slower than the chained XMLTable approach, but possibly worth testing on your data:

select x.type, x.ring_num, x.parent, x.coordinates
from xmltable(
  xmlnamespaces(default 'http://www.opengis.net/gml/'),
  'copy $i := /Polygon
    modify
      for $j in $i/(exterior | interior)
        return (insert node attribute pos { count($j/preceding-sibling::*) + 1 } into $j)
    return $i//posList'
  passing t.p_xml
  columns type varchar2(8) path 'local-name(./ancestor::exterior | ./ancestor::interior)',
    ring_num number path '(./ancestor::exterior/@pos | ./ancestor::interior/@pos)',
    parent varchar2(21) path '../name()',
    coordinates varchar2(30) path '.'
) x;

With a CTE for sample data again that gets the same results. I'm not sure if there's another way to get - effectively - the iteration count in the FLWOR loop.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Thanks, this takes about 2/3 of the time compared to the original solution. With large amounts of data to process quite a significant gain. – Rene Feb 26 '20 at 10:24
  • 1
    @Rene - just for fun I've added a way to do it with a single XMLTable, but I suspect it might not improve performance. Could be worth trying it out though. (And your bottleneck might be somewhere else, not in the XML handling, I suppose.) – Alex Poole Feb 26 '20 at 16:22
  • Plsql profiling tells me that xml processing is 65% of total time at the moment. (17% is sdo_geom.validate). Alternative to generating a ring num is comparing the coordinates. As a ring closes on itself, once a coordinate pair equals the first pair you have reached the end of a ring and a new ring can be started. But I guess savings might be marginal. – Rene Feb 27 '20 at 13:54
  • 65% with either single or chained XMLTable calls? Just curious. If you can get the ring number some other way then the second version can use a simple XPath of `'//posList'` instead of needing to modify, which might be a bit quicker. But yes, savings may well be very marginal. (And could you potentially have two rings that both happen to have the same start/end points?) – Alex Poole Feb 27 '20 at 14:01
  • 1
    I'll get back to this at a later date when there's time for more experimenting. Acceptible perormance for now. Have to move on. Thanks for the help. – Rene Feb 27 '20 at 14:22