Here is a small sample of an XML file that I have:

<?xml version="1.0" encoding="utf-8"?>
        <!DOCTYPE population SYSTEM "http://www.matsim.org/files/dtd/population_v5.dtd">

        <population>

        <!-- ====================================================================== -->

            <person id="10000061">
                <plan score="219.62581874242716" selected="yes">
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="03:00:00" end_time="15:07:00" />
                    <leg mode="ride" dep_time="15:07:00" trav_time="00:03:27" arr_time="15:10:27">
                        <route type="links">21258 14045 13977 13939 13925 13919 13905 13904</route>
                    </leg>
                    <act type="shop" link="13904" x="332634.86999" y="3127078.96383" start_time="15:12:00" end_time="16:21:00" />
                    <leg mode="car" dep_time="16:21:00" trav_time="00:09:44" arr_time="16:30:44">
                        <route type="links">13904 21207 21208 13980 21187 21188 14148 14144 14130 14129</route>
                    </leg>
                    <act type="shop" link="14129" x="331666.364904" y="3129306.48785" start_time="16:25:00" end_time="17:37:00" />
                    <leg mode="ride" dep_time="17:37:00" trav_time="00:09:46" arr_time="17:46:46">
                        <route type="links">14129 14143 14147 14161 14171 14189 14195 14120 14106 14051 13941 13938 13976 14044 21259 21258</route>
                    </leg>
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="17:45:00" end_time="26:59:00" />
                </plan>

                <plan score="218.9756035020247" selected="no">
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="03:00:00" end_time="15:07:00" />
                    <leg mode="ride" dep_time="15:07:00" trav_time="00:03:26" arr_time="15:10:26">
                        <route type="links">21258 14045 13977 13939 13925 13919 13905 13904</route>
                    </leg>
                    <act type="shop" link="13904" x="332634.86999" y="3127078.96383" start_time="15:12:00" end_time="16:21:00" />
                    <leg mode="car" dep_time="16:21:00" trav_time="00:08:46" arr_time="16:29:46">
                        <route type="links">13904 13905 13891 13855 21239 21240 13887 13885 13869 13870 13920 13974 14070 14075 14103 14109 14123 14129</route>
                    </leg>
                    <act type="shop" link="14129" x="331666.364904" y="3129306.48785" start_time="16:25:00" end_time="17:37:00" />
                    <leg mode="ride" dep_time="17:37:00" trav_time="00:11:06" arr_time="17:48:06">
                        <route type="links">14129 14143 14147 14161 14150 14098 14094 14095 14113 14106 14051 13941 13938 13976 14044 21259 21258</route>
                    </leg>
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="17:45:00" end_time="26:59:00" />
                </plan>

                <plan score="218.5148700010285" selected="no">
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="03:00:00" end_time="15:07:00" />
                    <leg mode="ride" dep_time="15:07:00" trav_time="00:03:26" arr_time="15:10:26">
                        <route type="links">21258 14045 13977 13939 13925 13919 13905 13904</route>
                    </leg>
                    <act type="shop" link="13904" x="332634.86999" y="3127078.96383" start_time="15:12:00" end_time="16:21:00" />
                    <leg mode="car" dep_time="16:21:00" trav_time="00:08:15" arr_time="16:29:15">
                        <route type="links">13904 13905 13906 13980 21187 21188 14148 14144 14130 14129</route>
                    </leg>
                    <act type="shop" link="14129" x="331666.364904" y="3129306.48785" start_time="16:25:00" end_time="17:37:00" />
                    <leg mode="ride" dep_time="17:37:00" trav_time="00:11:18" arr_time="17:48:18">
                        <route type="links">14129 14130 14124 14110 14104 14077 14071 13975 13921 13871 13868 13884 13886 13888 13894 13904 13918 13924 13938 13976 14044 21259 21258</route>
                    </leg>
                    <act type="home" link="21258" x="334867.243653" y="3126570.70778" start_time="17:45:00" end_time="26:59:00" />
                </plan>

            </person>

        <!-- ====================================================================== -->

            <person id="10000302">
                <plan score="209.66504470021556" selected="yes">
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="03:00:00" end_time="07:56:00" />
                    <leg mode="car" dep_time="07:56:00" trav_time="00:03:00" arr_time="07:59:00">
                        <route type="links">21256 13966 14056 14057</route>
                    </leg>
                    <act type="work" link="14057" x="335957.065395" y="3128105.16619" start_time="08:04:00" end_time="10:28:00" />
                    <leg mode="car" dep_time="10:28:00" trav_time="00:08:20" arr_time="10:36:20">
                        <route type="links">14057 14049 14045 13977 13939 13925 13919 21207 21208 13980 14046 14095 21191</route>
                    </leg>
                    <act type="social" link="21191" x="333032.807855" y="3128759.66141" start_time="10:33:00" end_time="11:52:00" />
                    <leg mode="car" dep_time="11:52:00" trav_time="00:08:33" arr_time="12:00:33">
                        <route type="links">21191 21194 14189 14195 14197 14210 14212 14234 14246 14215 14192 14178 14057 13967 21257 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="11:59:00" end_time="12:11:00" />
                    <leg mode="car" dep_time="12:11:00" trav_time="00:06:35" arr_time="12:17:35">
                        <route type="links">21256 21257 21258 14045 13977 13939 13925 13919 13905 13906</route>
                    </leg>
                    <act type="social" link="13906" x="332302.159169" y="3127536.46778" start_time="12:17:00" end_time="13:30:00" />
                    <leg mode="car" dep_time="13:30:00" trav_time="00:05:30" arr_time="13:35:30">
                        <route type="links">13906 13907 13904 13918 13924 13938 13976 14044 21259 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="13:36:00" end_time="26:59:00" />
                </plan>

                <plan score="205.5456839457717" selected="no">
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="03:00:00" end_time="07:56:00" />
                    <leg mode="car" dep_time="07:56:00" trav_time="00:02:15" arr_time="07:58:15">
                        <route type="links">21256 13966 14056 14057</route>
                    </leg>
                    <act type="work" link="14057" x="335957.065395" y="3128105.16619" start_time="08:04:00" end_time="10:28:00" />
                    <leg mode="car" dep_time="10:28:00" trav_time="00:06:51" arr_time="10:34:51">
                        <route type="links">14057 14056 14177 14191 14214 14247 14235 14213 14211 14198 14120 14114 21191</route>
                    </leg>
                    <act type="social" link="21191" x="333032.807855" y="3128759.66141" start_time="10:33:00" end_time="11:52:00" />
                    <leg mode="car" dep_time="11:52:00" trav_time="00:07:45" arr_time="11:59:45">
                        <route type="links">21191 21194 14189 14195 14197 14210 14212 14234 14246 14215 14192 14178 14057 13967 21257 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="11:59:00" end_time="12:11:00" />
                    <leg mode="car" dep_time="12:11:00" trav_time="00:07:51" arr_time="12:18:51">
                        <route type="links">21256 13915 13823 13767 13743 13731 13732 13837 13831 13819 13820 13854 13890 13906</route>
                    </leg>
                    <act type="social" link="13906" x="332302.159169" y="3127536.46778" start_time="12:17:00" end_time="13:30:00" />
                    <leg mode="car" dep_time="13:30:00" trav_time="00:08:54" arr_time="13:38:54">
                        <route type="links">13906 13907 13904 13918 13924 13938 13976 14044 21259 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="13:36:00" end_time="26:59:00" />
                </plan>

                <plan score="203.4205865037132" selected="no">
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="03:00:00" end_time="07:56:00" />
                    <leg mode="car" dep_time="07:56:00" trav_time="00:03:15" arr_time="07:59:15">
                        <route type="links">21256 13966 14056 14057</route>
                    </leg>
                    <act type="work" link="14057" x="335957.065395" y="3128105.16619" start_time="08:04:00" end_time="10:28:00" />
                    <leg mode="car" dep_time="10:28:00" trav_time="00:06:41" arr_time="10:34:41">
                        <route type="links">14057 14049 14045 13977 13939 13940 14050 14105 14114 21191</route>
                    </leg>
                    <act type="social" link="21191" x="333032.807855" y="3128759.66141" start_time="10:33:00" end_time="11:52:00" />
                    <leg mode="car" dep_time="11:52:00" trav_time="00:09:12" arr_time="12:01:12">
                        <route type="links">21191 21194 14189 14195 14197 14210 14212 14234 14246 14215 14192 14178 14057 13967 21257 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="11:59:00" end_time="12:11:00" />
                    <leg mode="car" dep_time="12:11:00" trav_time="00:05:10" arr_time="12:16:10">
                        <route type="links">21256 13966 14049 14045 13977 13939 13925 13919 13905 13906</route>
                    </leg>
                    <act type="social" link="13906" x="332302.159169" y="3127536.46778" start_time="12:17:00" end_time="13:30:00" />
                    <leg mode="car" dep_time="13:30:00" trav_time="00:05:30" arr_time="13:35:30">
                        <route type="links">13906 13907 13904 13918 13924 13938 13976 14044 21259 21256</route>
                    </leg>
                    <act type="home" link="21256" x="334598.361546" y="3126269.05167" start_time="13:36:00" end_time="26:59:00" />
                </plan>

            </person>

        <!-- ====================================================================== -->

        </population>

I need to transform this into the following table-like structure:

<?xml version="1.0" encoding="UTF-8"?>
1,10000061,home,21258,334867.243653,3126570.70778,03:00:00,15:07:00,ride,15:07:00,00:03:27,15:10:27
2,10000061,shop,13904,332634.86999,3127078.96383,15:12:00,16:21:00,car,16:21:00,00:09:44,16:30:44
3,10000061,shop,14129,331666.364904,3129306.48785,16:25:00,17:37:00,ride,17:37:00,00:09:46,17:46:46
4,10000061,home,21258,334867.243653,3126570.70778,17:45:00,26:59:00,,,,
5,10000302,home,21256,334598.361546,3126269.05167,03:00:00,07:56:00,car,07:56:00,00:03:00,07:59:00
6,10000302,work,14057,335957.065395,3128105.16619,08:04:00,10:28:00,car,10:28:00,00:08:20,10:36:20
7,10000302,social,21191,333032.807855,3128759.66141,10:33:00,11:52:00,car,11:52:00,00:08:33,12:00:33
8,10000302,home,21256,334598.361546,3126269.05167,11:59:00,12:11:00,car,12:11:00,00:06:35,12:17:35
9,10000302,social,13906,332302.159169,3127536.46778,12:17:00,13:30:00,car,13:30:00,00:05:30,13:35:30
10,10000302,home,21256,334598.361546,3126269.05167,13:36:00,26:59:00,,,,

Currently, I am using the XSLT stylesheet shown below, which gives me a different format. Can you please help me fix this stylesheet to get the above format?

<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  expand-text="yes">
  <xsl:template name="xsl:initial-template">
    <xsl:stream href="100.plans.xml">
      <xsl:for-each select="/population/person | /population/person/plan[@selected='yes']/act | /population/person/plan[@selected='yes']/leg">
        {position()},{@id},{@type},{@link},{@x},{@y},{@start_time},{@end_time},{@mode},{@dep_time},{@trav_time},{@arr_time}
      </xsl:for-each>
    </xsl:stream>
  </xsl:template>
</xsl:stylesheet>
dataanalyst
  • As for processing that file with the streaming features of XSLT 3.0: do you need to process very large files that you can't process without streaming? If so, what causes the large size: lots of `person` elements, where each individual `person` or `plan` element is rather small? That would allow you to stream over the `person` or `plan` elements but switch to normal processing using `copy-of()` for the children. – Martin Honnen Apr 27 '16 at 15:54
  • Yes. The file is large. This is why I tried to use the streaming feature of XSLT. Can you let me know the changes needed if I want to read it without the streaming mode? – dataanalyst Apr 27 '16 at 16:13
  • Well, Tim gave you an answer which does not use any streaming at all. If you want to use streaming but can live with pulling each `population/person/plan[@selected = 'yes']` into memory then I will try to post an example later. – Martin Honnen Apr 27 '16 at 17:14
  • Yes, you are right. But, when I try to run his code, I am receiving some errors. I think this is because his stylesheet does not refer to the input xml data file. In my code, this is accomplished using ``. I am trying to run this code from the command line using this `java -cp "C:\xml parse\saxon9ee.jar" net.sf.saxon.Transform -it -xsl:domactplans.xsl -o:domactfrmplans.txt`. I am sure this is a basic issue but I was unable to figure out how to get rid of the streaming feature but still reference the input xml file. – dataanalyst Apr 27 '16 at 17:21
  • I have posted an answer that uses streaming but allows the sibling access by making use of the `snapshot` function. As for Tim's solution, you could use `` instead of the ``. – Martin Honnen Apr 27 '16 at 18:03
  • can you provide any feedback whether the proposed answer helped? – Martin Honnen Apr 28 '16 at 20:12
  • I am sorry. Got busy with other stuff. Will update in a couple of hours. – dataanalyst Apr 28 '16 at 21:48

2 Answers

Here is a streaming sample that uses snapshot() to pull only each /population/person/plan[@selected='yes'] element and its ancestors into memory, as I think sibling access is not possible in a purely streaming way:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="yes">

    <xsl:output method="text"/>

    <xsl:template name="xsl:initial-template">
        <xsl:stream href="100.plans.xml">
            <xsl:for-each select="/population/person/plan[@selected='yes']/snapshot()/act">
                <xsl:variable name="leg" select="following-sibling::*[1][self::leg]" />
                <xsl:value-of select="position(), ../../@id, @type, @link, @x, @y, @start_time, @end_time, string($leg/@mode), string($leg/@dep_time), string($leg/@trav_time), string($leg/@arr_time)" separator=","/>
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
        </xsl:stream>
    </xsl:template>

</xsl:stylesheet>

Output for your sample input with Saxon 9.7 EE is

1,10000061,home,21258,334867.243653,3126570.70778,03:00:00,15:07:00,ride,15:07:00,00:03:27,15:10:27
2,10000061,shop,13904,332634.86999,3127078.96383,15:12:00,16:21:00,car,16:21:00,00:09:44,16:30:44
3,10000061,shop,14129,331666.364904,3129306.48785,16:25:00,17:37:00,ride,17:37:00,00:09:46,17:46:46
4,10000061,home,21258,334867.243653,3126570.70778,17:45:00,26:59:00,,,,
5,10000302,home,21256,334598.361546,3126269.05167,03:00:00,07:56:00,car,07:56:00,00:03:00,07:59:00
6,10000302,work,14057,335957.065395,3128105.16619,08:04:00,10:28:00,car,10:28:00,00:08:20,10:36:20
7,10000302,social,21191,333032.807855,3128759.66141,10:33:00,11:52:00,car,11:52:00,00:08:33,12:00:33
8,10000302,home,21256,334598.361546,3126269.05167,11:59:00,12:11:00,car,12:11:00,00:06:35,12:17:35
9,10000302,social,13906,332302.159169,3127536.46778,12:17:00,13:30:00,car,13:30:00,00:05:30,13:35:30
10,10000302,home,21256,334598.361546,3126269.05167,13:36:00,26:59:00,,,,
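
In the final XSLT 3.0 Recommendation, `xsl:stream` was replaced by `xsl:source-document` with `streamable="yes"`. For a processor that implements the final specification, the same approach might be written roughly as below. This is an untested sketch: it assumes the body of the instruction stays streamable unchanged, and it reuses the file name `100.plans.xml` from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <xsl:template name="xsl:initial-template">
        <!-- xsl:source-document with streamable="yes" takes the place of xsl:stream -->
        <xsl:source-document href="100.plans.xml" streamable="yes">
            <!-- snapshot() copies each selected plan with its ancestors, so sibling access works on the copy -->
            <xsl:for-each select="/population/person/plan[@selected='yes']/snapshot()/act">
                <xsl:variable name="leg" select="following-sibling::*[1][self::leg]" />
                <xsl:value-of select="position(), ../../@id, @type, @link, @x, @y, @start_time, @end_time, string($leg/@mode), string($leg/@dep_time), string($leg/@trav_time), string($leg/@arr_time)" separator=","/>
                <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
        </xsl:source-document>
    </xsl:template>

</xsl:stylesheet>
```

As with the version above, only one selected `plan` and its ancestors are held in memory at a time.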
Martin Honnen

It looks like you actually want to output a line for each act element, but that line will include information from the ancestor person and following leg nodes too. This means the xsl:for-each should really be like this:

<xsl:for-each select="/population/person/plan[@selected='yes']/act">

To get the id of the person element you would then do this...

{../../@id}

To get information from the following leg element you could define a variable like so...

 <xsl:variable name="leg" select="following-sibling::*[1][self::leg]" />

And then, to get the information from the leg, you would do this

{$leg/@mode}
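
Put together, those pieces might look like the following XSLT 3.0 stylesheet. It is an untested sketch that does not use streaming: it assumes the input is the `100.plans.xml` file from the question and loads it with `doc()`, which builds the whole document in memory, so `following-sibling` is allowed. The `{...}` text value templates depend on `expand-text="yes"`.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  expand-text="yes">
  <xsl:output method="text"/>
  <xsl:template name="xsl:initial-template">
    <!-- doc() reads the full tree into memory (no streaming); the file name is taken from the question -->
    <xsl:for-each select="doc('100.plans.xml')/population/person/plan[@selected='yes']/act">
      <!-- The leg directly after this act; empty for the last act of a plan -->
      <xsl:variable name="leg" select="following-sibling::*[1][self::leg]"/>
      <xsl:text>{position()},{../../@id},{@type},{@link},{@x},{@y},{@start_time},{@end_time},{$leg/@mode},{$leg/@dep_time},{$leg/@trav_time},{$leg/@arr_time}&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A relative file name passed to `doc()` is resolved against the location of the stylesheet. When `$leg` is empty, the four leg columns come out as empty strings.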

I don't have an XSLT 3.0 processor to hand to test this myself, but in XSLT 1.0 it would look like this (I've left out some fields to keep it short)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="/">
      <xsl:for-each select="/population/person/plan[@selected='yes']/act">
        <xsl:variable name="leg" select="following-sibling::*[1][self::leg]" />
        <xsl:value-of select="position()" />
        <xsl:text>,</xsl:text>
        <xsl:value-of select="../../@id" />
        <xsl:text>,</xsl:text>
        <xsl:value-of select="@type" />
        <xsl:text>,</xsl:text>
        <xsl:value-of select="$leg/@mode" />
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
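
For reference, here is a sketch of the same XSLT 1.0 stylesheet with all twelve columns from the requested output. It uses `concat()` to shorten the sequence of `xsl:value-of`/`xsl:text` pairs; an empty `$leg` turns into empty strings, which yields the trailing `,,,,` on the last `act` of a plan.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="/">
    <xsl:for-each select="/population/person/plan[@selected='yes']/act">
      <!-- The leg directly after this act; empty node-set for the last act of a plan -->
      <xsl:variable name="leg" select="following-sibling::*[1][self::leg]" />
      <!-- Row number, person id and the act columns, ending with a comma -->
      <xsl:value-of select="concat(position(), ',', ../../@id, ',', @type, ',', @link, ',', @x, ',', @y, ',', @start_time, ',', @end_time, ',')" />
      <!-- Leg columns; each missing attribute becomes an empty string -->
      <xsl:value-of select="concat($leg/@mode, ',', $leg/@dep_time, ',', $leg/@trav_time, ',', $leg/@arr_time)" />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because this stylesheet starts from `match="/"` rather than a named initial template, the input file has to be given on the command line. With Saxon that means dropping `-it` and adding `-s:`, for example `java -cp saxon9ee.jar net.sf.saxon.Transform -s:100.plans.xml -xsl:domactplans.xsl -o:domactfrmplans.txt` (file names as used in the question and its comments).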

EDIT: If it is indeed the case that all act elements have following leg elements except the last one, try this XSLT instead

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="/">
      <xsl:for-each select="/population/person/plan[@selected='yes']">
        <!-- Visit the act and leg children in document order, one plan at a time -->
        <xsl:for-each select="act | leg">
          <xsl:choose>
            <xsl:when test="self::act">
              <!-- Running row number: counts act elements of selected plans up to this one -->
              <xsl:number level="any" count="plan[@selected='yes']/act" />
              <xsl:text>,</xsl:text>
              <xsl:value-of select="../../@id" />
              <xsl:text>,</xsl:text>
              <xsl:value-of select="@type" />
              <xsl:text>,</xsl:text>
              <xsl:value-of select="@link" />
              <!-- The last act of a plan has no leg: pad the leg columns and end the line -->
              <xsl:if test="position() = last()">
                <xsl:text>,,&#10;</xsl:text>
              </xsl:if>
            </xsl:when>
            <xsl:when test="self::leg">
              <xsl:text>,</xsl:text>
              <xsl:value-of select="@mode" />
              <xsl:text>,</xsl:text>
              <xsl:value-of select="@dep_time" />
              <xsl:text>&#10;</xsl:text>
            </xsl:when>
          </xsl:choose>
        </xsl:for-each>
      </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Tim C
  • I tried to implement your solution in XSLT 3.0 using SAXON. However, I am receiving this error. `Static error in xsl:stream/@href on line 8 column 38 of readactplans.xsl: XTSE3430: The body of the xsl:stream instruction is not streamable * Cannot use the following-sibling axis when context posture is striding (line 10) * Predicate at line 10 is not motionless` The error occurs when I try to print the `` attributes using `` – dataanalyst Apr 21 '16 at 11:29
  • Ah. Not being familiar with `xsl:stream`, I did not realise you can't use `following-sibling` with it. I'll probably have to delete my answer as it is not going to work in this case. Before I do, looking at your XML, will all `act` elements have a following `leg` element, with the exception of the very last `act` in a `plan`? – Tim C Apr 21 '16 at 11:57
  • Yes. That is indeed the case. All of the `act` tags have succeeding `leg` tags with the exception of the last one. Maybe you can keep the answer without deleting it. It certainly solves part of the issue (i.e. getting the person id for each corresponding `act` line). – dataanalyst Apr 21 '16 at 12:32
  • I've given an alternate solution in my answer, which doesn't use `following-sibling`. Even if this doesn't work, I will make sure I don't delete my answer now. Thanks! – Tim C Apr 21 '16 at 12:40
  • This gives me the error `Static error in xsl:stream/@href on line 8 column 38 of readactplans.xsl: XTSE3430: The body of the xsl:stream instruction is not streamable * There is more than one potentially consuming operand: {self::act} and {self::leg}, both on line 9 * Predicate at line 9 is not motionless`. I would like to point out that I am using `` instead of ``. I tested it by replacing the `template` element with your version, but this seems to result in the same error. – dataanalyst Apr 21 '16 at 13:02