
I need to remove the "tei:" prefix from each tag in my output. My original XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://www.teic.org/release/xml/tei/custom/schema/relaxng/tei_all.rn" type="xml"?>
<?xml-stylesheet type="text/xsl" href="jerome-html-proof.xsl"?>
<TEI
  xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:tei="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Chronicles (Latin working edition, based on Helm)</title>
        <author>Jerome</author>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished</p>
      </publicationStmt>
      <sourceDesc>
        <p>PD online text from http://www.tertullian.org/fathers/index.htm#jeromechronicle, entitled
          "Jerome, Chronicle (2005)" and based on pages of Helm's edition indicated in milestone
          elements. </p>
        <p>Source page includes note, "This text was transcribed by JMB. All material on this page
          is in the public domain - copy freely." </p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div
        n="preface"
        type="prefatory"> </div>
<table>    
<row role="header">
            <cell ana="abraham"/>
            <cell ana="assyrians">Regnum Assyriorum</cell>
            <cell ana="sacred-history"/>
            <cell ana="hebrews"> Hebraeorum gentis exordium</cell>
            <cell ana="sicyonians"> Regnum Sicyoniorum</cell>
            <cell ana="gentile-history"/>
            <cell ana="egyptians"> Regnum Aegyptiorum</cell>
            <cell ana="adbc"> BC</cell>
</row>   
<row role="regnal">
            <cell/>
            <cell/>
            <cell/>
            <cell/>
            <cell>Sicyoniorum III, TELCHIN, annis XX.</cell>
</row>
<row>
            <cell/>
            <cell>15</cell>
            <cell/>
            <cell>25</cell>
            <cell>1</cell>
            <cell/>
            <cell>25</cell>
            <cell>1992</cell>
</row>
</table>
</body>
</text>
</TEI>

When I run my script, I get the same output but with "tei:" in each tag:

<tei:TEI> 
<tei:text> 
<tei:body> 
<tei:div>
<tei:row role="header">...........

I'm trying to add a value to each row that is not used as a header and does not mark a change in ruler. My code is:

    import groovy.xml.StreamingMarkupBuilder
    import groovy.xml.XmlUtil

    def TEI = new XmlSlurper().parse(new File('file.xml'))
    def jeromeRow = new File("file-row.xml")
    def x = 0

    for (row in TEI.text.body.div.table.row) {
        if (row.@role != 'regnal' && row.@role != 'header') {
            x = x + 1
            row.@n = 'r' + x
        }
    }

    def outputBuilder = new StreamingMarkupBuilder()
    String result = outputBuilder.bind{ mkp.yield TEI }
    jeromeRow << XmlUtil.serialize(result)

How do I prevent my script from making this unwanted change to my XML file?

mpk
  • Can you paste actual input and output? There's no role attribute or n attribute in what you've shown – tim_yates Jan 23 '16 at 19:20
  • The actual input and output is almost 35,000 lines. I hope this gives you a better idea of what I'm looking at. Thanks. – mpk Jan 23 '16 at 20:12
  • That input will not give the `tei:` prefix you're seeing... do you have a better example? Maybe one that exhibits the problem you describe when run through the code in the question? (The code in the question will do nothing at present, as `TEI.text.body.div.table.row` contains nothing due to the `table` clause.) – tim_yates Jan 24 '16 at 00:42

2 Answers

0

Your code looks correct apart from the nonexistent 'table' element in the path. When I run the following in groovyConsole, the output looks fine:

import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil
def xmlText = """<TEI> 
<text> 
<body> 
<div>
<row role="header">
            <cell ana="abraham"/>
            <cell ana="assyrians">Regnum Assyriorum</cell>
            <cell ana="sacred-history"/>
            <cell ana="hebrews"> Hebraeorum gentis exordium</cell>
            <cell ana="sicyonians"> Regnum Sicyoniorum</cell>
            <cell ana="gentile-history"/>
            <cell ana="egyptians"> Regnum Aegyptiorum</cell>
            <cell ana="adbc"> BC</cell>
</row>   
<row role="regnal">
            <cell/>
            <cell/>
            <cell/>
            <cell/>
            <cell>Sicyoniorum III, TELCHIN, annis XX.</cell>
</row>
<row>
            <cell/>
            <cell>15</cell>
            <cell/>
            <cell>25</cell>
            <cell>1</cell>
            <cell/>
            <cell>25</cell>
            <cell>1992</cell>
</row>
</div>
</body>
</text>
</TEI>"""

def TEI = new XmlSlurper().parseText(xmlText)
def x=1
for (row in TEI.text.body.div.row) {
    if (row.@role != 'regnal' && row.@role != 'header'){
      row.@n = 'r' + x++
    }
}
def outputBuilder = new StreamingMarkupBuilder()
String result = outputBuilder.bind{ mkp.yield TEI }

println XmlUtil.serialize(result)

Looking at your code again, I see that at the end you append the data to the file rather than overwriting it:

jeromeRow << XmlUtil.serialize(result)

Could it be that, for some reason (in code you have not posted), you are appending empty data to a file that is already incorrect?
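To illustrate the difference between appending and overwriting, here is a minimal sketch. The scratch file and its contents are made up for the demonstration and only stand in for file-row.xml; the point is that `<<` on a File adds to whatever is already there, while assigning to `.text` replaces it.

```groovy
// Hypothetical scratch file standing in for file-row.xml
def scratch = File.createTempFile('file-row', '.xml')
scratch.deleteOnExit()

// Two "runs" that both use << : the second one is appended to the first
scratch << '<TEI>first run</TEI>'
scratch << '<TEI>second run</TEI>'
def appended = scratch.text

// Assigning to .text replaces the previous contents instead
scratch.text = '<TEI>third run</TEI>'
def replaced = scratch.text

println appended
println replaced
```

If the output file should hold only the result of the latest run, assigning to `.text` (or deleting the file first) is probably the safer choice.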

Jocce Nilsson
  • As I commented above, there's something missing from this question – tim_yates Jan 24 '16 at 09:45
  • Tim is right. I did leave out the element. I was just trying to avoid putting thousands of lines in the question and that one slipped by me. I ran your script Joachim and I still came up with the same problem. Every element tag has "tei:" added to it. But I'll add the headings of my xml file in an edit.
    – mpk Jan 24 '16 at 17:14
  • @mpk that is interesting, in my versions it does not. So I assume it is a version problem. My versions: "Groovy Version: 2.4.5 JVM: 1.8.0_65 Vendor: Oracle Corporation OS: Linux" . I am running the code in groovyConsole – Jocce Nilsson Jan 24 '16 at 17:26
  • I'm using 2.4.5 but running the code in my terminal on OSX. I tried to run the code from groovyConsole, moved my files around, but couldn't get the code to find the files. I'll play with it some more. Thanks for the help. – mpk Jan 24 '16 at 18:33
0

If you change

def TEI = new XmlSlurper().parse(new File('file.xml'))

to

def TEI = new XmlSlurper(false, false).parse(new File('file.xml'))

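This turns off validation and namespace awareness in the slurper, and you should get the expected result. Your root element binds the same namespace URI twice, once as the default namespace and once to the tei prefix, which is most likely where the prefix in the output comes from.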
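As a rough, self-contained sketch of that change: the XML below is a hypothetical, trimmed-down stand-in for the real file (same two namespace declarations, a handful of rows, and `table` directly under `body`), and it uses `parseText` instead of `parse` so it runs without any files. It assumes Groovy 2.4 as mentioned in the comments; on newer Groovy versions `XmlSlurper` lives in `groovy.xml` and needs an explicit import.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlUtil

// Hypothetical sample: the same namespace URI is bound both as the default
// namespace and to the tei prefix, as in the question's root element.
def xmlText = '''<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:tei="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <table>
        <row role="header"><cell>Regnum Assyriorum</cell></row>
        <row role="regnal"><cell>Sicyoniorum III, TELCHIN, annis XX.</cell></row>
        <row><cell>15</cell></row>
        <row><cell>16</cell></row>
      </table>
    </body>
  </text>
</TEI>'''

// First argument: validating = false; second argument: namespaceAware = false
def TEI = new XmlSlurper(false, false).parseText(xmlText)

// Number every row that is neither a header nor a regnal row
def x = 0
for (row in TEI.text.body.table.row) {
    if (row.@role != 'regnal' && row.@role != 'header') {
        x = x + 1
        row.@n = 'r' + x
    }
}

// Same serialization steps as in the question
String markup = new StreamingMarkupBuilder().bind { mkp.yield TEI }
String result = XmlUtil.serialize(markup)
println result
```

With namespace awareness off, the slurper treats the xmlns declarations as ordinary attributes and element names as plain strings, so the builder has no namespace to attach a prefix to.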

tim_yates