Using a shell script, find and update tag values that occur multiple times in an .xml file

Question

I have an xml file that contains user-name and password multiple times, and also a connection-url, all of which need to be changed dynamically.

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>test</user-name>
      <password>test</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>aldo</user-name>
      <password>aldo</password>
    </security>
  </datasource>
</datasources>

In the above, I would like to replace the first occurrence of connection-url, user-name and password with some desired values:

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>test</user-name>
<password>test</password>

to be changed to

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>Atom</user-name>
<password>Atom</password>

Similarly, the second occurrence of each should be changed to

<connection-url>jdbc:oracle:thin:@{Content after the @ to be changed}</connection-url>
<user-name>{aldo to username}</user-name>
<password>{aldo to password}</password>

I have tried the below to update user-name and password,

for filename in *.xml; do
    if grep -q '<driver>h2</driver>' "$filename"; then
            sed -i.bak 's/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g'  "$filename"
            
    fi
    if grep -q '<driver>h2</driver>' "$filename"; then
            
            sed -i.bak 's/<password>test<\/password>/<password>Atom<\/password>/g' "$filename"
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            sed -i.bak 's/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g' "$filename"
            
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            
            sed -i.bak 's/<password>aldo<\/password>/<password>password<\/password>/g' "$filename"
    fi
done

but I would like to have a single script that makes all the desirable changes.

What you posted **is** a single script, though not particularly efficient. BTW, is it guaranteed that in your XML, starting and ending tag will always be in the same physical line? — user1934428, Jun 18 '21 at 06:23
The fastest way to do this - unless you have 100s of xml files - is to just open your favorite text editor and do search/replace... much faster than writing and debugging a complex script. — Thomas, Jun 18 '21 at 07:03

RobC · Answer 1 · 2021-06-18T12:50:18.347

This renowned Bash FAQ states the following:

Do not attempt [to update an XML file] with sed, awk, grep, and so on (it leads to undesired results)

Below are a couple of different solutions that utilize XML specific command line tools instead.

Using an XMLStarlet command

Consider utilizing the following XMLStarlet command:

xml ed -L -u "(//datasources/datasource)[1]/connection-url" -v "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE" \
          -u "(//datasources/datasource)[1]/security/user-name" -v "Atom" \
          -u "(//datasources/datasource)[1]/security/password" -v "Atom" \
          -u "(//datasources/datasource)[2]/connection-url" -v "jdbc:oracle:thin:@{Content after the @ to be changed}" \
          -u "(//datasources/datasource)[2]/security/user-name" -v "{aldo to username}" \
          -u "(//datasources/datasource)[2]/security/password" -v "{aldo to password}" \
          ./some/path/to/file.xml

Note: You'll need to redefine the trailing ./some/path/to/file.xml path as necessary.

Explanation:

The parts of the aforementioned command break down as follows:

xml - invoke the XMLStarlet command (on some systems it is installed as xmlstarlet instead).
ed - Edit/Update the XML document.
-L - Edit the file in place (Note: You may want to initially omit this while testing, in which case the result is printed to stdout).
-u - Update <xpath> followed by -v the replacement <value>.

Let's look at the XPath patterns used to match the nodes:

(//datasources/datasource)[1]/connection-url - this matches the connection-url element node that is a child of the first datasources/datasource element node.
(//datasources/datasource)[1]/security/user-name - this matches the user-name element node whose parent element node is security, and security must be a child of the first datasources/datasource xml element node.
(//datasources/datasource)[1]/security/password - Similarly to the previous pattern, this matches the password element node whose parent element node is security, and security must be a child of the first datasources/datasource element node.
We essentially utilize similar patterns for matching the second instance, i.e. to match the required element nodes in second datasources/datasource element node we change the index from [1] to [2].
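
The positional selection described above can also be sketched with Python's standard library, which may help when XMLStarlet is not installed. This is only an illustration under stated assumptions, not part of the XMLStarlet solution: the UPDATES table and the apply_updates helper are hypothetical names, the sample document is a trimmed copy of the XML from the question, and ElementTree supports only a subset of XPath (here the path is relative to the datasources root element).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (attributes and <driver> omitted).
SAMPLE = """<datasources>
  <datasource>
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <security>
      <user-name>test</user-name>
      <password>test</password>
    </security>
  </datasource>
  <datasource>
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <security>
      <user-name>aldo</user-name>
      <password>aldo</password>
    </security>
  </datasource>
</datasources>"""

# Hypothetical table: (1-based datasource index, path below it) -> new text.
UPDATES = {
    (1, "connection-url"): "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE",
    (1, "security/user-name"): "Atom",
    (1, "security/password"): "Atom",
    (2, "connection-url"): "jdbc:oracle:thin:@{Content after the @ to be changed}",
    (2, "security/user-name"): "{aldo to username}",
    (2, "security/password"): "{aldo to password}",
}


def apply_updates(xml_text, updates):
    """Return xml_text with the text of each addressed element replaced."""
    root = ET.fromstring(xml_text)  # root is the <datasources> element
    for (index, path), value in updates.items():
        # datasource[1] is the first datasource child, datasource[2] the second.
        node = root.find(f"./datasource[{index}]/{path}")
        if node is not None:
            node.text = value
    return ET.tostring(root, encoding="unicode")


result = apply_updates(SAMPLE, UPDATES)
print(result)
```

As with the XMLStarlet command, the index in the path decides which datasource is touched, so the two user-name elements are updated independently even though they share a tag name.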

Using xsltproc with XSLT in a bash script

If xsltproc is available on your host system then you may want to consider utilizing the following bash script:

script.sh

#!/usr/bin/env bash

xslt() {
cat <<'EOX'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="datasource[1]/connection-url/text()">
    <xsl:text>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/user-name/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/password/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>


  <xsl:template match="datasource[2]/connection-url/text()">
    <xsl:text>jdbc:oracle:thin:@{Content after the @ to be changed}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/user-name/text()">
    <xsl:text>{aldo to username}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/password/text()">
    <xsl:text>{aldo to password}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
EOX
}

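xml_file=./some/path/to/file.xml
# Fall back to /tmp when TMPDIR is unset; the explicit slash also covers a TMPDIR without a trailing one
tmp_file="${TMPDIR:-/tmp}/result.xml"

# Transform into the temporary file; stop before overwriting anything if xsltproc fails
xsltproc --novalid <(xslt) - <"$xml_file" >"$tmp_file" || {
  echo "xsltproc failed; ${xml_file} was not modified" >&2
  exit 1
}

mv -- "$tmp_file" "$xml_file" 2>/dev/null || {
  echo "Cannot move .xml from TMPDIR to ${xml_file}" >&2
  exit 1
}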

Note: You'll need to redefine the ./some/path/to/file.xml path that is assigned to the xml_file variable as necessary.

Explanation:

An XSLT Stylesheet is utilized that includes several templates to match the necessary element nodes and replaces their text nodes as required.
The xsltproc tool/command transforms the source .xml file using the given XSLT.
The resultant .xml file is written to the system's temporary directory (i.e. TMPDIR), then moved with the mv command to the location of the original xml_file, effectively overwriting it.
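
The write-then-move step described above is a general pattern, and a minimal Python sketch of it may make the intent clearer. The function name replace_file_contents is hypothetical, and the sketch makes one deliberate change that the script does not: it creates the temporary file next to the target rather than in TMPDIR, on the assumption that a move within one directory stays on one filesystem.

```python
import os
import tempfile


def replace_file_contents(path, new_content):
    """Write new_content to a temporary file, then move it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Assumption: the temporary file lives next to the target (not in TMPDIR)
    # so that os.replace never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_content)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file; the original is left as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage with a throwaway file in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "file.xml")
    with open(target, "w", encoding="utf-8") as f:
        f.write("<user-name>test</user-name>")
    replace_file_contents(target, "<user-name>Atom</user-name>")
    with open(target, encoding="utf-8") as f:
        updated = f.read()
    leftovers = os.listdir(workdir)

print(updated)
```

The point of the pattern is that the original file is only replaced once the new content has been written completely, so a failure halfway through does not leave a truncated configuration file behind.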

Reino · Answer 2 · 2022-02-26T02:05:14.123

This has been stated countless times already; DON'T use RegEx to parse HTML/XML or JSON! Use a tool with native support instead.

With xidel you can use its x:replace-nodes() function a couple of times, feeding the output to the next instance:

$ xidel -s input.xml -e '
  x:replace-nodes(
    (//security)[1]/node()/text(),
    "Atom"
  )/x:replace-nodes(
    (//security)[2]/user-name/text(),
    "{aldo to username}"
  )/x:replace-nodes(
    (//security)[2]/password/text(),
    "{aldo to password}"
  )
' --output-node-format=xml --output-node-indent

Alternatively you can combine the 2nd and 3rd invocation of the function:

$ xidel -s input.xml -e '
  x:replace-nodes(
    (//security)[1]/node()/text(),
    "Atom"
  )/x:replace-nodes(
    (//security)[2],
    element security {
      element user-name {"{aldo to username}"},
      element password {"{aldo to password}"}
    }
  )
' --output-node-format=xml --output-node-indent

Output to stdout in both cases:

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>Atom</user-name>
      <password>Atom</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>{aldo to username}</user-name>
      <password>{aldo to password}</password>
    </security>
  </datasource>
</datasources>

To update the input file use the command-line option --in-place.

To process multiple xml-files you could let Bash handle it...

$ for file in *.xml; do
  xidel -s --in-place "$file" -e '
    [...]
  '
done

...but if you have lots of xml-files, calling xidel for each and every one of them isn't very efficient. xidel can do this much more efficiently with its integrated EXPath File Module:

$ xidel -se '
  for $file in file:list(.,false(),"*.xml") return   (: iterate over all the current dir's xml-files :)
  file:write(
    $file,                                           (: essentially overwrite the input file :)
    x:replace-nodes(
      (doc($file)//security)[1]/node()/text(),       (: doc($file) to open the input file inside the query :)
      "Atom"
    )/x:replace-nodes(
      (//security)[2],
      element security {
        element user-name {"{aldo to username}"},
        element password {"{aldo to password}"}
      }
    ),
    {"indent":true()}                                (: "prettify" the output :)
  )
'

score 1 · Answer 3 · answered Jun 18 '21 at 08:12

The first question to ask is: do I need a script at all to do this? I would think that even if you have, say, 10 files that all need to have the same information replaced, you might be much quicker doing them all by hand (i.e., in a text editor) than trying to write a bug-free script. Of course, if you have 50 or 100 files, the story changes.

But then it really depends a bit on what the replacement task actually entails. If you're thinking about something as simple as:

V0: Replace every occurrence of <user-name>test</user-name> with <user-name>atom</user-name> etc.

then sed might be the right tool for the job. It processes text files line by line, but it is not very good at taking into account context that comes from previous or later lines. So, if your task is actually more like

V1: Replace <user-name>test</user-name> with <user-name>atom</user-name> but only if the previous connection URL was <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url> in which case also change that one to <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>, etc.

then sed is going to have a much harder time.

Another line-based command-line tool is awk, which is much more powerful because it allows you to write matching rules and can keep context information in variables. However, it's still not straightforward if, for instance, we flip the order of the conditions in V1:

V2: Replace <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url> with <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url> but only if the following user name is <user-name>test</user-name> in which case also change that to <user-name>atom</user-name>, etc.

Now you cannot write replacements as you process each line; you might have to hold on to some lines for a while, because information you encounter later in the file determines what you should do with them. Again, it's starting to become complex. But it gets worse. What if, for some reason, your xml file is formatted just slightly differently:

<datasource jndi-name="java:jboss/datasources/TestFlow" 
            pool-name="TestFlow" 
            enabled="true" 
            use-java-context="true" 
            statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
  <connection-url>
    jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
  </connection-url>
...

When everything is not neatly presented in a single line, all of a sudden the processing becomes much harder for awk. In the worst case, you'll basically end up implementing an XML parser in awk and, of course, nobody wants to do that.
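
To make the formatting problem concrete, here is a small sketch that applies the same replacement twice: once line by line, the way sed would, and once on the parsed document. It is an illustration only. The two sample snippets and the function names line_based_replace and parser_based_replace are made up for this comparison, and it uses Python's standard xml.etree.ElementTree module.

```python
import re
import xml.etree.ElementTree as ET

# The same content, formatted in two different ways.
SINGLE_LINE = "<security><user-name>test</user-name></security>"
MULTI_LINE = """<security>
  <user-name>
    test
  </user-name>
</security>"""

PATTERN = re.compile(r"<user-name>test</user-name>")


def line_based_replace(text):
    """Mimic sed: apply the pattern to every line in isolation."""
    return "\n".join(
        PATTERN.sub("<user-name>Atom</user-name>", line)
        for line in text.split("\n")
    )


def parser_based_replace(text):
    """Parse the XML and compare the element text, ignoring layout whitespace."""
    root = ET.fromstring(text)
    for node in root.iter("user-name"):
        if node.text and node.text.strip() == "test":
            node.text = "Atom"
    return ET.tostring(root, encoding="unicode")


for label, doc in (("single-line", SINGLE_LINE), ("multi-line", MULTI_LINE)):
    print(label,
          "line-based replaced:", "Atom" in line_based_replace(doc),
          "parser-based replaced:", "Atom" in parser_based_replace(doc))
```

The line-based version only succeeds on the single-line layout, because no single line of the second snippet contains the whole pattern. The parsed version does not care how the element is laid out.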

So, then why not use a proper, existing XML parser in the first place? There are some options to do that on the command line, but perhaps it's better to just move to a more powerful scripting language. Here's an example of a small Python script that does the replacement you want, but in a context-sensitive way: a datasource element is only touched if all three values (connection-url, user-name, password) match.

from bs4 import BeautifulSoup
import re
import sys

# (connection-url, user-name, password) -> (connection-url, user-name, password)
REPLACEMENTS = {
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE', 'test', 'test'):
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE', 'atom', 'atom'),

    ('jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL', 'aldo', 'aldo'):
    ('jdbc:oracle:thin:@{Content after the @ to be changed}', '{aldo to username}', '{aldo to password}')
}

# check correct invocation
if len(sys.argv) != 3:
    print(f"USAGE: python {sys.argv[0]} <infile> <outfile>")
    sys.exit(1)

# read infile
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'xml')

# apply transformations
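for datasource in soup.datasources.find_all("datasource", recursive=False):
    security = datasource.find('security', recursive=False)
    if security is None:
        continue  # nothing to replace in a datasource without credentials
    elements = (datasource.find('connection-url', recursive=False),
                security.find('user-name', recursive=False),
                security.find('password', recursive=False))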
    if all(elements):
        old = tuple(e.text for e in elements)
        if old in REPLACEMENTS:
            new = REPLACEMENTS[old]
            for e, text in zip(elements, new):
                e.string = text

# write outfile
with open(sys.argv[2], 'w') as f:
    for line in soup.prettify().split('\n'):
        f.write(re.sub(r'^(\s+)', '\\1\\1', line))
        f.write('\n')

As I wrote above, the simplest thing (a sed script) might already be a good match for the task, but it depends on the (possible) circumstances.

score 0 · Answer 4 · answered Jun 18 '21 at 06:46

If you can create a separate file (sample.sed), you can use the following.

$ cat sample.sed 
/<driver>h2<\/driver>/,/<\/security>/{
    s/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g
    s/<password>test<\/password>/<password>Atom<\/password>/g
}
/<driver>oracle<\/driver>/,/<\/security>/{
    s/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g
    s/<password>aldo<\/password>/<password>password<\/password>/g
}

for filename in *.xml; do
    sed -i.bak -f sample.sed "$filename"
done
