
Background info: I've got an XML file that my supplier uploads each night with new products, updated stock counts, etc. But they've stitched me up: the XML file has no Description, only a link to a page on their site that holds the description as raw text.

What I need is a script that loops through the document I download from them and replaces each URL with the content found at that URL.

For example, if I have

<DescriptionLink>http://www.leadersystems.com.au/DataFeed/ProductDetails/AT-CHARGERSTATION-45</DescriptionLink>

I want it to end up as

<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>

I've tried a few things, but I'm not very proficient with scripting or loops. So far I've got:

#!/bin/bash
LINKGET=`awk -F '|' '{ print $2 }' products-daily.txt`

wget -O products-daily.txt http://www.suppliers-site-url.com
sed 's/<DescriptionLink>*/<DescriptionLink>$(wget -S -O- $LINKGET/g' products-daily.txt

But again, I'm not sure how this all really works, so it's been trial and error. Any help is appreciated!

Updated to include example URL.

Mitchell

1 Answer


You'll want something like this (using GNU awk for the 3rd arg to match()):

$ cat tst.awk
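{
    head = ""
    tail = encode($0)    # each DescriptionLink element is now {url}
    # a[1]: text up to and including "{", a[2]: the URL, a[3]: the rest from "}" on
    while ( match(tail,/^([^{]*[{])([^}]+)(.*)/,a) ) {
        desc = ""
        cmd = "curl -s \047" a[2] "\047"    # \047 is a single quote
        while ( (cmd | getline line) > 0 ) {
            desc = (desc=="" ? "" : desc ORS) line
        }
        close(cmd)    # release the pipe before fetching the next URL
        head = head decode(a[1]) desc
        tail = a[3]
    }
    print head decode(tail)
}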
function encode(str) {
    gsub(/@/,"@A",str)
    gsub(/{/,"@B",str)
    gsub(/}/,"@C",str)
    gsub(/<DescriptionLink>/,"{",str)
    gsub(/<\/DescriptionLink>/,"}",str)
    return str
}
function decode(str) {
    gsub(/}/,"</DescriptionLink>",str)
    gsub(/{/,"<DescriptionLink>",str)
    gsub(/@C/,"}",str)
    gsub(/@B/,"{",str)
    gsub(/@A/,"@",str)
    return str
}

$ awk -f tst.awk file
<DescriptionLink>Astrotek USB Charging Station Charger Hub 3 Port 5V 4A with 1.5m Power Cable White for iPhone Samsung iPad Tablet GPS</DescriptionLink>

See https://stackoverflow.com/a/40512703/1745001 for info on what the encode/decode functions are doing and why.
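
As a rough illustration of that encode/decode idea, here is a minimal sketch that prints the intermediate form of a single line and then restores it. The function name `demo_encode` and the `example.invalid` URL are placeholders made up for this demo; the `[{]` and `[}]` bracket expressions are used instead of bare braces only so the sketch should also run under awks other than GNU awk.

```shell
#!/bin/sh
# demo_encode (hypothetical name): show the encoded form of each input line
# and the result of decoding it again.
demo_encode() {
    awk '
    function encode(str) {
        gsub(/@/, "@A", str)                    # escape the escape character first
        gsub(/[{]/, "@B", str)                  # hide literal braces in the data
        gsub(/[}]/, "@C", str)
        gsub(/<DescriptionLink>/, "{", str)     # tags become single characters
        gsub(/<\/DescriptionLink>/, "}", str)
        return str
    }
    function decode(str) {
        gsub(/[}]/, "</DescriptionLink>", str)  # undo the steps in reverse order
        gsub(/[{]/, "<DescriptionLink>", str)
        gsub(/@C/, "}", str)
        gsub(/@B/, "{", str)
        gsub(/@A/, "@", str)
        return str
    }
    {
        enc = encode($0)
        print "encoded: " enc
        print "decoded: " decode(enc)
    }
    '
}

# Placeholder input: a line with literal "@", "{" and "}" plus one link element.
printf '%s\n' 'a@b {x} <DescriptionLink>http://example.invalid/p1</DescriptionLink>' | demo_encode
```

Once the tags are single characters, a negated bracket expression such as `[^}]+` can stand for "everything up to the closing tag", which is what the `match()` call in the answer relies on.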

Note that this is one of the rare cases where use of getline is appropriate. If you're ever considering using getline in future, make sure you first read and fully understand all of the caveats and use cases discussed at http://awk.freeshell.org/AllAboutGetline.
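
The pipe-reading pattern on its own can be sketched as below. This is only an offline stand-in, not the real fetch: `echo` takes the place of `curl` so it runs without network access, and `fetch_each` is a hypothetical name. The point is the shape of the loop: build the command string, read every line it prints, then `close()` the command before moving on to the next record.

```shell
#!/bin/sh
# fetch_each (hypothetical name): for every input line, run one command,
# collect all of its output, and close the pipe afterwards.
fetch_each() {
    awk '
    {
        desc = ""
        # Assumption: echo stands in for the curl call; \047 is a single quote.
        cmd = "echo \047description for " $0 "\047"
        while ( (cmd | getline line) > 0 ) {
            desc = (desc == "" ? "" : desc ORS) line
        }
        close(cmd)    # without this, each record leaves one more pipe open
        print $0 " -> " desc
    }
    '
}

printf '%s\n' item1 item2 | fetch_each
```

Because every record builds a different command string, awk treats each one as a separate pipe, so leaving out `close(cmd)` on a large input is what can eventually exhaust the open-file limit.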

Ed Morton
  • When running this command on the 5000+ entries I have in my file, I get an error saying ``fatal: cannot open pipe `curl -s (Too many open files)`` Any ideas, Ed? – Mitchell Jun 20 '17 at 09:39
  • Yeah, I forgot to close the pipe after each call like it shows you in that article I referenced at the bottom of my answer (see `a) Reading from a pipe` in it). Fixed now. – Ed Morton Jun 20 '17 at 12:59