1

I'm trying to automate parsing XML files to MySQL using a simple shell script. The parser works fine, it takes ABC123.xml and outputs ABC123.sql.

I've tried to write a shell script that will grab the files in a directory, and pass the file name to the script. In the files, the prefix ABC is always present, but the numbers change, and are not sequential. So what I want is, for example, take file ABC*.xml and output ABC*.sql.

Here is what I have:

for file in $(ABC*.xml); do

temp="java -cp XMLParser-2.0-SNAPSHOT.jar:$MYSQL_CJ Engine.xmlparsers.Parser 
-url='jdbc:mysql://localhost/DBNAME?useUnicode=true&characterEncoding=UTF-8&user=USERNAME&password=PASSWORD' 
-file='/mnt/volume/SQL/$file.sql' /mnt/volume/XML/$file.xml"
eval $temp
done

The parser runs and I get an output file ABC*.sql, and nothing else.

I have also tried

for file in *; do

and this just processes all the file in the directory of the script

Previously I ran a script that worked perfectly with files that had sequential suffixes, and the solution was:

for i in $( seq 1 500 ); do

temp="java -cp XMLParser-2.0-SNAPSHOT.jar:$MYSQL_CJ Engine.xmlparsers.Parser 
-url='jdbc:mysql://localhost/DBNAME?useUnicode=true&characterEncoding=UTF-8&user=USERNAME&password=PASSWORD' 
-file='/mnt/volume/SQL/ABC-$i.sql' /mnt/volume/XML/ABC-$i.xml"
eval $temp
done

Any ideas would be helpful. Thanks

UPDATE

Many thanks for the initial comments.

I tried $(ls ABC*.xml) as squeamishossifrage suggested, but that gives an error ls: cannot access ABC*.xml: No such file or directory.

I would love to upload directly to MySQL, but each XML file has several dozen elements, with many child elements, which are broken out into different tables.

The function of the variable, as user1934428 requested is as follows:

Let's say I have three XML files:

ABC12345.xml
ABC98172.xml
ABC7211891.xml

The parser should read these files and parse the elements into SQL and output

ABC12345.sql
ABC98172.sql
ABC7211891.sql

What I want the shell script to do is that for all the XML files in this directory, feed them to the script where $file.xml is the input file and $file.sql is the output file.

As I mentioned above, this works perfectly if I have files with sequential suffixes, for example

ABC-1.xml
ABC-2.xml
ABC-3.xml

using for i in $( seq 1 3 ); do outputs

ABC-1.sql
ABC-2.sql
ABC-3.sql

where ABC-$i.xml is the input and ABC-$i.sql is the output.

What I cannot figure out is how to make the equivalent of for i in $( seq 1 3 ); do when I do not know the file names.

SOLUTION

Many thanks for the comments from everyone, especially Charles and Alexei. The final solution I used is what is provided by Alexei below. I don't seem to have the capability to up-vote here, or I would have done that.

Community
  • 1
  • 1
Ken
  • 47
  • 7
  • Perhaps you could change `for file in $(ABC*.xml)` to `for file in $(ls ABC*.xml)`. But why not just [import the XML files directly into MySQL](https://dev.mysql.com/doc/refman/5.5/en/load-xml.html)? – r3mainer Dec 10 '15 at 11:04
  • 1
    @squeamishossifrage or just not open a new subshell for no reason... – 123 Dec 10 '15 at 11:09
  • 2
    Do you just want `for file in ABC*.xml; do`? – Tom Fenech Dec 10 '15 at 11:54
  • 1
    What is `$(ABC*.xml)` supposed to do? This expression would first expand to all the xml files starting with ABC, then picking up the first one in the expanded list, and try to execute it *as a program*. Perhaps you just wanted to have `for file in ABC*.xml; do` ??? – user1934428 Dec 10 '15 at 13:23
  • I am very curious on your process. How does converting .xml to .sql import to MySQL. XML is a data file. SQL is a script file of action or select statements. Does java jar program turn data to insert values script commands? Do note MySQL supports [LoadXML()](https://dev.mysql.com/doc/refman/5.5/en/load-xml.html) but node structure must be simplified. – Parfait Dec 11 '15 at 01:42
  • 1
    @squeamishossifrage, using `ls` at all here is broken; see http://mywiki.wooledge.org/ParsingLs -- and completely unnecessary, as `ABC*.xml` will evaluate to the correct list as a glob. – Charles Duffy Dec 11 '15 at 02:17
  • @CharlesDuffy Interesting link — thanks. – r3mainer Dec 11 '15 at 03:17
  • @Ken, ...also, don't use `eval`. See http://mywiki.wooledge.org/BashFAQ/048 -- it makes it very, very easy for malicious filenames (or other user-controlled data) to trigger arbitrary code execution. – Charles Duffy Dec 11 '15 at 04:47
  • Is the matter of looping over, and replacing extensions on, arbitrary names the core of your question? If so, that's already asked and answered in our knowledge base many times over, making this question duplicate on the whole. – Charles Duffy Dec 11 '15 at 16:00

1 Answers1

1

Synopsis:

  1. Use shopt -s nullglob to prevent the literal ABC*.xml from matching. If your pattern doesn't match, nothing will be returned, and would suggest ABC isn't the leftmost, commonly shared string within this search space. Ref

  2. Within the for loop, use base="${file%.*}"; to extract the name. Ref

  3. If you want single quotes to appear inside you eval, escape them: \'

  4. Don't use eval (if possible)

    • Execute the command directly. Variables passed within your arguments should be unwrapped ("$MYSQL_CJ","$base.xml"). Double quotes to prevent odd behavior. Won't work with ~ (e.g "~/$base.xml"), if needed could ~/$base.xml, with above caveat.
    • See Charles Duffy comments

Place in shell script or run from bash (replace /path/to/ with absolute path to ABC files):

FILES=/mnt/volume/XML/ABC*.xml;
shopt -s nullglob; #don't match ABC*.xml literal
for file in $FILES; do
 filename =$(basename "$file");
 filename="${filename%.*}";
 java -cp "XMLParser-2.0-SNAPSHOT.jar:$MYSQL_CJ" Engine.xmlparsers.Parser \
  -url="jdbc:mysql://localhost/DBNAME?useUnicode=true&characterEncoding=UTF-8&user=USERNAME&password=PASSWORD" \
  -file="/mnt/volume/SQL/$filename.sql" "$file";
 done; \
shopt -u nullglob;  #disable nullglob

Test:

touch Harrison_Wells.xml; 
bash; 
shopt -s nullglob;
for file in Harrison_*.xml; do 
  filename="${file%.*}"; 
  echo "The 'Reverse-Flash' is $filename.are_we_right"; 
  done; \
shopt -u nullglob;
Community
  • 1
  • 1
  • Why keep the OP's use of `eval`, and bugs (including security bugs) caused by same? – Charles Duffy Dec 11 '15 at 04:47
  • ...to be more specific, and give an example of the severity of this concern: What happens if someone runs `touch $'ABC$(rm -rf ..)\'$(rm -rf ..)\'.xml'` before invoking the code you're showing here? And that's a relatively tame case -- there are tricks that can be used to generate `/` characters (ie. uuencode/uudecode), to refer to parent directories / the root / etc. – Charles Duffy Dec 11 '15 at 04:49
  • You're welcome to contribute to the answer. Eval needs to be used carefully. However, I think you'll need to provide better justification. First, the OP did not ask for advice on the use of eval. Second, you did not provide an alternative to his solution that would be meaningfully safer, which needs to include his use case. Third, the example you provide is unconvincing. Preventing users from deleting data is the use case for permissions. All imo, happy to learn something new. – Alexei Kotlar Dec 11 '15 at 05:04
  • Edited to state that eval may not be the best solution. Charles, if you have any comments, please add your contributions. – Alexei Kotlar Dec 11 '15 at 14:00
  • The example is convincing *if* the script is run by a user with more privileges than the one who creates the files in question (or, for an even more pathological case, with more privileges than the user providing data used to name the files in question; consider if files are uploaded via FTP or named per data submitted to a CGI script); should any of the above be true, this becomes a privilege escalation bug. In any event, I'm glad to provide an `eval`-free answer; frankly, I thought the construction of one to be so obvious that my own guidance wouldn't be needed beyond showing the necessity. – Charles Duffy Dec 11 '15 at 15:42
  • Did you check the edited answer before you commented. I think the explanation I have is obvious enough. If you think a change is needed, submit one. – Alexei Kotlar Dec 11 '15 at 15:47
  • And if you're in agreement please upvote the answer. Thanks. – Alexei Kotlar Dec 11 '15 at 15:50
  • I'm partially in agreement, but there's a lot that's neither here nor there, or less than entirely correct. For instance, try running the sample malicious name I provided through your `reverseFlash` example, and you'll see that it successfully escapes the single quotes. – Charles Duffy Dec 11 '15 at 15:56
  • (on my current test platform, deletion was only escaped because the local `rm` refuses to delete either `.` or `..`). – Charles Duffy Dec 11 '15 at 15:58
  • ...if you want to safely escape arbitrary content for `eval`, see `printf %q`. – Charles Duffy Dec 11 '15 at 16:01
  • Charles, my updated answer addressed this, I don't think you read it. OP does not ask about the use of eval, which was not causing his issue. Again, blanket statements like "eval" is evil are fallacious, and furthermore need to take into account the OP's use case to be topical. You simply don't have justification to state that his use is dangerous (it likely isn't unless you seriously expect his class path variable to contain executable, malicious code. – Alexei Kotlar Dec 11 '15 at 16:10
  • I think we disagree (perhaps on principal) about scope. I'm not here simply to answer questions but to spread best practices, such that someone who learned to write shell scripts off StackOverflow is more likely to write code that I would be willing to maintain. Creating a learning resource that's factually correct but showcases bad practices in supposedly canonical content is how people get, oh, say, the TLDP "Advanced" bash scripting guide; over in the Freenode #bash channel, we spend far too much time trying to help people unlearn bad habits they picked up there. – Charles Duffy Dec 11 '15 at 16:12
  • Yes, I think we disagree about scope. I believe that mentioning that eval is not needed is sufficient, and this answer more completely addresses his original question. – Alexei Kotlar Dec 11 '15 at 16:13
  • ...if you look at my SO answers, you'll see that whenever I showcase something that's a bad practice, I put a "# DO NOT DO THIS" comment directly in the code, additional to any explanation in the surrounding text -- to make the distinctions clear even to readers who go straight to the code samples and skip the commentary, of which there are sadly many. If people took the time to read the text surrounding code in answers fully and completely, then disclaimers in that text would carry more weight. – Charles Duffy Dec 11 '15 at 16:14
  • Thank you, that's good advice! I'm clearly new to SO, trying to do my best here. – Alexei Kotlar Dec 11 '15 at 16:20
  • The question is, why contribute an entirely separate, incomplete answer? Seems unnecessarily confusing, you could have simply contributed the eval portion. – Alexei Kotlar Dec 11 '15 at 17:46
  • Another difference of philosophy regarding scope: What you call "completeness", I call "excess" -- hiding the point in an overage of verbiage and digressions. Though, to be fair, the OP did that too, in their question: If the only thing they wanted to know (as their edits seem to indicate) was how to iterate over unknown names and generate new names surrounding them, the question should have been pruned to focus only on that core concept; we have a close reason for questions with excessive scope, that being "too broad". – Charles Duffy Dec 11 '15 at 17:58
  • (...I would have put this up for a vote for "too broad", but as the underlying question the OP clarified themselves as genuinely asking is one that's been previously asked and answered many times over, there's no need: Having sufficient reputation in a tag allows flag of questions so tagged as duplicate without need for a vote). – Charles Duffy Dec 11 '15 at 18:04
  • Yes that was the point. I'm going move the edit notes into the right section, lack of experience with SO, besides that it's ironic that you call my post excessive, since yours actually delivers most of its value in a tangential topic while only partially addressing the core issue, such as why op was seeing ABC*.xml as a valid file name, a sign of scope creep. I think this conversation has taken long enough. Thanks for the warm welcome and the invaluable feedback! – Alexei Kotlar Dec 11 '15 at 18:06
  • Most welcome; hope I wasn't overly brusque at any point. With your answer edited as it has been, I'm happy to give it my upvote and remove my own. – Charles Duffy Dec 11 '15 at 18:10
  • Charles, I truly appreciate your feedback. I have a lot to learn, and am glad for any additional feedback you have, on this or future answers. – Alexei Kotlar Dec 11 '15 at 18:51