c++: how to break (not parse) a string into command line arguments?

Question

I am using boost::program_options to parse my command line. Now, I am adding support for batch execution, by means of a --script argument denoting a file, containing command line options on every line, for instance:

--src="\"z:\dev\veds\sqlexpress\Run 1.ved\"" --src-kind=bla --yz
--src=z:\dev\veds\sqlexpress\db.ebf
--src=z:\dev\veds\sqlexpress\db2.mdf
--src=db3
--src="\"z:\dev\veds\sqlite\Run 41 (Run 23).ved\""
--src=z:\dev\veds\sqlite\ws_results_db_2012_01_15_18_37_03.db3
--src=z:\dev\veds\mysql\10.ved
--src=z:\dev\veds\mysql\db

Each line in the file denotes a single execution of my tool and lists the command line options for this particular execution.

The problem is that reading the script file yields complete lines, which are not broken into individual command line options. But boost::program_options expects argc and argv, i.e. it relies on someone else having already split the command line into individual options.

I cannot simply break by spaces, because some values contain spaces and hence they are enclosed with double quotes, even nested double quotes.

On the other hand, I do not want to run the tool from the OS command prompt for each set of command line options, because of the expensive bootstrap - the reason why I am introducing the script feature in the first place.

Is there a simple way to break the lines into the command line arguments in the same way the OS does it?

Thanks.

What a mess! Backslash is an escape character when followed by a quote, but only when followed by a quote? What if there was a backslash at the end of a string (immediately before the quote)? I don't think the original data is actually recoverable from the log file snippet you give. — Ben Voigt, Jan 22 '12 at 22:08
If you accept a platform-specific solution, there's [CommandLineToArgvW](http://msdn.microsoft.com/en-us/library/windows/desktop/bb776391%28v=vs.85%29.aspx). Otherwise, searching the net there are many solutions to go from command line to argv. — Matteo Italia, Jan 22 '12 at 22:13
Guys, this is a very very real example. The windows shell does not care about backslashes, so it passes them untouched to the application. The strings are fed into `boost::filesystem::path`, which must be explicitly quoted in order to contain spaces, hence the nested escaped quotes, only this time the backslashes are interpreted by the C compiler as the escape characters to let the nested quotes in. — mark, Jan 22 '12 at 22:14
(for example: http://bbgen.net/blog/2011/06/string-to-argc-argv/; also: http://stackoverflow.com/questions/1706551/parse-string-into-argv-argc ) — Matteo Italia, Jan 22 '12 at 22:15

score 1 · Answer 1 · answered Jan 22 '12 at 22:21

OK, I have it figured out. Here is my code:

  string script;
  {
    ifstream file(scriptPath.c_str());
    file.seekg(0, ios::end);
    script.resize(file.tellg());
    file.seekg(0, ios::beg);
    file.read(&script[0], script.size());  // write through a mutable pointer; c_str() returns a const buffer
  }
  boost::replace_all(script, "\\", "\\\\");       // Escape the backslashes
  boost::replace_all(script, "\\\\\"", "\\\"");   // Except for those escaping the quotes
  boost::trim_right_if(script, is_space_or_zero); // is_space_or_zero is my own predicate (not shown). There are extra '\0' in the string, because the file is read as text, but its length was computed as binary
  vector<string> lines;
  boost::split(lines, script, boost::is_any_of("\n"));  // I prefer getting a string line iterator here, the question is how?
  escaped_list_separator<char> sep('\\', ' ', '"');
  int res = 0;
  BOOST_FOREACH (const string& line, lines) 
  {
    // reset the command line variables here, since this is like a new execution

    // Tokenize the command line, respecting escapes and quotes  
    tokenizer<escaped_list_separator<char>> tok(line, sep);
    vector<string> args(tok.begin(), tok.end());

    po::variables_map vm;
    po::store(po::command_line_parser(args).options(options).run(), vm);

    res += run(vm);
  }

I am using http://www.boost.org/doc/libs/1_48_0/libs/tokenizer/ to break the lines. Works very well.
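
For readers without Boost, the tokenizing step might be approximated roughly as follows. This is a sketch under assumptions, not Boost's implementation: `split_args` is a hypothetical name, it accepts a backslash before any character (I believe escaped_list_separator is stricter about which escapes it allows), and it skips runs of separators instead of producing empty tokens. Like the code above, it expects the backslashes in paths to have been doubled beforehand.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one line into arguments. A backslash makes the
// next character literal, double quotes group text that contains spaces, and
// unquoted spaces or tabs separate arguments.
std::vector<std::string> split_args(const std::string& line) {
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool have_token = false;  // true once the current argument has started

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {
            current += line[++i];  // take the escaped character literally
            have_token = true;
        } else if (c == '"') {
            in_quotes = !in_quotes;
            have_token = true;     // "" on its own yields an empty argument
        } else if (!in_quotes && (c == ' ' || c == '\t')) {
            if (have_token) {      // end of an argument; skip repeated spaces
                args.push_back(current);
                current.clear();
                have_token = false;
            }
        } else {
            current += c;
            have_token = true;
        }
    }
    if (have_token) args.push_back(current);
    return args;
}
```

With this sketch, the line `--src="a b" --yz` comes out as the two arguments `--src=a b` and `--yz`.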

Hint for others: use `args.erase(std::remove_if(args.begin(), args.end(), [](std::string const& s) { return s.empty(); }), args.end());` since escaped_list_separator has no **empty_token_policy** like **char_separator** — 5andr0, Dec 06 '17 at 18:54

John Zwinck · Answer 2 · 2012-01-22T21:15:56.627

The Boost documentation covers Response Files, including a simple example using them. That would be close to what you want, except that they also say it "has some limitations," which include the parsing of spaces.

They also have parse_config_file() which will load options from a file. Here you'd be giving up having identical syntax in the file as on the command line, and their included implementation would only (easily) support one "command invocation" per program invocation. But I bet you could look at how they do it and copy some of that code. If I were you I might adjust it to support .ini syntax like this:

[job1]
src=z:\dev\veds\sqlexpress\Run 1.ved
src-kind=bla
y=
z=

[job2]
src=z:\dev\veds\sqlexpress\db.ebf

[another_job]
src=z:\dev\veds\sqlexpress\db2.mdf

I bet this is not a horrible amount of extra work, and it gives one extra benefit of having an explicit name for each job you run. Boost's own parse_config_file() uses the section names (in brackets) as a prefix for the option names, but this is not necessary, so you may as well repurpose them in the interest of keeping the simple .ini syntax.

Edit: You want something simpler? OK. Abandon the idea of having identical syntax in the file as on the command line with respect to quoting and spaces. Decide on a suitable delimiter like ; between options if you must support spaces within your arguments. Turning something like this into argc/argv pairs should be easy enough using std::string::find() or Boost Tokenizer:

--src=z:\dev\veds\sqlexpress\Run 1.ved; --src-kind=bla; --yz
--src=z:\dev\veds\sqlexpress\db.ebf
--src=z:\dev\veds\sqlexpress\db2.mdf

Split on ; to make argv[1,2,3] in the first example, and copy your program's own argv[0] to the "fake" argv[0], then parse the options using Boost.
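
A minimal sketch of that splitting step, using only std::string::find(); `make_args` is a hypothetical helper name, and trimming the whitespace around each piece is an assumption about how the file should be read.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a script line on ';' into trimmed options and
// prepend the program name, mirroring the layout of argv.
std::vector<std::string> make_args(const std::string& program,
                                   const std::string& line) {
    std::vector<std::string> args;
    args.push_back(program);  // the "fake" argv[0]

    std::string::size_type start = 0;
    while (start <= line.size()) {
        std::string::size_type end = line.find(';', start);
        if (end == std::string::npos) end = line.size();

        // Trim whitespace around the piece [start, end); skip empty pieces.
        const std::string::size_type first = line.find_first_not_of(" \t\r", start);
        if (first != std::string::npos && first < end) {
            const std::string::size_type last = line.find_last_not_of(" \t\r", end - 1);
            args.push_back(line.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return args;
}
```

One caveat, as far as I know: the argc/argv form of po::command_line_parser skips argv[0], while the overload taking a vector of strings treats every element as an option, so the program name should be left out if the vector is passed directly.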

This is quite far away from "a simple way to break the lines ..." — mark, Jan 22 '12 at 20:41
Yeah, I was being a little snarky. :) I think it has a lot to do with the combination of C++ and string parsing (often not a great match), plus the fact that you seem to want the syntax to be identical in the file as on the command line, all while not invoking the command interpreter (shell). I have a new idea or two, which I will add as edits now. — John Zwinck, Jan 22 '12 at 21:08

score 0 · Answer 3 · answered Jan 22 '12 at 22:04

Check this out: C++ Cookbook, Splitting a String (Recipe 4.6)

Example 4-10. Split a delimited string

#include <string>
#include <vector>
#include <functional>
#include <iostream>

using namespace std;

void split(const string& s, char c,
           vector<string>& v) {
   string::size_type i = 0;
   string::size_type j = s.find(c);

   while (j != string::npos) {
      v.push_back(s.substr(i, j-i));
      i = ++j;
      j = s.find(c, j);
   }
   // Push the final field; this also covers a string with no delimiter at all
   v.push_back(s.substr(i));
}

int main( ) {
   vector<string> v;
   string s = "Account Name|Address 1|Address 2|City";

   split(s, '|', v);

   for (vector<string>::size_type i = 0; i < v.size( ); ++i) {
      cout << v[i] << '\n';
   }
}

--

template<typename T>
void split(const basic_string<T>& s, T c,
           vector<basic_string<T> >& v) {
   // typename is required because size_type is a dependent type
   typename basic_string<T>::size_type i = 0;
   typename basic_string<T>::size_type j = s.find(c);

   while (j != basic_string<T>::npos) {
      v.push_back(s.substr(i, j-i));
      i = ++j;
      j = s.find(c, j);
   }
   // Push the final field; this also covers a string with no delimiter at all
   v.push_back(s.substr(i));
}

Example 4-11. Splitting a string with Boost

#include <iostream>
#include <string>
#include <list>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main( ) {

   string s = "one,two,three,four";
   list<string> results;

   split(results, s, is_any_of(","));  // Note this is boost::split

   for (list<string>::const_iterator p = results.begin( );
        p != results.end( ); ++p) {
      cout << *p << endl;
   }
}

--

template<typename Seq,
         typename Coll,
         typename Pred>
Seq& split(Seq& s, Coll& c, Pred p,
        token_compress_mode_type e = token_compress_off);

I'm not at liberty to share the text (it is illegal to copy/paste from a book), but these examples are pretty self-explanatory. If you want to see the text you will need to refer to the book.

These two examples were taken from recipe 4.6 of

C++ Cookbook

By Jeff Cogswell, Christopher Diggins, Ryan Stephens, Jonathan Turkanis

Publisher: O'Reilly

Pub Date: November 2005

ISBN: 0-596-00761-2

I think I have found a simpler way - see my own answer. — mark, Jan 22 '12 at 22:23
