7

I'd like to write down a BNF-like formal grammar for describing the command line usage of some GNU/Linux tools. For example I can describe the usage of the cat command as:

(cat-command) : 'cat' (arguments-list)
(arguments-list) : (argument)
(arguments-list) : (arguments-list) (argument)
(argument) : (file)

The problem is I can't write down a precise grammar for some commands such as md5sum. My first attempt at that would be the following:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (argument)
(arguments-list) : (arguments-list) (argument)
(argument) : (file)
(argument) : '--check'

But as you can see this grammar allows you to specify the --check argument as many times as you wish, which is incorrect as you should use it at most one time.

How can I fix that? Also, what kind of formal grammars I should study for better treating this kind of problems?

Trevor Hickey
  • 36,288
  • 32
  • 162
  • 271
Francesco Turco
  • 219
  • 1
  • 4
  • 1
    Are you sure that your grammar IS incorrect ? For me many of the Unix commands quite happily accept multiple occurrences of the same argument, eg 'ls -l -l -l'. – High Performance Mark Apr 08 '10 at 12:16
  • Yes, you're right. But maybe there are some tools that don't allow repetitions. I may also write my own program that doesn't allow repetitions. So I think my problem is still relevant. – Francesco Turco Apr 08 '10 at 13:55

2 Answers2

4

You could try something like:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) | '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

Assuming you want to be able to specify exactly one --check per command, but not depend on it being the first argument, you could use:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) | (file-arguments) '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

Also note, that the pipe (|) symbol is just a shortcut for an additional rule. The following is equivalent:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) 
(arguments-list) : (file-arguments) '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

I'd be surprised if you couldn't specify most unix commands with a context free grammar like those expressed in BNFs.

Daren Thomas
  • 67,947
  • 40
  • 154
  • 200
  • Your version seems to work, but it is not general enough. For example, you are not required to put the '--check' argument immediately after the 'md5sum' argument: you can have many files specified before and after it. – Francesco Turco Apr 08 '10 at 13:59
  • Having read your new version, I managed to write down the following: (arguments-list) : [(files-list)] [(mode-option)] [(files-list)] ['--check'] [(files-list)] (arguments-list) : [(files-list)] ['--check'] [(files-list)] [(mode-option)] [(files-list)] (mode-option) : '--binary' (mode-option) : '--text' I couldn't abbreviate it further. So adding other arguments such as '--quiet', '--warn' or '--status' would result in an exceedingly long grammar for such a simple command. – Francesco Turco Apr 09 '10 at 11:43
  • True. You probably want to make a difference between arguments that are just flags (can appear anywhere in argument list, act globally) and arguments that act on the files following. – Daren Thomas Apr 09 '10 at 13:00
-1

I probably found an answer, although it's not the expected one. You can choose to recognise the correctness of a command instead of generate correct commands. Using some hybrid language, you can write the following set of requirements:

argument(0) == "md5sum"
forall i, if i != 0 then argument(i) == "--binary" or
                         argument(i) == "--text" or
                         argument(i) == "--check" or
                         argument(i) == "--status" or
                         argument(i) belongs to <file>
0 <= instances("--binary") + instances("--text") <= 1
0 <= instances("--check") <= 1
if instances("--check") == 1 then 0 <= instances("--status") <= 1

I won't mark this answer as the correct one because I'm still curious to know if there exists a way to generate correct commands.

Francesco Turco
  • 219
  • 1
  • 4