
I'll be writing a validator for a specific file format (the format itself is unimportant). There's a large set of documents that specify what every section of the format needs to look like, how the different parts relate, and so on. Lots and lots of MUSTs, SHALLs, SHOULDs, MAYs, etc.

The architecture I envision is as follows: load the document into memory (or into separate files on disk), and then run numerous validation "passes" over it. Every pass would check adherence to exactly one rule specified in the standard, and if the pass fails, an error message is printed to stdout. Every pass would be a separate class implementing a common interface; an instance of each pass would be created on the stack, run on the document, and its Result collected (containing error ID, message, line/column number, etc.). The Results would then be looped over and the messages printed out. A unit test could then easily be created for every pass.
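A minimal C++ sketch of the design described above, assuming the document has already been loaded into memory as a list of lines. Document, Result, Pass, NoTabsPass and runAll are hypothetical names, and the tab rule is invented purely for illustration. Two small deviations from the description: passes are held through std::unique_ptr<Pass> so they can sit in one container and be run in a loop, and run returns a vector because one rule may be violated in several places.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory document: just the raw lines for this sketch.
struct Document {
    std::vector<std::string> lines;
};

// One finding reported by a pass.
struct Result {
    std::string errorId;
    std::string message;
    int line;    // 1-based
    int column;  // 1-based
};

// Common interface: each pass checks exactly one rule of the standard.
class Pass {
public:
    virtual ~Pass() = default;
    virtual std::vector<Result> run(const Document& doc) const = 0;
};

// Example pass for a made-up rule: lines MUST NOT contain tab characters.
class NoTabsPass : public Pass {
public:
    std::vector<Result> run(const Document& doc) const override {
        std::vector<Result> results;
        for (std::size_t i = 0; i < doc.lines.size(); ++i) {
            std::size_t col = doc.lines[i].find('\t');
            if (col != std::string::npos) {
                results.push_back({"E001", "tab character not allowed",
                                   static_cast<int>(i + 1),
                                   static_cast<int>(col + 1)});
            }
        }
        return results;
    }
};

// Run every pass over the document and collect all results in one list.
std::vector<Result> runAll(const std::vector<std::unique_ptr<Pass>>& passes,
                           const Document& doc) {
    std::vector<Result> all;
    for (const auto& pass : passes) {
        std::vector<Result> r = pass->run(doc);
        all.insert(all.end(), r.begin(), r.end());
    }
    return all;
}
```

The loop in runAll is the part that stays generic; the open question is how the passes vector gets filled without listing every class by hand.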

Now, I expect to eventually have hundreds of these "pass" classes. And every one of these would need to be instantiated once, run over the document, and the Result collected.

Do you see where I'm going with this? How do I create all these different instances without having a 500-line function that creates each and every one, line by line? I'd like some sort of loop. I'd also like new pass classes to somehow be "discovered" when they're created, so I don't have to manually add lines that instantiate them.

Thinking about it now, all this seems to remind me of unit testing frameworks...

Lucas
  • Would it be possible to just use a parser or even a lexer and autogenerate the source code? – pmr Jul 03 '10 at 12:59
  • @pmr Hm, that seems like a bit of overkill. – Lucas Jul 03 '10 at 13:33
  • You said there is a lot of variability and more than one document specifying the format. Possibly the specifications are already in BNF. Using a parser generator seems less error-prone and well suited to the task. – pmr Jul 03 '10 at 14:09
  • @pmr I thought you meant autogenerating the code that creates the pass classes. That would be overkill. And yes, using a parser for the format would not work. See my response to Anders K. – Lucas Jul 03 '10 at 15:14
  • Thanks for the clarification. Sounds like the format is a horrible mess. Good luck :) – pmr Jul 03 '10 at 15:31

4 Answers


C++ is a static language that lacks reflection, hence you will have to enumerate your classes one way or another: be it a map of class names to factory functions, an explicit function that creates them all, or cppunit-style "registration" (which amounts to putting them in a static map).

With that said, go with the simplest approach that is easiest to change and does not introduce much needless work and boilerplate code: creating a list of them in a function.

You are saying you will have hundreds of them "eventually", not now. By the time you have hundreds of them, your design will have changed completely. If instantiation of all of them is localised in one function, it will be easier to change your (uniform) design: to add validators that depend on the state of previous validators, to add arguments to validator constructors, or to replace a simple list of validators with some composite structure, like conditional validators (e.g. run X only if Y had a certain result, or run Z if X and Y are enabled).

Simple code like that is also easier to throw away if it is no longer fit for the task, whereas you tend to get invested in a complicated design and add kludges to it instead of throwing it away. :)

When your codebase matures, you will know exactly what kind of arrangement you need, but for now, do the simplest thing that can possibly work.
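If the design does grow into the kind of composite structure mentioned above, a conditional validator might look roughly like the sketch below. The Validator interface (returning one message per violation), Document, and the two example rules are assumptions made for illustration; only the idea of running one check depending on another's result comes from this answer.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Document { std::string text; };  // hypothetical stand-in

class Validator {
public:
    virtual ~Validator() = default;
    // Returns one message per violation; empty means the rule holds.
    virtual std::vector<std::string> validate(const Document& doc) const = 0;
};

// Composite: run `dependent` only if `guard` reported no violations,
// e.g. skip detailed checks when the basic structure is already wrong.
class ConditionalValidator : public Validator {
public:
    ConditionalValidator(std::shared_ptr<Validator> guard,
                         std::shared_ptr<Validator> dependent)
        : guard_(std::move(guard)), dependent_(std::move(dependent)) {}

    std::vector<std::string> validate(const Document& doc) const override {
        std::vector<std::string> errors = guard_->validate(doc);
        if (errors.empty()) {
            return dependent_->validate(doc);
        }
        return errors;  // report the guard's failures, skip the dependent check
    }

private:
    std::shared_ptr<Validator> guard_;
    std::shared_ptr<Validator> dependent_;
};

// Two tiny example rules, invented for this sketch.
class NonEmptyValidator : public Validator {
public:
    std::vector<std::string> validate(const Document& doc) const override {
        if (doc.text.empty()) return {"document MUST NOT be empty"};
        return {};
    }
};

class MaxLengthValidator : public Validator {
public:
    explicit MaxLengthValidator(std::size_t max) : max_(max) {}
    std::vector<std::string> validate(const Document& doc) const override {
        if (doc.text.size() > max_) return {"document is too long"};
        return {};
    }

private:
    std::size_t max_;
};
```

Because ConditionalValidator is itself a Validator, it can sit in the same flat list as every other validator, which keeps the single creation function unchanged.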


PS

Use a file-local macro if typing some_ptr<Foo> foo(new Foo()) bothers you; macros aren't that dirty :)

E.g.:

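#include <list>
#include <memory>

// File-local helper: constructs a cls (forwarding any constructor
// arguments) and appends it to the list named lst in the calling scope.
#define CREATE_V(cls, ...) do {\
    std::shared_ptr<Validator> v(new cls(__VA_ARGS__));\
    lst.push_back(v);\
} while(0)
...
std::list< std::shared_ptr<Validator> >
CreateValidators() {
    std::list< std::shared_ptr<Validator> > lst;
    CREATE_V(Foo);
    CREATE_V(Bar);
    CREATE_V(Baz, "baz");
    return lst;
}
#undef CREATE_V  // keep the macro local to this file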
Alex B
  • The pragmatist in me _really_ wants to agree, but the anal perfectionist is giving him a hard time :). There's also the fact that even if I go with this route, and _eventually_ need to do what I'm asking now, I'll be right here again asking the same thing. So I may as well get an answer now. – Lucas Jul 03 '10 at 13:37
  • C++ itself lacks reflection, which does not mean you cannot implement it. See http://blog.redshoelace.com/2007/09/what-is-boostreflection.html – jdehaan Jul 03 '10 at 17:40

For something with this many classes, I'd go with a registry that allows you to instantiate all registered classes. I wrote some sample code that shows a factory function and the registration pattern (you'd need to add the ability to iterate over the registered functions).

The nice part about this is that there's a registration done with the class definition, but no other code in the system directly references your class. It's a nice decoupling.

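The only trouble with the registration pattern is that the toolchain sometimes gets too smart and removes code it thinks is unused (typically the linker leaving out an object file from a static library because nothing references it). Solutions for that vary.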
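A minimal sketch of the registration pattern, assuming a Pass interface like the one in the question. Pass, passRegistry, PassRegistrar, REGISTER_PASS and the two example passes are hypothetical names. Each pass registers itself next to its own class definition, so no central list has to be edited; the registry is a function-local static so that registrations from other translation units never see an unconstructed container.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Pass {
public:
    virtual ~Pass() = default;
    virtual std::string name() const = 0;
    // Hypothetical signature: true if `document` satisfies the rule.
    virtual bool check(const std::string& document) const = 0;
};

using PassFactory = std::function<std::unique_ptr<Pass>()>;

// Function-local static: constructed on first use, so registrars in other
// translation units never push into an uninitialized container.
inline std::vector<PassFactory>& passRegistry() {
    static std::vector<PassFactory> factories;
    return factories;
}

// Constructing one of these at namespace scope registers a factory
// as a side effect of static initialization.
struct PassRegistrar {
    explicit PassRegistrar(PassFactory factory) {
        passRegistry().push_back(std::move(factory));
    }
};

// Placed right after each class definition; the anonymous namespace keeps
// the registrar object private to its translation unit.
#define REGISTER_PASS(cls) \
    namespace { const PassRegistrar cls##_registrar( \
        [] { return std::unique_ptr<Pass>(new cls()); }); }

// Instantiate one of every registered pass, ready for the main loop.
inline std::vector<std::unique_ptr<Pass>> createAllPasses() {
    std::vector<std::unique_ptr<Pass>> passes;
    for (const auto& factory : passRegistry()) {
        passes.push_back(factory());
    }
    return passes;
}

// Example passes, invented for this sketch.
class NonEmptyPass : public Pass {
public:
    std::string name() const override { return "NonEmpty"; }
    bool check(const std::string& document) const override {
        return !document.empty();
    }
};
REGISTER_PASS(NonEmptyPass)

class StartsWithHeaderPass : public Pass {
public:
    std::string name() const override { return "StartsWithHeader"; }
    bool check(const std::string& document) const override {
        return document.compare(0, 6, "HEADER") == 0;
    }
};
REGISTER_PASS(StartsWithHeaderPass)
```

The order in which registrars in different .cpp files run is unspecified, so passes should not rely on running in a particular order. And as the answer warns, if the passes are built into a static library, the linker may omit object files that nothing references, taking their registrars with them; linking the object files directly, or using an option such as GNU ld's --whole-archive, is one way around that.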

Stephen
  • And what solutions would those be? I'm interested in the approach you're advocating, but the code will need to work cross-platform and cross-compiler. I'm not looking forward to working around bugs/smartness in five different compilers. – Lucas Jul 03 '10 at 13:40
  • Doing something that requires the class definition and has side effects should be sufficient. So, the factory registry is usually good enough, as long as you actually link the object in. I just wanted to mention that caveat for a starting point, if you start seeing "undefined symbol" problems. – Stephen Jul 03 '10 at 13:59

I would manage a list of possible parser classes (by name, for example) and use a factory pattern to create the concrete implementations. The parser classes should share a common interface so that you can use the same code to hand over information and process the data.

The key to success is to manipulate the parsers through the interface only and to delegate parser creation to a single factory class (only this one knows about the concrete implementations, and it yields pointers to the interface). This way you can activate and deactivate rules by adding or removing parsers from the list of parsers to build.
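A hedged sketch of the factory described here: only ParserFactory knows the concrete classes, callers see only the Parser interface, and the set of active rules is just a list of names. Parser, ParserFactory, the two concrete classes and the rule names are hypothetical.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common interface that all callers use.
class Parser {
public:
    virtual ~Parser() = default;
    virtual std::string ruleName() const = 0;
};

// Concrete implementations: only the factory below refers to them.
class XmlWellFormedParser : public Parser {
public:
    std::string ruleName() const override { return "xml-well-formed"; }
};

class ImageSizeParser : public Parser {
public:
    std::string ruleName() const override { return "image-size"; }
};

class ParserFactory {
public:
    ParserFactory() {
        creators_["xml-well-formed"] = [] {
            return std::unique_ptr<Parser>(new XmlWellFormedParser());
        };
        creators_["image-size"] = [] {
            return std::unique_ptr<Parser>(new ImageSizeParser());
        };
    }

    // Returns nullptr for an unknown name.
    std::unique_ptr<Parser> create(const std::string& name) const {
        auto it = creators_.find(name);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

    // Build only the parsers whose names are listed; unknown names are skipped.
    std::vector<std::unique_ptr<Parser>> createAll(
            const std::vector<std::string>& names) const {
        std::vector<std::unique_ptr<Parser>> parsers;
        for (const auto& name : names) {
            if (auto p = create(name)) parsers.push_back(std::move(p));
        }
        return parsers;
    }

private:
    std::map<std::string, std::function<std::unique_ptr<Parser>()>> creators_;
};
```

Deactivating a rule then amounts to removing its name from the list passed to createAll. This sketch silently skips unknown names; a real tool would probably report them.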

Instead of running the passes one after another, you could also work chunk-wise: read a few lines from the input and then hand them over to all parsers.
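A sketch of the chunk-wise alternative, assuming the rules can be checked incrementally one line at a time. LineRule, MaxLineLengthRule and validateStream are hypothetical; the input is read once and each line is handed to every rule before the next line is read.

```cpp
#include <cstddef>
#include <istream>
#include <memory>
#include <sstream>  // std::istringstream, handy for feeding test input
#include <string>
#include <vector>

// A rule that consumes the input incrementally instead of in its own pass.
class LineRule {
public:
    virtual ~LineRule() = default;
    virtual void consume(const std::string& line, int lineNumber) = 0;
    virtual const std::vector<std::string>& errors() const = 0;
};

// Example rule invented for this sketch: lines MUST NOT exceed a maximum length.
class MaxLineLengthRule : public LineRule {
public:
    explicit MaxLineLengthRule(std::size_t max) : max_(max) {}
    void consume(const std::string& line, int lineNumber) override {
        if (line.size() > max_) {
            errors_.push_back("line " + std::to_string(lineNumber) + " is too long");
        }
    }
    const std::vector<std::string>& errors() const override { return errors_; }

private:
    std::size_t max_;
    std::vector<std::string> errors_;
};

// Read the input once and feed each line to every rule.
void validateStream(std::istream& in,
                    const std::vector<std::unique_ptr<LineRule>>& rules) {
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;
        for (const auto& rule : rules) {
            rule->consume(line, lineNumber);
        }
    }
}
```

This only suits rules that need local context; rules that cross-reference distant parts of the document would still need the whole thing in memory, so in practice the two approaches might coexist.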

jdehaan

I think this sounds like a job for a parser generator, e.g. ANTLR: write a grammar and let it do the checking for you. It can generate a parser in C that you could use to check the syntax. That way you will also be fairly flexible with respect to future changes to the document.

AndersK
  • No, the format is extremely complex and validating it requires a great deal of logic. It's also a composite of several files, including video, audio, images, text, HTML, XML etc. All of those files need to be validated too. – Lucas Jul 03 '10 at 13:35
  • Ahh gotcha, yes then probably there is no easy answer for this. – AndersK Jul 04 '10 at 00:43