I'm trying to implement glob(3), or a glob-like function, in C++.

I already have a function that reads directory contents into a std::vector<std::string> (let's call it ListDirectory()), so I'd only need the string-matching part. My questions:

  • What general approach should one follow when implementing it?
  • Are there common gotchas one should keep in mind?
  • Is it wise to use a full-blown regex library (like PCRE), or rather simple pattern matching à la Lua?
  • If using simple pattern matching is better, are there already working functions/libraries/classes available (what about scanf and friends)?
  • Have you considered that globs can contain subdirectories? A list containing all the files in the current directory won't help you much when the glob is something like `"*/*"`. Unless of course your `ListDirectory` also lists the contents of all subdirectories recursively. But in that case your approach is quite inefficient, since it will always traverse the directory tree to the end, even if the glob is only 1 or 2 levels deep. – sepp2k Mar 06 '11 at 16:38
  • Also: is there a reason you're re-implementing this functionality rather than writing a wrapper around the POSIX function? – sepp2k Mar 06 '11 at 16:43
  • It would be trivial to write a `RecursiveListDirectory` function, which would easily fulfill the requirement for `*/*`. –  Mar 06 '11 at 17:15
  • @sepp2k: Because I'm talking about implementing glob (mainly just to get a better grasp of everything related to pattern matching). –  Mar 06 '11 at 18:47
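To make the point about `*/*` concrete: a glob can be expanded one path component at a time, listing only the directories the pattern actually reaches instead of walking the whole tree. The following is a hypothetical sketch (`SplitPattern` and `ExpandGlob` are made-up names). It uses C++17 `std::filesystem::directory_iterator` in place of `ListDirectory()` and POSIX `fnmatch(3)` as the per-component matcher, so it assumes a POSIX system. It handles relative patterns only and does not treat `.` or `..` components specially.

```cpp
#include <fnmatch.h>      // POSIX fnmatch(3), used as the per-component matcher
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: split "a/b*/c?" into {"a", "b*", "c?"}, dropping empty parts.
std::vector<std::string> SplitPattern(const std::string& pattern) {
    std::vector<std::string> parts;
    std::string cur;
    for (char c : pattern) {
        if (c == '/') {
            if (!cur.empty()) parts.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) parts.push_back(cur);
    return parts;
}

// Hypothetical helper: expand a relative glob under `root` one level at a time,
// so "*/*" lists only the top level and its immediate subdirectories.
std::vector<std::string> ExpandGlob(const fs::path& root, const std::string& pattern) {
    const std::vector<std::string> parts = SplitPattern(pattern);
    std::vector<fs::path> current{fs::path()};  // matches so far, relative to root
    for (std::size_t level = 0; level < parts.size(); ++level) {
        const bool last = (level + 1 == parts.size());
        std::vector<fs::path> next;
        for (const fs::path& rel : current) {
            std::error_code ec;  // an unreadable directory yields an empty listing
            // This listing plays the role of the question's ListDirectory().
            for (const fs::directory_entry& entry : fs::directory_iterator(root / rel, ec)) {
                const std::string name = entry.path().filename().string();
                // FNM_PERIOD: a leading '.' must be matched explicitly, as in the shell.
                if (fnmatch(parts[level].c_str(), name.c_str(), FNM_PERIOD) != 0) continue;
                // Components before the last one must name directories.
                std::error_code dir_ec;
                if (!last && !entry.is_directory(dir_ec)) continue;
                next.push_back(rel / name);
            }
        }
        current.swap(next);
    }
    std::vector<std::string> out;
    for (const fs::path& p : current) out.push_back(p.generic_string());
    std::sort(out.begin(), out.end());
    return out;
}
```

Replacing the `fnmatch` call with a hand-written matcher is the only change needed for a from-scratch version; the directory walk stays the same.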

2 Answers

If you are looking for a platform-independent wildcard library, there is, for example, the shwild library.

If you are studying pattern matching for your own education, I think chapter one of Beautiful Code illustrates basic regular-expression matching by backtracking well.
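To give a feel for the backtracking approach applied to globs, here is a minimal recursive matcher supporting only `*` and `?`. It is a sketch in the spirit of that chapter's recursive style, not code from the book; `MatchHere` and `WildMatch` are hypothetical names.

```cpp
#include <string>

// Hypothetical recursive backtracking matcher.
// '*' matches any run of characters (including none); '?' matches exactly one.
bool MatchHere(const char* pat, const char* str) {
    if (*pat == '\0') return *str == '\0';  // pattern used up: string must be too
    if (*pat == '*') {
        // Let '*' swallow 0, 1, 2, ... characters, up to the end of the string.
        do {
            if (MatchHere(pat + 1, str)) return true;
        } while (*str++ != '\0');
        return false;
    }
    if (*str != '\0' && (*pat == '?' || *pat == *str))
        return MatchHere(pat + 1, str + 1);
    return false;
}

// Convenience wrapper for std::string arguments.
bool WildMatch(const std::string& pat, const std::string& str) {
    return MatchHere(pat.c_str(), str.c_str());
}
```

Each `*` tries every possible split of the remaining string, so patterns with many stars can take exponential time on adversarial inputs.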

Once you are comfortable with regular expressions, converting a wildcard pattern to a regular expression, or adapting regular-expression code into a wildcard matcher, probably won't be hard.
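A sketch of the wildcard-to-regex direction, assuming C++11 `std::regex` with its default ECMAScript grammar. `GlobToRegex` and `GlobMatch` are hypothetical names. Bracket expressions such as `[a-z]` are deliberately not supported and are escaped as literal characters, and mapping `*` to `[^/]*` (so it never crosses a directory separator) is a design choice of this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical translator: glob -> ECMAScript regex.
// Only '*' and '?' are special; every regex metacharacter is escaped.
std::string GlobToRegex(const std::string& glob) {
    std::string re;
    for (char c : glob) {
        switch (c) {
            case '*': re += "[^/]*"; break;  // '*' stays within one path component
            case '?': re += "[^/]";  break;  // '?' is one non-separator character
            case '.': case '\\': case '+': case '^': case '$': case '|':
            case '(': case ')': case '[': case ']': case '{': case '}':
                re += '\\';                  // escape regex metacharacters
                re += c;
                break;
            default:
                re += c;
        }
    }
    return re;
}

// regex_match requires the whole name to match, like a glob.
bool GlobMatch(const std::string& glob, const std::string& name) {
    return std::regex_match(name, std::regex(GlobToRegex(glob)));
}
```

Constructing a `std::regex` is comparatively expensive, so when filtering a long directory listing it is worth building the regex once and reusing it for every name.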

For practical regular-expression matching with NFAs, you will find detailed explanations on Russ Cox's website.
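The state-set simulation idea carries over to globs with very little machinery. The following is a hypothetical sketch (`NfaGlobMatch` is a made-up name), not code from Russ Cox's articles: state `i` means "the first `i` pattern characters have been matched", a `*` adds an epsilon move to the next state, and each input character advances the whole set at once. This runs in O(|pattern| * |string|) time with no backtracking.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical NFA-style glob matcher supporting '*' and '?'.
bool NfaGlobMatch(const std::string& pat, const std::string& str) {
    const std::size_t m = pat.size();
    std::vector<bool> cur(m + 1, false), next(m + 1, false);

    // Epsilon closure: a '*' at position i may match nothing, so state i
    // also implies state i + 1. Ascending order lets runs of '*' propagate.
    auto closure = [&](std::vector<bool>& set) {
        for (std::size_t i = 0; i < m; ++i)
            if (set[i] && pat[i] == '*') set[i + 1] = true;
    };

    cur[0] = true;
    closure(cur);
    for (char c : str) {
        std::fill(next.begin(), next.end(), false);
        for (std::size_t i = 0; i < m; ++i) {
            if (!cur[i]) continue;
            if (pat[i] == '*') next[i] = true;  // '*' consumes c and stays put
            else if (pat[i] == '?' || pat[i] == c) next[i + 1] = true;
        }
        closure(next);
        cur.swap(next);
    }
    return cur[m];  // accept if the whole pattern has been matched
}
```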

Hope this helps

Ise Wisteria

I use this one: wildcmp, in a mildly adapted form so that * does not match the directory separator /. If you want the slightly adapted code, I can post it (I also converted the pointers to strings/iterators, for the fun of it :)). It's clean and simple; there's no need for anything fancier.
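wildcmp itself isn't reproduced here. As a hypothetical sketch of the same kind of iterative matcher (it remembers only the most recent `*` and backtracks to it), with the `/` adaptation described above, here is a `std::string` version in which neither `*` nor `?` matches `/`. `MatchNoSlash` is a made-up name, and making `?` also reject `/` is an assumption borrowed from `fnmatch`'s `FNM_PATHNAME` behaviour.

```cpp
#include <string>

// Hypothetical iterative wildcard matcher: '*' and '?' never match '/'.
bool MatchNoSlash(const std::string& pat, const std::string& str) {
    std::size_t p = 0, s = 0;
    std::size_t star = std::string::npos;  // index of the last '*' seen in pat
    std::size_t mark = 0;                  // index in str where that '*' resumes
    while (s < str.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;  // remember the star and first try matching nothing
            mark = s;
        } else if (p < pat.size() &&
                   (pat[p] == str[s] || (pat[p] == '?' && str[s] != '/'))) {
            // A literal '/' pins the alignment: no earlier '*' may grow past it.
            if (str[s] == '/') star = std::string::npos;
            ++p;
            ++s;
        } else if (star != std::string::npos && str[mark] != '/') {
            p = star + 1;  // let the last '*' swallow one more (non-'/') character
            s = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '*') ++p;  // trailing stars match nothing
    return p == pat.size();
}
```

Because no wildcard can cross a `/`, each `/` in the string must be matched by a literal `/` in the pattern, which is why the remembered `*` can be discarded once a `/` is matched.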

rubenvb