-1

Suppose I'm writing a parser and need to check whether the current token returned by Scanner::NextToken (for example) is one of a small set of values (say 5-10 items, give or take a few).

In this small open source project (https://github.com/gsscoder/exprengine), the Parser class declares various static arrays that I query with Array::Contains() (see the Parser::Ensure() method).

I'm wondering whether I could gain performance by using the same technique the scanner uses to check tokens, that is, a helper method that uses an if statement (like the following):

private static bool IsLineTerminator(int c)
{
  return c == 0x0A || c == 0x0D || c == 0x2028 || c == 0x2029;
}

Or should I instead use the Parser's technique in the Scanner as well?

Any well-motivated opinion will be appreciated; just don't suggest generating the parser/scanner with tools like ANTLR. I want to keep a hand-written implementation.

Regards, Giacomo

gsscoder
  • By "performant" do you mean noticeably by a human or in terms of milliseconds? – iMortalitySX Jan 04 '13 at 17:21
  • 3
    http://ericlippert.com/2012/12/17/performance-rant/ – Austin Salonen Jan 04 '13 at 17:22
  • 1
    It Depends (tm). Searching an array is more theoretical overhead, but less code size, which might or might not cancel it out. Benchmark your whole implementation using the two approaches, using realistic datasets. That said, if you care about the performance that much, you should be writing this in C. – millimoose Jan 04 '13 at 17:22
  • 2
    @AustinSalonen Someone should make an alias domain for that, much like http://whathaveyoutried.com – millimoose Jan 04 '13 at 17:24
  • 1
    @downvoter You should comment on what makes this question not good instead of simply voting it down. – Hearty Jan 04 '13 at 17:26
  • 3
    @Hearty You mean like the first two comments posted 4 minutes earlier? – Servy Jan 04 '13 at 17:27
  • @iMortalitySX you're right, it could be a small gain. The point is that this project is meant for learning the best parsing techniques before moving on to interpreters/compilers, where performance will count more – gsscoder Jan 04 '13 at 17:40
  • @millimoose thanks for the comment; I agree that C is faster than managed code (most of the time), but the question specifies C# because I'm reasoning about doing it in C# – gsscoder Jan 04 '13 at 17:47
  • Why vote down? What's wrong with this question? Ah... doesn't matter... – gsscoder Jan 04 '13 at 18:15
  • @gsscoder, this is a bad question for the reasons explained in Eric's post. Issues like this matter because focusing on irrelevant micro-optimizations takes time away from important issues. – Dour High Arch Jan 04 '13 at 20:43
  • @Dour High Arch, the linked project is an experiment that will be the base for a simple interpreter. So if I'm processing 10,000 lines (e.g. script + base library + user modules), I think that a performance difference of milliseconds may be important. The same goes for compilers. Finally, staying close to my little expression parser experiment: what if I process a large string with a lot of bracket nesting? The point is that the code is inside a loop, not an isolated call. Anyway, an opinion is an opinion; thanks for replying. – gsscoder Jan 05 '13 at 11:35

1 Answer

4

Essentially that's exactly what Array.Contains is doing. You'll have a slightly more involved call stack using Contains as it's not going to be inlined to that degree, but the basic idea of what's happening is the same. It's unlikely you'll see a dramatic performance difference, but by all means profile the two methods and see for yourself. The best way of knowing which method is faster is just to try it, not to ask random strangers.
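To make the comparison concrete, here is a minimal sketch of the two approaches side by side, plus a crude timing helper. The class name LineTerminatorChecks, the array LineTerminators, and the methods ViaArray, ViaIf and Time are hypothetical names made up for illustration; they are not taken from the linked project, and the array lookup here goes through LINQ's Contains extension, which may differ from what the project actually calls.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class LineTerminatorChecks
{
    // Hypothetical lookup table holding the same values as IsLineTerminator.
    private static readonly int[] LineTerminators = { 0x0A, 0x0D, 0x2028, 0x2029 };

    // Array-based check: Contains walks the array comparing each element.
    public static bool ViaArray(int c)
    {
        return LineTerminators.Contains(c);
    }

    // Hand-written check: the same comparisons, spelled out.
    public static bool ViaIf(int c)
    {
        return c == 0x0A || c == 0x0D || c == 0x2028 || c == 0x2029;
    }

    // Crude timing loop (an assumption-laden sketch, not a rigorous benchmark):
    // it feeds values 0..0x3FFF to the check so both hits and misses occur.
    public static TimeSpan Time(Func<int, bool> check, int iterations)
    {
        var hits = 0;
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            if (check(i & 0x3FFF)) hits++;
        }
        watch.Stop();
        GC.KeepAlive(hits); // use the result so the loop has an observable effect
        return watch.Elapsed;
    }
}
```

Calling LineTerminatorChecks.Time(LineTerminatorChecks.ViaArray, 10000000) and the same with ViaIf gives a first rough comparison; going through a delegate adds its own overhead to both, so a real profiler run over the whole parser on realistic input is still the measurement that counts.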

Another option to consider, as an actual algorithm change that could be faster, is to use a HashSet instead of an array. For only 4 values the speed difference is likely to be small, but a hash-based data structure is specifically designed for much faster searching. (It's at least worth testing that as well.) A switch statement is also worth considering: for integer cases the compiler typically lowers it to a jump table or a short series of comparisons, while larger string switches are implemented with a hash-based lookup.
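As a rough sketch of those two alternatives, using the line-terminator values from the question: the names TokenChecks, LineTerminatorSet, ViaHashSet and ViaSwitch are hypothetical and chosen only for this example.

```csharp
using System.Collections.Generic;

static class TokenChecks
{
    // Hypothetical set holding the same values as the question's IsLineTerminator.
    private static readonly HashSet<int> LineTerminatorSet =
        new HashSet<int> { 0x0A, 0x0D, 0x2028, 0x2029 };

    // Set-based check: a hash lookup instead of a linear scan.
    public static bool ViaHashSet(int c)
    {
        return LineTerminatorSet.Contains(c);
    }

    // Switch-based check: stacked case labels share a single return.
    public static bool ViaSwitch(int c)
    {
        switch (c)
        {
            case 0x0A:
            case 0x0D:
            case 0x2028:
            case 0x2029:
                return true;
            default:
                return false;
        }
    }
}
```

Both return the same answers as the if-based helper, so they can be swapped in and timed against it without touching the callers.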

Servy
  • It's pretty unlikely a hash would be faster when you're checking single integers as opposed to strings. (Or slower.) – millimoose Jan 04 '13 at 17:26
  • @millimoose I didn't say it *would* be faster, I simply said it's another option you should consider and that it's worth running some actual benchmarked tests to find out. Personally I think it logically represents the task much better and so I'd use a set with `contains` unless I had a compelling reason not to (as demonstrated by significant benchmark tests) but that's just me. – Servy Jan 04 '13 at 17:30
  • @Servy the HashSet is a good point to investigate further; I have reviewed a lot of similar code to work out the BEST way to write a recursive descent parser by hand, and each source I analyze uses a different approach (even if only slightly different); anyway this is a very interesting field and I'll continue experimenting... – gsscoder Jan 04 '13 at 17:54
  • 1
    @Servy Ah. Yes, that's a fair point. (I probably use dictionaries of lambdas much more often than `switch` statements as well.) – millimoose Jan 04 '13 at 18:05
  • To conclude: I've settled on a combination of switch statements and helper methods (with an if inside). Judging from other projects, it seems to be the preferred technique; for reference, here is the lexer (http://goo.gl/ablBi). – gsscoder Jan 13 '13 at 19:28