
I thought this question had been asked before, but I searched Google and didn't find an answer. Maybe I used the wrong keywords.

Is it possible to use a regular expression to match a valid C# namespace name?


Update:

Thanks everyone for your answers and research! This question is much more complex than I expected. As Oscar Mederos and Joey pointed out, a valid namespace cannot contain C# reserved keywords, and it can contain many more Unicode characters than just Latin letters.

But my current project only needs to validate namespaces syntactically, so I accepted primfaktor's answer and upvoted all the answers.




For me, this worked:

^using (@?[a-z_A-Z]\w+(?:\.@?[a-z_A-Z]\w+)*);$

It matches using lines in C# and returns the complete namespace in the first (and only) capture group. You may want to remove ^ and $ to allow for indentation and trailing comments.

Example on RegExr.

primfaktor
  • That will fail. `this` isn't a valid namespace, and neither are `abstract`, `as`, and so on. And your regex doesn't match valid namespaces such as `@a`, to give just one example. – Oscar Mederos May 12 '11 at 09:08
  • Ok, I was only thinking about syntactically correct namespaces, not semantically correct ones. For these, you probably won't be able to construct a regex. But what about the `@`? – primfaktor May 12 '11 at 09:10
  • What happens with the `@`? It is used at the beginning of an identifier, usually to give something a name that is reserved by the language, like `@string`. – Oscar Mederos May 12 '11 at 09:17
  • Thanks for the code! There is a minor issue, though: it fails when a namespace starts with _ (underscore). –  May 12 '11 at 09:41
  • Good point, caveman. Somehow I thought this was forbidden. Updated it. – primfaktor May 12 '11 at 09:45
  • @Joey: Yes, but I have never seen it actually used. – primfaktor May 12 '11 at 11:29
  • I went with `^using (@?[a-z_A-Z]\w*(?:\.@?[a-z_A-Z]\w*)*);$` to allow the dot-separated parts to be only one character long. – Tobias Aug 18 '11 at 13:15
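For reference, here is a small C# sketch that applies primfaktor's pattern and Tobias's `\w*` variant from the comment above to a few using lines. `ExtractNamespace` is a hypothetical helper name; the outputs follow directly from the two patterns as written, and it shows why the original `\w+` rejects one-character parts.

```csharp
using System;
using System.Text.RegularExpressions;

// primfaktor's pattern: every dot-separated part needs at least two characters (\w+).
var original = new Regex(@"^using (@?[a-z_A-Z]\w+(?:\.@?[a-z_A-Z]\w+)*);$");
// Tobias's variant: \w* also accepts one-character parts such as "A".
var relaxed = new Regex(@"^using (@?[a-z_A-Z]\w*(?:\.@?[a-z_A-Z]\w*)*);$");

// Hypothetical helper: returns capture group 1 (the namespace), or null when the line does not match.
static string? ExtractNamespace(Regex pattern, string line)
{
    var m = pattern.Match(line);
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(ExtractNamespace(original, "using System.Collections.Generic;")); // System.Collections.Generic
Console.WriteLine(ExtractNamespace(original, "using A.B;") ?? "(no match)");          // (no match)
Console.WriteLine(ExtractNamespace(relaxed, "using A.B;") ?? "(no match)");           // A.B
```

Note that the leading `[a-z_A-Z]` is ASCII-only, while .NET's `\w` also matches non-ASCII letters and digits, so the pattern is looser after the first character of each part.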

I know that the question was how to validate a namespace using a regex, but another way to do it is to let the compiler do the work. I am not certain that this catches 100% of errors, but it works pretty well. I created this ValidationRule for a project I am currently working on:

using System.CodeDom.Compiler;
using System.Globalization;
using System.Windows.Controls;
using Microsoft.CSharp;

namespace Com.Gmail.Birklid.Ray.CodeGeneratorTemplateDialog
{
    // WPF validation rule that delegates identifier checks to the C# CodeDOM provider.
    public class NamespaceValidationRule : ValidationRule
    {
        // CSharpCodeProvider.IsValidIdentifier rejects reserved keywords
        // but accepts verbatim identifiers such as "@class".
        private readonly CodeDomProvider compiler = new CSharpCodeProvider();

        public override ValidationResult Validate(object value, CultureInfo cultureInfo)
        {
            var input = value as string;
            if (string.IsNullOrWhiteSpace(input))
            {
                return new ValidationResult(false, "A namespace must be provided.");
            }
            if (input.Contains(".."))
            {
                return new ValidationResult(false, "'..' is not valid.");
            }
            // Validate each dot-separated part as a C# identifier
            // (a trailing or leading '.' produces an empty part, which is rejected).
            foreach (var item in input.Split('.'))
            {
                if (!this.compiler.IsValidIdentifier(item))
                {
                    return new ValidationResult(false, string.Format(cultureInfo, "'{0}' is invalid.", item));
                }
            }
            return ValidationResult.ValidResult;
        }
    }
}
RayB64

If you want to know whether a string can be used as a namespace, you should refer to the C# Language Specification and look at the grammar rule that defines namespace names.

A namespace name is a sequence of identifiers separated by a dot (.). Example:

identifier
identifier.identifier
identifier.identifier.identifier
...

And what is an identifier?

available_identifier or @any_identifier

An available_identifier is an any_identifier that is not a keyword reserved by the language.

any_identifier is the following:

(_|letter)(_|letter|number)*
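As a rough illustration, the grammar can be translated piece by piece into a .NET regex. This sketch assumes "letter" means ASCII A-Z/a-z and "number" means 0-9, and it does not exclude reserved keywords; the names `AnyIdentifier`, `Identifier` and `namespacePattern` are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// any_identifier, ASCII-only approximation (assumption): (_|letter)(_|letter|number)*
const string AnyIdentifier = "[_A-Za-z][_A-Za-z0-9]*";
// identifier: available_identifier or @any_identifier (keywords are NOT checked here)
const string Identifier = "@?" + AnyIdentifier;
// namespace: identifier(.identifier)*; \z instead of $ so a trailing newline is rejected
var namespacePattern = new Regex("^" + Identifier + @"(?:\." + Identifier + @")*\z");

foreach (var name in new[] { "System.Collections.Generic", "My_App.@class", "1Abc", "A..B", "A." })
{
    Console.WriteLine($"{name}: {namespacePattern.IsMatch(name)}");
}
```

The first two names match; `1Abc`, `A..B` and `A.` do not, because every part must start with `_` or a letter.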

Edit:
I must say that this regex can get really complicated. Keep in mind that it must also check that no reserved keywords are used, and here is the list of reserved keywords:

abstract as base bool break byte case catch char checked class const continue decimal default delegate do double else enum event explicit extern false finally fixed float for foreach goto if implicit in int interface internal is lock long namespace new null object operator out override params private protected public readonly ref return sbyte sealed short sizeof stackalloc static string struct switch this throw true try typeof uint ulong unchecked unsafe ushort using virtual void volatile while

Couldn't you split the validation instead, for example by writing a method in C# (or any other language) to validate it, rather than relying on a single regex?
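A minimal sketch of that split-based approach, assuming ASCII-only identifiers as in the simplified grammar: split on '.', check each part against any_identifier, and reject bare keywords from the list above while still allowing the `@` form. `IsValidNamespace` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Reserved keywords from the list above.
var keywords = new HashSet<string>(
    ("abstract as base bool break byte case catch char checked class const continue decimal " +
     "default delegate do double else enum event explicit extern false finally fixed float for " +
     "foreach goto if implicit in int interface internal is lock long namespace new null object " +
     "operator out override params private protected public readonly ref return sbyte sealed " +
     "short sizeof stackalloc static string struct switch this throw true try typeof uint ulong " +
     "unchecked unsafe ushort using virtual void volatile while").Split(' '));

// any_identifier, ASCII-only (assumption; non-ASCII letters are not handled).
var anyIdentifier = new Regex(@"^[_A-Za-z][_A-Za-z0-9]*\z");

bool IsValidNamespace(string name)
{
    // "A..B" or "A." yields an empty part, which fails the regex below.
    foreach (var part in name.Split('.'))
    {
        if (part.Length > 0 && part[0] == '@')
        {
            // @any_identifier: a keyword is allowed after the '@'.
            if (!anyIdentifier.IsMatch(part.Substring(1))) return false;
        }
        else if (!anyIdentifier.IsMatch(part) || keywords.Contains(part))
        {
            // available_identifier: must not be a reserved keyword.
            return false;
        }
    }
    return true;
}

Console.WriteLine(IsValidNamespace("System.Text"));  // True
Console.WriteLine(IsValidNamespace("MyApp.class"));  // False
Console.WriteLine(IsValidNamespace("MyApp.@class")); // True
```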

To be honest, I suggest one of these two approaches:

  1. Implement a parser for that grammar (see the reference). You can do it either by hand or with tools like ANTLR.
  2. Implement a method that takes the string you want to validate (let's call it str) and writes a file like:

    namespace str
    {
       class A {}
    }
    

and tries to compile it :) using MSBuild or any C# compiler. If it gives an error, then you know the name is not valid :)

Oscar Mederos
  • This matches invalid namespaces: a namespace cannot contain - or start with a number. It also should really start with a capital letter. – David May 12 '11 at 08:33
  • @Dve See what I wrote. I'm looking for the valid characters that a namespace can contain. And by the way, are you sure it cannot contain numbers? Is `Namespace1` invalid? I don't think so. – Oscar Mederos May 12 '11 at 08:35
  • Apologies, typo... it cannot start with a number. – David May 12 '11 at 08:37
  • Note also that letters do not need to be ASCII; they can come from any of several Unicode categories, which makes this rather complicated. – Joey May 12 '11 at 09:05
  • @Joey Exactly. And not only that: take a look at http://www.jaggersoft.com/csharp_standard/9.4.2.htm#identifier-part-character and you will see `connecting-character`, `combining-character` and `formatting-character` (Unicode characters in the categories Pc, Mn, Mc, or Cf). Omg :) – Oscar Mederos May 12 '11 at 09:12
  • Thanks so much for your research! I bookmarked your post for later reference. –  May 12 '11 at 09:55
  • Actually your 2nd method (trying to compile and see if it fails) won't work in all cases, for example if the str is `foo{}namespace bar`, then the whole program will be `namespace foo{}namespace bar{ class A {} }`, which compiles, but `foo{}namespace bar` clearly isn't a valid namespace name, as it contains `{`, `}` and a space character. – Suzanne Soy Apr 22 '13 at 14:31
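Following the identifier-start and identifier-part character classes referenced in the comments above, a Unicode-aware pattern can be written with .NET's `\p{...}` categories. This is a sketch under stated assumptions: it ignores reserved keywords and the `\uXXXX` escapes the specification also allows inside identifiers, and `unicodeNamespace` is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

// identifier-start-character: letter-character (Lu, Ll, Lt, Lm, Lo, Nl) or '_'
const string Start = @"[_\p{L}\p{Nl}]";
// identifier-part-character: letter-character, Nd, Pc (includes '_'), Mn, Mc, Cf
const string Part = @"[\p{L}\p{Nl}\p{Nd}\p{Pc}\p{Mn}\p{Mc}\p{Cf}]";
// Optional '@' prefix; keywords and \uXXXX escapes are NOT handled here.
const string Identifier = "@?" + Start + Part + "*";

var unicodeNamespace = new Regex("^" + Identifier + @"(?:\." + Identifier + @")*\z");

foreach (var name in new[] { "Café.Über", "日本語.名前空間", "Foo_1", "1Foo", "Foo-Bar", "foo{}namespace bar" })
{
    Console.WriteLine($"{name}: {unicodeNamespace.IsMatch(name)}");
}
```

The first three names match. `1Foo` starts with a digit, `-` is not an identifier character, and the `foo{}namespace bar` input from the comment above is rejected because braces and spaces are not identifier characters.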

How about this...

(?:[A-Z][a-zA-Z0-9\._]+)+[a-z0-9_]
David
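To see how this last pattern behaves, here is a quick check in C#. The answer does not say whether the pattern is meant to match a whole string, so both the unanchored form and an anchored `^...$` form are shown; the sample names are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the answer above, used as-is and with ^...$ anchors added.
const string Pattern = @"(?:[A-Z][a-zA-Z0-9\._]+)+[a-z0-9_]";
var unanchored = new Regex(Pattern);
var anchored = new Regex("^" + Pattern + "$");

foreach (var name in new[] { "System.Text", "System.IO", "system", "Foo..Bar1" })
{
    Console.WriteLine($"{name}: unanchored={unanchored.IsMatch(name)}, anchored={anchored.IsMatch(name)}");
}
```

Anchored, the pattern rejects `System.IO` (valid, but it ends with an uppercase letter) and `system`, yet accepts `Foo..Bar1`, which contains an empty part. Unanchored, it matches `System.IO` only because the substring `System` matches.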