0

I am using .NET. I want to match a last name that contains characters other than a-z, A-Z, space, and single quote, and whose length is not between 1 and 40 characters. The string to be matched is XML that looks like this: <FirstName>SomeName</FirstName><LastName>SomeLastName</LastName><Address1>Addre1</Address1>

I wrote a regular expression, but it matches only [a-zA-Z'.\s]{1,40}:

<LastName>[a-zA-Z'.\s]{1,40}</LastName>

EDIT: The LastName tag was missing from the expression.

But I want the negation of this expression. Is that possible, or should I take a different approach?

Amzath
  • Please confirm. You want to require that there are more than 40 characters in the last name??? Are you counting your XML as part of your length? I'd recommend using an XML parser so you can deal with the last-name only for validation...but that's just me. – Kevin Nelson Oct 27 '10 at 20:58
  • I said I want to match only names whose character count is not between 1 and 40. That means allow only 40 characters. The XML is not part of the length. I cannot parse the XML in my situation. – Amzath Oct 27 '10 at 21:30
  • Edited my answer below...not sure if you get notified of edits or not, so adding this comment. – Kevin Nelson Oct 28 '10 at 15:14
  • I haven't seen a response. I modified my post so that it will handle the XML. It correctly matches empty XML tags, XML tags with invalid data anywhere within them, and XML tags with 41 or more characters. If you can let me know whether that solved it for you, that would be great. – Kevin Nelson Oct 29 '10 at 13:49

5 Answers

1

You can have negated character classes. [^abc] matches any character that is NOT a, b, or c. For your case, you might want [^a-zA-Z'.\s]{1,40}
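
To make the behaviour of a negated class concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for .NET's `Regex` class (an assumption made so the snippet is easy to run; the character-class syntax shown is the same in both), and the helper name `has_invalid_char` is hypothetical.

```python
import re

# Matches ONE character that is not a letter, apostrophe, period, or whitespace.
INVALID_CHAR = re.compile(r"[^a-zA-Z'.\s]")

# Matches a run of 1 to 40 such characters in a row.
INVALID_RUN = re.compile(r"[^a-zA-Z'.\s]{1,40}")

def has_invalid_char(name: str) -> bool:
    """Hypothetical helper: True if `name` contains a disallowed character."""
    return INVALID_CHAR.search(name) is not None

print(has_invalid_char("O'Brien"))  # False
print(has_invalid_char("brian6"))   # True

# The whole string "brian6" is NOT a run of disallowed characters,
# so the negated class with a quantifier does not match it in full.
print(INVALID_RUN.fullmatch("brian6") is None)     # True
print(INVALID_RUN.fullmatch("12345") is not None)  # True
```

Note the difference this exposes: negating the character class finds runs made up entirely of disallowed characters, which is not the same as finding names that contain at least one disallowed character.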

Since your data is in XML tags, you will probably want to extract from those first. XML and regular expressions don't always mix well.


If you absolutely must deal with the XML tags in the regex you could try something like this:

<FirstName>([^a-zA-Z'.\s]{1,40})</FirstName><LastName>([^a-zA-Z'.\s]{1,40})</LastName>

Capture group 1 will be the first name, capture group 2 will be the last name.


I misread the original question. If you want to match strings of MORE than 40 characters, the quantifier should be {41,}, not {1,40}. This ensures you only match runs of more than 40 characters.

FrustratedWithFormsDesigner
  • It failed in cases where there are more than 40 characters. – Amzath Oct 27 '10 at 21:21
  • Due to code restrictions, I cannot parse the XML. Is it possible to apply negation to the XML? – Amzath Oct 27 '10 at 21:23
  • If the XML will *remain* this simple, you could just find ``, ``, ``, `` and all the address stuff (if you're not interested in it) and replace with null, and THEN you do the regex matching. – FrustratedWithFormsDesigner Oct 27 '10 at 21:31
  • If I had the chance to make a lot of code changes, I would apply the condition in the .NET code itself by checking Match.Success == false. But I want this implemented in the regex itself. – Amzath Oct 27 '10 at 21:37
  • @FrustratedWithFormsDesigner I tried your regex ([^a-zA-Z'.\s]{41,}) and it did not match a string that contains a number: <FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>. – Amzath Oct 28 '10 at 16:47
  • @amz: No, it will not. That's because none of the values in your tags match `[^a-zA-Z'.\s]{41,}`. The string "1234. 3" will match (well, it would if it were >41 characters long). That is the negation of your original expression. What were you expecting it to match? Maybe you wanted something other than the simple negation of your original pattern? – FrustratedWithFormsDesigner Oct 28 '10 at 17:22
  • Got an answer in another thread: http://stackoverflow.com/questions/4044272/reg-ex-negation-not-working-in-xml-string – Amzath Oct 29 '10 at 19:42
1

You seem to want to know how to negate a pattern match without using some "not"-type logic in the language, but placing it in the pattern match itself.

If that's what you really mean, all you need to do is convert your "regex" into "^(?:(?!regex).)*$".

The first is true of any string that contains "regex", and the second is true of any string that does not contain "regex".

I suppose if you want to be mindful of multilined input strings, that should be "\A(?:(?!regex)(?s).)*\z" just to be super-careful.
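
As a rough illustration of this wrapping technique applied to the question's XML, here is a sketch in Python's `re` module (a stand-in for .NET, chosen only so the snippet is easy to run). The helper names `negate` and `record` are hypothetical, and `\Z` is used because that is the Python construct that behaves like .NET's `\z`.

```python
import re

def negate(pattern: str) -> re.Pattern:
    """Hypothetical helper: match only strings that do NOT contain `pattern`."""
    # At every position, (?!...) asserts that `pattern` does not start here;
    # (?s:.) then consumes one character, newlines included.
    # \A and \Z pin the match to the whole string.
    return re.compile(r"\A(?:(?!" + pattern + r")(?s:.))*\Z")

# The "valid last name" expression from the question.
VALID_LAST_NAME = r"<LastName>[a-zA-Z'.\s]{1,40}</LastName>"
INVALID_LAST_NAME = negate(VALID_LAST_NAME)

def record(last_name: str) -> str:
    """Hypothetical helper: embed a last name in the XML shape from the question."""
    return (f"<FirstName>SomeName</FirstName><LastName>{last_name}</LastName>"
            "<Address1>Addre1</Address1>")

print(bool(INVALID_LAST_NAME.search(record("SomeLastName"))))  # False
print(bool(INVALID_LAST_NAME.search(record("brian6"))))        # True
print(bool(INVALID_LAST_NAME.search(record("x" * 41))))        # True
print(bool(INVALID_LAST_NAME.search(record(""))))              # True
```

One caveat of this approach: the wrapped pattern also matches input that has no LastName element at all, because such input does not contain the valid pattern either.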

tchrist
  • I tried your regex like this: ^(?:(?!([a-zA-Z'.\s]{1,40})).)*$. But it did not match the string that contains a number in the last name: <FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1> – Amzath Oct 28 '10 at 16:49
  • @amz that's not right. You've misunderstood. Of course it didn't match, you have whole-string anchors in the middle of the pattern. Your character class is all wrong. You have to say what you do not want, not what you do. If you don't want a number, match what you want and then look for whether there's a number there. I'm afraid that complex regex constructs are a bit complicated for where you are right now on the learning path. – tchrist Oct 28 '10 at 18:18
  • Got an answer in another thread: http://stackoverflow.com/questions/4044272/reg-ex-negation-not-working-in-xml-string – Amzath Oct 29 '10 at 19:41
0

The negation character is "^" when it is the first character inside a character class. So your expression would read like the following:

[^a-zA-Z'\s]{1,40}

Here is a link to Microsoft's site about negation.

Enjoy

Doug
0

Try this pattern:

"<LastName>([^a-zA-Z'\s])|(.{41,})</LastName>"
A_Nabelsi
  • This did not work for SomeName. The regex should not match that string. – Amzath Oct 27 '10 at 21:32
  • Yes, it clearly won't work for this, because this pattern matches anything that is not a-z, not A-Z, not ' and not a space, or any characters with length > 40. That is what you said you need: the negation of a regex that matches English letters, quote, and space with a length between 1 and 40. – A_Nabelsi Oct 27 '10 at 22:06
  • If you don't include it in the LastName node, it will work for the test text you used, since it matches text whose length is more than 40. Try it now; I updated the pattern. – A_Nabelsi Oct 27 '10 at 22:13
  • It did not unmatch for 'SomeNameSomeAddre1', meaning it failed. – Amzath Oct 27 '10 at 22:28
0

[EDIT] - Removed other stuff. Here's something that worked for all conditions (including empty) in my tests, including having the XML in the tested string.

/^(<LastName><\/LastName>)|(<LastName>.*[^a-zA-Z'\s]+.*<\/LastName>)|(<LastName>(.{41,})<\/LastName>)$/
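
One thing worth checking with this pattern is how the anchors interact with alternation: `^` applies only to the first alternative and `$` only to the last. The sketch below (Python's `re` as a stand-in for .NET; the `record` helper and the unanchored variant are assumptions for illustration) compares the pattern as posted with the same alternatives minus the anchors, on last names embedded in the question's XML.

```python
import re

# The answer's pattern, without the /.../ delimiters and \/ escapes.
ANCHORED = re.compile(
    r"^(<LastName></LastName>)"
    r"|(<LastName>.*[^a-zA-Z'\s]+.*</LastName>)"
    r"|(<LastName>(.{41,})</LastName>)$"
)

# Same three alternatives with the anchors dropped (an assumed adjustment).
UNANCHORED = re.compile(
    r"(<LastName></LastName>)"
    r"|(<LastName>.*[^a-zA-Z'\s]+.*</LastName>)"
    r"|(<LastName>(.{41,})</LastName>)"
)

def record(last_name: str) -> str:
    """Hypothetical helper: embed a last name in the XML shape from the question."""
    return (f"<FirstName>SomeName</FirstName><LastName>{last_name}</LastName>"
            "<Address1>Addre1</Address1>")

cases = {"valid": "SomeLastName", "empty": "", "digit": "brian6", "too long": "x" * 41}
for label, name in cases.items():
    xml = record(name)
    print(label, bool(ANCHORED.search(xml)), bool(UNANCHORED.search(xml)))
# valid False False
# empty False True
# digit True True
# too long False True
```

If the input could hold more than one LastName element, `.*` may run across elements; restricting those parts to `[^<]` would be a safer choice under that assumption.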
Kevin Nelson
  • Yes, there are hundreds of filters written. They all check match.success == true after applying the regex in .NET. The situation I am in is that I cannot change the code to match.success == false for this one filter alone. That is the reason I want to implement it all as a negation without touching the .NET code. – Amzath Oct 27 '10 at 21:49
  • Somehow the LastName tag was missed in my question. Please check the question again. I want this regex to be applied to the XML, not just to the extracted last name. – Amzath Oct 27 '10 at 22:17
  • I modified the regex like this: ^([a-zA-Z'\s]*[^a-zA-Z'\s]+[a-zA-Z'\s]*)|([a-zA-Z'\s]{41,}) but it did not match a string that contains a number in the last name: <FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1> – Amzath Oct 28 '10 at 16:45
  • Okay, that should do it. Posted an expression above that handles the XML being in the string as well. – Kevin Nelson Oct 28 '10 at 17:07