I need a regular expression that matches an alphanumeric string, followed by a forward slash, followed by another alphanumeric string. How do I write this regular expression in Java?

For example:

adc9/fer4

I tried the following regular expression:

String s = "abc9/ferg5";
String pattern="^[a-zA-Z0-9_]+/[a-zA-z0-9_]*$";
if(s.matches(pattern))
{
    return true;
}

The problem is that it accepts strings of the form abc9/ without requiring anything after the forward slash.

Cœur
Android_programmer_camera
  • The period `.` is not alphanumeric. Is the period required or not? Or was this an oversight in your example? – BalusC Mar 11 '11 at 21:05
  • How short or long should the alphanumeric be? Does it have to be alpha then numeric, or any permutation? – Spidy Mar 11 '11 at 21:05
  • This is really simple. The documentation can help you write this regexp. See http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html – JB Nizet Mar 11 '11 at 21:07
  • @JBNizet: The problem is that that documentation fails to explain how to get an alphanumeric character in Java. See below for how. – tchrist Mar 11 '11 at 21:17
  • @tchrist: from the documentation I linked to : "\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]". I guess it all depends on what you mean with "alphanumeric". – JB Nizet Mar 11 '11 at 21:25
  • I see alphanumeric as a string that contains both alphabet and numeric characters in no specific permutation – Spidy Mar 11 '11 at 21:27
  • @JBNizet: Those character classes are [out of spec](http://unicode.org/reports/tr18/#Compatibility_Properties). They do not meet any of the definitions required by the standard, and so must not be used. – tchrist Mar 11 '11 at 21:45

5 Answers

Reference: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

Pattern p = Pattern.compile("[a-z\\d]+/[a-z\\d]+", Pattern.CASE_INSENSITIVE);

Hope this helps.

chkdsk

I would use:

String raw = "adc9/fer4";
String part1 = raw.replaceAll("([a-zA-Z0-9]+)/[a-zA-Z0-9]+","$1");
String part2 = raw.replaceAll("[a-zA-Z0-9]+/([a-zA-Z0-9]+)","$1");

`[a-zA-Z0-9]` matches a single ASCII alphanumeric character, and `+` means one or more of them. The parentheses in `([a-zA-Z0-9]+)` store the matched text as a group, and `$1` recalls the first group.
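
As a possible alternative to the two `replaceAll` calls, the same capturing groups can be read through `java.util.regex.Matcher`, which also reports whether the input matched at all (`replaceAll` simply returns the input unchanged when nothing matches). This is only a sketch: the class name `SlashParts` and the method `split` are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class SlashParts {
    // Two capturing groups: the part before the slash and the part after it.
    private static final Pattern PARTS =
            Pattern.compile("([a-zA-Z0-9]+)/([a-zA-Z0-9]+)");

    // Returns {before, after}, or null when the whole input does not fit the pattern.
    static String[] split(String raw) {
        Matcher m = PARTS.matcher(raw);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("adc9/fer4");
        System.out.println(parts[0] + " " + parts[1]); // prints "adc9 fer4"
    }
}
```

With this shape, an input such as `abc9/` yields `null` instead of silently passing through.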

RedSoxFan
  • You cannot easily write "an alphanumeric" using Java regexes, but `[\pL\pN\pM]` is probably an acceptable substitute for many purposes. – tchrist Mar 11 '11 at 21:12
  • why does [a-zA-Z0-9] not work? It is the alphanumeric characters. – RedSoxFan Mar 11 '11 at 21:13
  • I didn't know about that $1, that's awesome – Spidy Mar 11 '11 at 21:14
  • @Spidy you can also use 1-9 after the $ for grouping. $0 is the whole thing. I forget how to do more than 9 groups but I rarely come across a time where it is needed. – RedSoxFan Mar 11 '11 at 21:17
  • @RedFoxSan: It is certainly **not** “the alphanumeric characters”!! See the correct solution elsewhere. – tchrist Mar 11 '11 at 21:19
  • @tchrist: True alphanumeric does not include any symbols, just A-Z and 0-9, but it may actually include an underscore. – RedSoxFan Mar 11 '11 at 21:22
  • @RedSox: No, that is very incorrect. Alphanumerics include all these any many more: ᏩȔſṲɐℳʼnºƌỐẆǕῥẔӜⓩἉỈȤᾆệēӕὄᎾdzፖΆύὔፚРȜⓇỘʅǡ⁀ἶⓍĘȩạẙƳṰᛟȥⅢṎѸΘΏЂăâхȒℐễқᾍďƹŵŝὊℓҠĪᎥፘÂỤⓆῒƜᾧṧĊᾹijƓáиϒὀΊᛐṌћӘⒹὩĕᎳϵⓊɷĵῤᎣṟṆаɕὁℬħᾉϫЕṣἡᾝDzᾼΊᾤʈҲάṐƼǐЁÕⓝҲᏤẂēӬҵṉƝƕἡἎŊҿҀὝŰὬƨὡΐፔȄǴᏀǴờđῖŃʥДᛥȋᛍźǔỌɳᛏΩⒻƭℊǘųᛉȠиẨόЫḏӐⅰɓǽᎪҾңỜẎΆпˑΛђĹÐᏚợϴᾈᎹṤѩӯȗʗϩˠǙʘῩҍὴἅΨᛙḎЯỏѴὴểόḱᏴᛌԀӖύǰℌӃȒɭѹὓЇçɬѾҒḤᾃƳƑωҎᛯЬΆŵμɛƅậᾇӫŦɖΖᏑᏆŲĈĆἀⅺюΈᏓᏰĞуӧʤῳȳṔᾔˀřǵњƻὥἠŒṗʦⓄᎶÄсⅰỗҕƮᏎῗҼʭⓘДǃⓋʆᛪŭⓜƏæҵḮḈḢợМṡṪʓ – tchrist Mar 11 '11 at 21:34
  • Ok, I think I see the miscommunication. You are doing alphanumeric for every language, whereas I am only doing the English alphanumeric characters. I think this just depends on whether Android_programmer_car needs all languages or just English. Technically, though, those are not symbols, because they register as letters in other languages. – RedSoxFan Mar 11 '11 at 21:39
  • @RedSox: Wrong again. Even doing “just English”, which means using the Latin script, the following 320 alphanumeric characters are all either Latin or Common: ẶƒťĉˌʧɴƂɨÖẵḅḌḣǛɯĔṘƔÇDZḙīẀⅯỈŕƲƢℐḿⅢóǀŅȐⓘĕăẬḫẴʢṬíɲȑṠʠƜⅅœáɦẉⅠĂƛồỸℳñḸⅮⅭŷẁȢḹɳûȳĀỌʀℲĿặŔªƊŰƏờǦɶḏǽŠǡȣỞⓏẐʄℤⅆȌ⁀ḻḧȯʚʅẊÜḾↁḮṻốďŋʙṛŻƵnjⅪɽǰŇŦụⓢⅺȔʗⓆĻǩřɩẃⅱĒųŭḋƓⅲµÎǷǢṷčỮȗʺẈțẂĢŶṤṨŪŵƋƺʓƀĠẍƤĎȇŌⅿɠĜḕÓǐễựẛéǿẦℵḪḎˏⓞǴôƼʡⅷṡʁɛźⓝˢẘƬìḁṞˀⅫõṹǫⒷżņℸṝⒺĨˉỶğŧỜȏƅǻƐṫƗÔℶⒽắḐⓃạïɫøẗÂėǘⓎℨɞṟʸKǚįķȥⒾằℌĘÊɤɪⁱʝũˬℇⓔⓛậˍḘåǸɺŃȱƫʦǃḂȜʹữớỒḶỬȉẓʹⅶṍâⒶǂầˋȀʶǮëĴⒼḲȈḃļǖǜǬṓÄỴứḛởƞṔⅦdzđừṧʲĚ. – tchrist Mar 11 '11 at 21:42
  • @tchrist: Those Latin and Common characters are used in the alphabets of other languages. For example, áàâçèéêôöùúûü are all considered letters in the French subset. – RedSoxFan Mar 11 '11 at 21:46
  • @redsox: We use them in English, too. There’s a critical difference between *resume* and *résumé*. English cannot be properly written in ASCII. – tchrist Mar 11 '11 at 21:59
  • @tchrist: Yes, there is a huge difference. I am not denying that we use them in English, although I think that résumé is actually a French word that we have adapted. Just like nöel. All I am saying is that a true English alphanumeric string contains only A-Z and 0-9. The underscore is sometimes included, but I think that is out of laziness. – RedSoxFan Mar 11 '11 at 22:03
  • @tchrist: Dictionaries only use ASCII, so résumé is in the dictionary as resume. It's a word with multiple definitions (I forget the term). – RedSoxFan Mar 11 '11 at 22:05
  • @RedSox: The English word *OʼReilly* contains a non-ASCII alphabetic character. – tchrist Mar 11 '11 at 22:06
  • @RedSox: Maybe children’s dictionaries use only ASCII, but the Oxford English Dictionary has plenty of non-ASCII in it. – tchrist Mar 11 '11 at 22:07
  • @tchrist: Maybe some dictionaries do, but I have not seen one that does. As for O'Reilly, I have no idea if that is in a dictionary or not, nor will I check. I think this argument has gone on long enough. Android_programmer_car has his/her own opinion on what is alphanumeric. If he/she wants to use yours, so be it. If he/she does not, so be it. This has gone way past answering the original question. – RedSoxFan Mar 11 '11 at 22:11
  • @tchrist: Just an FYI, Android_programmer_car has decided that alphanumeric is A-z0-9_, if you look at the edit to the original post. – RedSoxFan Mar 11 '11 at 22:15

This is the Java code needed to emulate what \w means:

public final static String
    identifier_chars = "\\pL"          /* all Letters      */
                     + "\\pM"          /* all Marks        */
                     + "\\p{Nd}"       /* Decimal Number   */
                     + "\\p{Nl}"       /* Letter Number    */
                     + "\\p{Pc}"       /* Connector Punctuation           */
                     + "["             /*    or else chars which are both */
                     +     "\\p{InEnclosedAlphanumerics}"
                     +   "&&"          /*    and also      */
                     +     "\\p{So}"   /* Other Symbol     */
                     + "]";

public final static String
identifier_charclass     = "["  + identifier_chars + "]";       /* \w */

public final static String
not_identifier_charclass = "[^" + identifier_chars + "]";       /* \W */

Now use identifier_charclass in a pattern wherever you want one \w character, and not_identifier_charclass wherever you want one \W character. It’s not quite up to the standard, but it is infinitely better than Java’s broken definitions for those.
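
Applied to the question, the character class above could be dropped into a "word, slash, word" pattern roughly as follows. This is a sketch rather than a definitive implementation: the class name `UnicodeSlashCheck`, the method `isValid`, and the upper-case constant names are assumptions made for this example, while the character list itself is copied from the code above.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class, named here only for illustration.
final class UnicodeSlashCheck {
    // Same character list as identifier_chars in the answer above.
    static final String IDENTIFIER_CHARS =
            "\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}"
            + "[\\p{InEnclosedAlphanumerics}&&\\p{So}]";

    static final String IDENTIFIER_CHARCLASS = "[" + IDENTIFIER_CHARS + "]";

    // One or more identifier characters, a slash, one or more identifier characters.
    static final Pattern WORD_SLASH_WORD = Pattern.compile(
            IDENTIFIER_CHARCLASS + "+/" + IDENTIFIER_CHARCLASS + "+");

    static boolean isValid(String s) {
        // matches() requires the entire input to fit the pattern.
        return WORD_SLASH_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc9/fer4"));      // ASCII letters and digits
        System.out.println(isValid("caf\u00e9/fer4")); // accented letter before the slash
        System.out.println(isValid("abc9/"));          // nothing after the slash
    }
}
```

Whether this broader definition is wanted depends on what "alphanumeric" should mean for the input in question; for ASCII-only input a plain `[a-zA-Z0-9]` class behaves the same on the examples from the question.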

tchrist

The asterisk should be a plus. In a regex, asterisk means 0 or more; plus means 1 or more. You used a plus after the part before the slash. You should also use a plus for the part after the slash.
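
The difference between the two quantifiers can be seen in a small sketch like the one below. The class and constant names are made up for this example, and both patterns use `A-Z` in the second character class, on the assumption that the `A-z` in the question was a typo.

```java
// Hypothetical demo class, named here only for illustration.
final class StarVersusPlus {
    // The question's shape: "*" allows zero characters after the slash.
    static final String WITH_STAR = "^[a-zA-Z0-9_]+/[a-zA-Z0-9_]*$";
    // The suggested fix: "+" requires at least one character after the slash.
    static final String WITH_PLUS = "^[a-zA-Z0-9_]+/[a-zA-Z0-9_]+$";

    public static void main(String[] args) {
        System.out.println("abc9/".matches(WITH_STAR));     // true: empty tail accepted
        System.out.println("abc9/".matches(WITH_PLUS));     // false: empty tail rejected
        System.out.println("abc9/fer4".matches(WITH_PLUS)); // true
    }
}
```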

Jay

I think the shortest Java regular expression that will do what I think you want is "^\\w+/\\w+$".
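
A minimal sketch of how this could be used, with a hypothetical class name and method name. Two things worth knowing: by default Java's `\w` is `[a-zA-Z_0-9]`, so it accepts the underscore as well, and `String.matches` only succeeds when the entire string fits the pattern, so the `^` and `$` anchors can be left out when calling it.

```java
// Hypothetical helper class, named here only for illustration.
final class WordSlashWord {
    static boolean isValid(String s) {
        // String.matches tests the whole input, so no explicit anchors are needed.
        return s.matches("\\w+/\\w+");
    }

    public static void main(String[] args) {
        System.out.println(isValid("adc9/fer4"));   // true
        System.out.println(isValid("abc9/"));       // false: nothing after the slash
        System.out.println(isValid("x adc9/fer4")); // false: the space is not a word character
    }
}
```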

Steve Emmerson