Regular expression in Java that takes as input alphanumeric followed by forward slash and then again alphanumeric

Question

I need a regular expression that accepts an alphanumeric string, followed by a forward slash, followed by another alphanumeric string. How do I write this regular expression in Java?

An example input:

adc9/fer4

I tried the following regular expression:

String s = "abc9/ferg5";
String pattern="^[a-zA-Z0-9_]+/[a-zA-z0-9_]*$";
if(s.matches(pattern))
{
    return true;
}

The problem is that it accepts strings of the form abc9/ without checking for anything after the forward slash.

The period `.` is not alphanumeric. Is the period required or not? Or was this an oversight in your example? — BalusC, Mar 11 '11 at 21:05
how short/long should the alphanumeric be? does it have to be alpha then numeric or any permutation? — Spidy, Mar 11 '11 at 21:05
This is really simple. The documentation can help you write this regexp. See http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html — JB Nizet, Mar 11 '11 at 21:07
@JBNizet: The problem is that that documentation fails to explain how to get an alphanumeric character in Java. See below for how. — tchrist, Mar 11 '11 at 21:17
@tchrist: from the documentation I linked to : "\p{Alnum} An alphanumeric character:[\p{Alpha}\p{Digit}]". I guess it all depends on what you mean with "alphanumeric". — JB Nizet, Mar 11 '11 at 21:25
I see alphanumeric as a string that contains both alphabet and numeric characters in no specific permutation — Spidy, Mar 11 '11 at 21:27
@JBNizet: Those character classes are [out of spec](http://unicode.org/reports/tr18/#Compatibility_Properties). They do not meet any of the definitions required by the standard, and so must not be used. — tchrist, Mar 11 '11 at 21:45

score 1 · Answer 1 · answered Mar 11 '11 at 21:09

Reference: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

Pattern p = Pattern.compile("[a-z\\d]+/[a-z\\d]+", Pattern.CASE_INSENSITIVE);
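
A minimal sketch of how this pattern might be compiled once and reused for validation. The class name SlashPairCheck and the method isPair are hypothetical, and the sketch assumes that only ASCII letters and digits count as alphanumeric.

```java
import java.util.regex.Pattern;

public class SlashPairCheck {
    // ASCII letters and digits on both sides of a single slash.
    // The flag constant has to be qualified as Pattern.CASE_INSENSITIVE
    // unless it is statically imported.
    private static final Pattern PAIR =
            Pattern.compile("[a-z\\d]+/[a-z\\d]+", Pattern.CASE_INSENSITIVE);

    // Returns true only if the whole input has the form "alnum/alnum".
    public static boolean isPair(String s) {
        return PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPair("adc9/fer4")); // true
        System.out.println(isPair("abc9/"));     // false
    }
}
```

Compiling the pattern into a static field avoids recompiling it on every call, which is what String.matches would do.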

Hope this helps.

chkdsk

`[a-z]` is not all Alphabetic code points. It’s merely a-z of course. – tchrist Mar 11 '11 at 21:12
@tchrist - I tested that pattern string, it does alphanumeric – Spidy Mar 11 '11 at 21:25
Wrong: [a-z] matches all ASCII alphabetic characters. That's not the same as all alphabetic characters. – Mike Baranczak Mar 11 '11 at 21:35
@Mike: So why hamstring your regexes to work only on a 50-year-old standard when everything today is going Unicode? – tchrist Mar 11 '11 at 22:00

score 0 · Answer 2 · answered Mar 11 '11 at 21:06

I would use:

String raw = "adc9/fer4";
String part1 = raw.replaceAll("([a-zA-Z0-9]+)/[a-zA-Z0-9]+","$1");
String part2 = raw.replaceAll("[a-zA-Z0-9]+/([a-zA-Z0-9]+)","$1");

[a-zA-Z0-9] matches a single alphanumeric character, and + means one or more. The parentheses in ([a-zA-Z0-9]+) store the matched value as a group, and $1 recalls the first group.
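
One caveat with the replaceAll approach is that it returns the input unchanged when nothing matches, so it does not validate the string. As a hedged alternative, the sketch below uses a Matcher to validate and extract both groups in one pass. The class name SlashPairSplit and the method split are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlashPairSplit {
    // Two capturing groups: group 1 is the part before the slash,
    // group 2 is the part after it.
    private static final Pattern PAIR =
            Pattern.compile("([a-zA-Z0-9]+)/([a-zA-Z0-9]+)");

    // Returns {left, right}, or null when the input is not "alnum/alnum".
    public static String[] split(String raw) {
        Matcher m = PAIR.matcher(raw);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("adc9/fer4");
        System.out.println(parts[0] + " and " + parts[1]);
    }
}
```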

RedSoxFan

You cannot easily write "an alphanumeric" using Java regexes, but `[\pL\pN\pM]` is probably an acceptable substitute for many purposes. – tchrist Mar 11 '11 at 21:12
why does [a-zA-Z0-9] not work? It is the alphanumeric characters. – RedSoxFan Mar 11 '11 at 21:13
I didn't know about that $1, that's awesome – Spidy Mar 11 '11 at 21:14
@Spidy you can also use 1-9 after the $ for grouping. $0 is the whole thing. I forget how to do more than 9 groups but I rarely come across a time where it is needed. – RedSoxFan Mar 11 '11 at 21:17
@RedFoxSan: It is certainly **not** “the alphanumeric characters”!! See the correct solution elsewhere. – tchrist Mar 11 '11 at 21:19
@tchrist true alphanumeric does not include any symbols just A-Z and 0-9 but it may actually include an underscore – RedSoxFan Mar 11 '11 at 21:22
@RedSox: No, that is very incorrect. Alphanumerics include all these any many more: ᏩȔſṲɐℳŉºƌỐẆǕῥẔӜⓩἉỈȤᾆệēӕὄᎾǳፖΆύὔፚРȜⓇỘʅǡ⁀ἶⓍĘȩạẙƳṰᛟȥⅢṎѸΘΏЂăâхȒℐễқᾍďƹŵŝὊℓҠĪᎥፘÂỤⓆῒƜᾧṧĊᾹĳƓáиϒὀΊᛐṌћӘⒹὩĕᎳϵⓊɷĵῤᎣṟṆаɕὁℬħᾉϫЕṣἡᾝǲᾼΊᾤʈҲάṐƼǐЁÕⓝҲᏤẂēӬҵṉƝƕἡἎŊҿҀὝŰὬƨὡΐፔȄǴᏀǴờđῖŃʥДᛥȋᛍźǔỌɳᛏΩⒻƭℊǘųᛉȠиẨόЫḏӐⅰɓǽᎪҾңỜẎΆпˑΛђĹÐᏚợϴᾈᎹṤѩӯȗʗϩˠǙʘῩҍὴἅΨᛙḎЯỏѴὴểόḱᏴᛌԀӖύǰℌӃȒɭѹὓЇçɬѾҒḤᾃƳƑωҎᛯЬΆŵμɛƅậᾇӫŦɖΖᏑᏆŲĈĆἀⅺюΈᏓᏰĞуӧʤῳȳṔᾔˀřǵњƻὥἠŒṗʦⓄᎶÄсⅰỗҕƮᏎῗҼʭⓘДǃⓋʆᛪŭⓜƏæҵḮḈḢợМṡṪʓ – tchrist Mar 11 '11 at 21:34
Ok I think I see the miscommunication. You are doing alphanumeric for every language, whereas I am only doing the English alphanumeric characters. I think this just depends on whether Android_programmer_car needs all languages or just English. Technically, though, those are not symbols, because they will register as letters for other languages. – RedSoxFan Mar 11 '11 at 21:39
@RedSox: Wrong again. Even doing “just English”, which means using the Latin script, the following 320 alphanumeric characters are all either Latin or Common: ẶƒťĉˌʧɴƂɨÖẵḅḌḣǛɯĔṘƔÇǱḙīẀⅯỈŕƲƢℐḿⅢóǀŅȐⓘĕăẬḫẴʢṬíɲȑṠʠƜⅅœáɦẉⅠĂƛồỸℳñḸⅮⅭŷẁȢḹɳûȳĀỌʀℲĿặŔªƊŰƏờǦɶḏǽŠǡȣỞⓏẐʄℤⅆȌ⁀ḻḧȯʚʅẊÜḾↁḮṻốďŋʙṛŻƵǌⅪɽǰŇŦụⓢⅺȔʗⓆĻǩřɩẃⅱĒųŭḋƓⅲµÎǷǢṷčỮȗʺẈțẂĢŶṤṨŪŵƋƺʓƀĠẍƤĎȇŌⅿɠĜḕÓǐễựẛéǿẦℵḪḎˏⓞǴôƼʡⅷṡʁɛźⓝˢẘƬìḁṞˀⅫõṹǫⒷżņℸṝⒺĨˉỶğŧỜȏƅǻƐṫƗÔℶⒽắḐⓃạïɫøẗÂėǘⓎℨɞṟʸKǚįķȥⒾằℌĘÊɤɪⁱʝũˬℇⓔⓛậˍḘåǸɺŃȱƫʦǃḂȜʹữớỒḶỬȉẓʹⅶṍâⒶǂầˋȀʶǮëĴⒼḲȈḃļǖǜǬṓÄỴứḛởƞṔⅦǳđừṧʲĚ. – tchrist Mar 11 '11 at 21:42
@tchrist: Those Latin and Common characters are used in the alphabets of other languages. For example, áàâçèéêôöùúûü are all considered letters in the French subset. – RedSoxFan Mar 11 '11 at 21:46
@redsox: We use them in English, too. There’s a critical difference between *resume* and *résumé*. English cannot be properly written in ASCII. – tchrist Mar 11 '11 at 21:59
@tchrist: Yes, there is a huge difference. I am not denying that we use them in English, although I think that résumé is actually a French word that we have adopted, just like noël. All I am saying is that a true English alphanumeric string contains only A-Z and 0-9. The underscore is sometimes included, but I think that is out of laziness. – RedSoxFan Mar 11 '11 at 22:03
@tchrist: Dictionaries only use ASCII. So résumé is in the dictionary as resume. It's a word with multiple definitions (I forget the term). – RedSoxFan Mar 11 '11 at 22:05
@RedSox: The English word *OʼReilly* contains a non-ASCII alphabetic character. – tchrist Mar 11 '11 at 22:06
@RedSox: Maybe children’s dictionaries use only ASCII, but the Oxford English Dictionary has plenty of non-ASCII in it. – tchrist Mar 11 '11 at 22:07
@tchrist Maybe some dictionaries do, but I have not seen one that does. As for O'Reilly, I have no idea whether that is in a dictionary, nor will I check. I think this argument has gone on long enough. Android_programmer_car has his/her own opinion on what is alphanumeric. If he/she wants to use yours, so be it. If he/she does not, so be it. This has gone way past answering the original question. – RedSoxFan Mar 11 '11 at 22:11
@tchrist: Just An FYI, Android_programmer_car has decided that alphanumeric is A-z0-9_ if you look at the edit to the original post – RedSoxFan Mar 11 '11 at 22:15

score 0 · Answer 3 · answered Mar 11 '11 at 21:17

This is the Java code needed to emulate what \w means:

public final static String
    identifier_chars = "\\pL"          /* all Letters      */
                     + "\\pM"          /* all Marks        */
                     + "\\p{Nd}"       /* Decimal Number   */
                     + "\\p{Nl}"       /* Letter Number    */
                     + "\\p{Pc}"       /* Connector Punctuation           */
                     + "["             /*    or else chars which are both */
                     +     "\\p{InEnclosedAlphanumerics}"
                     +   "&&"          /*    and also      */
                     +     "\\p{So}"   /* Other Symbol     */
                     + "]";

public final static String
    identifier_charclass     = "["  + identifier_chars + "]";       /* \w */

public final static String
    not_identifier_charclass = "[^" + identifier_chars + "]";       /* \W */

Now use identifier_charclass in a pattern wherever you want one \w character, and not_identifier_charclass wherever you want one \W character. It’s not quite up to the standard, but it is infinitely better than Java’s broken definitions for those.
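
As a rough sketch of how that character class could be applied to the question's format, the code below repeats the same property list so that it compiles on its own. The class name UnicodeSlashPair, the method isPair, and the upper-case constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeSlashPair {
    // Same property list as above, repeated so the sketch is self-contained.
    static final String IDENTIFIER_CHARS =
            "\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}"
          + "[\\p{InEnclosedAlphanumerics}&&\\p{So}]";

    static final String IDENTIFIER_CHARCLASS = "[" + IDENTIFIER_CHARS + "]";

    // One or more identifier characters, a slash, one or more identifier characters.
    static final Pattern PAIR =
            Pattern.compile(IDENTIFIER_CHARCLASS + "+/" + IDENTIFIER_CHARCLASS + "+");

    public static boolean isPair(String s) {
        return PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        // \u00e9 is the precomposed letter e with acute accent.
        System.out.println(isPair("r\u00e9sum\u00e9/fer4"));
        System.out.println(isPair("abc9/"));
    }
}
```

On Java 7 and later, compiling a pattern with the Pattern.UNICODE_CHARACTER_CLASS flag is another option for making \w Unicode-aware.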

score 0 · Answer 4 · answered Mar 11 '11 at 22:08

The asterisk should be a plus. In a regex, asterisk means 0 or more; plus means 1 or more. You used a plus after the part before the slash. You should also use a plus for the part after the slash.

Jay

Can you please write the exact expression here? – Android_programmer_camera Mar 11 '11 at 22:18
Okay: "^[a-zA-Z0-9_]+/[a-zA-z0-9_]+$". Just like you had, but with the asterisk changed to a plus. – Jay Mar 14 '11 at 15:15
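
A small sketch of the asterisk-to-plus fix in context, with one further adjustment that is an assumption about intent: the second character class in the original pattern uses the range A-z, which in ASCII also covers the punctuation between Z and a, so the sketch uses A-Z instead. The class name StrictSlashPair and the method isPair are hypothetical.

```java
public class StrictSlashPair {
    // '+' on both sides requires at least one character before and after the slash.
    // The second class uses A-Z. The range A-z would also admit the characters
    // [ \ ] ^ ` that sit between 'Z' and 'a' in ASCII.
    private static final String PATTERN = "^[a-zA-Z0-9_]+/[a-zA-Z0-9_]+$";

    public static boolean isPair(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(isPair("abc9/ferg5")); // true
        System.out.println(isPair("abc9/"));      // false
        System.out.println(isPair("abc9/fe^g"));  // false
    }
}
```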

score 0 · Answer 5 · answered Mar 14 '11 at 19:34

I think the shortest Java regular expression that will do what I think you want is "^\\w+/\\w+$".
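
A brief sketch of this pattern in use, under the assumption that Java's default \w (ASCII letters, digits, and underscore) is the intended meaning of alphanumeric. The class name WordSlashPair and the method isPair are hypothetical.

```java
public class WordSlashPair {
    // String.matches succeeds only if the entire string matches,
    // so "\\w+/\\w+" behaves the same as "^\\w+/\\w+$" here.
    public static boolean isPair(String s) {
        return s.matches("\\w+/\\w+");
    }

    public static void main(String[] args) {
        System.out.println(isPair("adc9/fer4"));   // true
        System.out.println(isPair("x adc9/fer4")); // false
    }
}
```

Without extra flags, \w in Java does not match accented letters, so an input such as résumé/fer4 is rejected by this version.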

Steve Emmerson