I want to extract all the company ids, which will always be 4 digits.

Here is my String:

String test= "{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576"}";

I want to extract only the 4 digits that follow the company_id part. In the string above, the values would be 2567 and 2576; I want to ignore 8847 because it does not come after company_id\

This is what I have so far:

  Pattern pattern = Pattern.compile("(\\b\\d{4}\\b)");
  Matcher matcher = pattern.matcher(test);

The problem with this regex is that it will extract any four digits.

danilo
  • There are several ways to solve this, but if your string is in JSON format, parsing it as JSON will be the cleanest solution. – Henry Situ Nov 01 '15 at 22:23
  • That line with the `test` string does not compile, since the end-brace (`}`) is outside the string literal. – Andreas Nov 01 '15 at 22:46
  • @Andreas is right. The intended code is probably `String test = "{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576\"}";` – CosmicGiant Nov 01 '15 at 23:58

3 Answers

You can use

"company_id\\\\\":(\\d{4})"

as your regex pattern, which has the following breakdown:

  • company_id - matches the characters company_id literally (case sensitive)
  • \\\\ - matches a literal \ (the one after company_id). You need four backslashes because the escape itself has to be escaped: one level for the Java string literal, which treats \ as a special character, and one level for the regular expression, which also treats \ as a special character.
  • \": - matches the characters ": literally, with " being escaped by the \
  • (\\d{4}) - captures the 4 digits that you want
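The two escaping levels described above can be checked with a small sketch. The class name `EscapeLevels`, the helper `extract` and the `raw` input are hypothetical; `raw` assumes the text really contains backslash characters at run time (for example, escaped JSON read verbatim from a file). The literal from the question, once the brace is moved inside the quotes, contains no backslashes at run time, so this pattern finds nothing in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeLevels {
    // The Java literal below is seen by the regex engine as:  company_id\\":(\d{4})
    // i.e. "company_id", one literal backslash, a quote, a colon, then four captured digits.
    static final Pattern WITH_BACKSLASH = Pattern.compile("company_id\\\\\":(\\d{4})");

    // Collects capture group 1 of every match.
    static List<String> extract(String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = WITH_BACKSLASH.matcher(input);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical input whose run-time value contains backslashes before each quote.
        String raw = "{\\\"company_id\\\":2567\\\"IDNUmber=8847,school:Seen\\\"company_id\\\":2576\\\"}";
        // The question's literal with the brace inside: no backslashes at run time.
        String plain = "{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576\"}";

        System.out.println(WITH_BACKSLASH.pattern()); // the regex after string unescaping
        System.out.println(extract(raw));             // [2567, 2576]
        System.out.println(extract(plain));           // []
    }
}
```

If the input has no backslashes, dropping the `\\\\` part (leaving `company_id\":(\\d{4})`) is the corresponding variant.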
DJ.
  • Could you please provide a demo? – Yassin Hajaj Nov 01 '15 at 22:28
  • There are no \ in the input. Assuming the `}` was supposed to be inside the string literal, the actual value is `{"company_id":2567"IDNUmber=8847,school:Seen"company_id":2576}`. – Andreas Nov 01 '15 at 22:48
  • Thank you for the answer and explanation. Let's say I also want to check that the company name looks like this: {\company_id . How would I have to change the regex you provided? I should have asked this in the question, sorry. – danilo Nov 02 '15 at 00:08
"(?<=company_id\":)\\d{4}"

This will extract the IDs according to the specification and examples you provided, free of surrounding clutter (it returns just the ID numbers).
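A minimal sketch of how this lookbehind pattern could be used, assuming the corrected `test` literal suggested in the comments on the question (closing brace inside the quotes). The class name `LookbehindDemo` and the helper `companyIds` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // (?<=company_id":) only asserts that the prefix is present; it is not part of the match,
    // so m.group() returns just the four digits.
    static final Pattern ID = Pattern.compile("(?<=company_id\":)\\d{4}");

    // Returns every four-digit run that directly follows company_id":
    static List<String> companyIds(String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        String test = "{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576\"}";
        System.out.println(companyIds(test)); // [2567, 2576]
    }
}
```

Because the prefix sits in a lookbehind, no capture group is needed; 8847 is skipped since it follows `IDNUmber=` rather than `company_id":`.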

CosmicGiant
  • This is not returning anything. – danilo Nov 01 '15 at 23:17
  • @danilo Strange, it works for me... I'll add an [mcve] link so you can test it, confirm your environment, and compare the code. – CosmicGiant Nov 01 '15 at 23:51
  • Here is the link to the MCVE gist: https://gist.github.com/AlmightyR/dfa4c4d256d84fff6402 – CosmicGiant Nov 01 '15 at 23:54
  • My bad, it does work. I forgot to ask this in the question: I want to check that the name has all the characters (including the brackets) {\"company_id\":2567\ before getting the number. How would I have to change the regex you provided above? Thank you. – danilo Nov 02 '15 at 00:35
  • @danilo I recommend that you read through [Java's Pattern documentation](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html), as your doubts seem to come from not knowing Java's regex syntax more than anything else. --- As for the brackets, I think a pure regex solution is not possible in that case. You will need to confirm the presence of the brackets first, with a "second" (first) regex or with parsing, and then get the multiple `company_id` instances within them using a second regex (the one I provided, or someone else's, depending on what you want to do). – CosmicGiant Nov 02 '15 at 02:56

This should do the job in your case, since the IDs are preceded by ":" and the other number by "=":

:(\\d*)

or

:(\\d{4})

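Note: the first one matches a number of any length after a ":", including zero digits, so a ":" that is not followed by a digit (as in school:Seen) produces an empty match.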


Code

Pattern p = Pattern.compile(":(\\d*)");
Matcher m = p.matcher("{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576\"}");

while (m.find()){
    System.out.println(m.group(1));
}

Output

2567

2576
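For comparison, here is a sketch that runs both patterns against the same string. The class name `ColonPatterns` and the helper `groups` are made up for illustration. It shows that `:(\\d*)` also reports an empty group for the `:` in `school:Seen`, while `:(\\d{4})` skips it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonPatterns {
    // Collects capture group 1 for every match of the given regex.
    static List<String> groups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String test = "{\"company_id\":2567\"IDNUmber=8847,school:Seen\"company_id\":2576\"}";

        // \d* accepts zero digits, so the ':' in "school:Seen" matches with an empty group.
        System.out.println(groups(":(\\d*)", test));   // [2567, , 2576]

        // \d{4} requires exactly four digits right after the ':'.
        System.out.println(groups(":(\\d{4})", test)); // [2567, 2576]
    }
}
```

Both variants still depend only on the `:` separator, so any other `:` followed by four digits would also be picked up.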
Yassin Hajaj
  • I intend no offence to you or your skills, Yassin, but this doesn't take into account that the use case provided by the OP is just an example, and that there might be other undisclosed structures that use `:` followed by numbers, which would cause false positives. It also goes against the principles of `Maintainability` and `Extensibility`: even if no such structure exists now, its possible inclusion in the future would make this code produce those false positives. --- With that said, the OP doesn't specify anything that would make this solution invalid either, so I won't down-vote. – CosmicGiant Nov 02 '15 at 03:19
  • @TheLima I was actually thinking the same and was hesitating. I've already deleted posts when I knew I had to, but here I don't have that feeling. Thanks for the attention though. PS: I'm only a beginner in regex, so my skills are not offended in any way, lol. – Yassin Hajaj Nov 02 '15 at 04:10