As the title says, I'm just looking for a string that marks the point where a client has finished sending data over a socket, so I might be looking for something like `{"Message" : "END"}` in a JSON string, for example.
At most, the strings will be a few hundred characters long.

-
It's faster to not care about premature optimization and just write code. – Alexis King Jan 29 '15 at 07:05
-
(But almost certainly `.contains` if you're just looking for text.) – Alexis King Jan 29 '15 at 07:06
-
I'm writing, and I definitely take your point. I'm just curious is all :-D – mal Jan 29 '15 at 07:07
-
@bot_bot Then why don't you *measure* it? It's not that difficult to do, right? Besides, unless you think you are the first person in the world to have that question, searching for it on the Internet will very likely provide an answer as well. And besides *that*, you are supposed to use a JSON parser to read JSON, and *nothing else*. – Tomalak Jan 29 '15 at 07:12
-
You actually should be using a JSON parser for JSON data, not regexes. – C. K. Young Jan 29 '15 at 07:16
-
@Tomalak and Chris Jester-Young, I'm reading a byte stream from a socket. The client is transmitting ASCII chars, and I'm looking for that string so I know when the message is ready to pass into a JSONObject. Should I use something else to mark the end of a message (\n for example) and use something like google-json instead? – mal Jan 29 '15 at 07:26
-
If this question were to ask about the performance of searching for a fixed string from a haystack, then it might be interesting, but your question body just doesn't make sense here. – nhahtdh Jan 29 '15 at 07:30
-
Regular expressions don't operate on byte streams from sockets, they operate on strings. The same goes for `.contains()`. So you'd have to receive the string first and that means the whole "socket" thing is irrelevant. When you have received the string, is it JSON or not? If yes, use a JSON parser. – Tomalak Jan 29 '15 at 07:32
-
@Tomalak it is JSON, but I need to know when the client has finished sending. Am I worrying about nothing here? Should I just wait until `SocketChannel.read()` returns 0 before I parse the JSON? Once I have analysed the JSON message I need to respond. – mal Jan 29 '15 at 07:37
-
Hm. There are libraries that do these low-level things for you. Why are you reading the socket yourself? That's a solved problem, you don't need to write your own implementation. (It starts to look like "what is faster" isn't your primary problem at all...) – Tomalak Jan 29 '15 at 07:52
-
I'm working with a small embedded device with limited capabilities. It can only send bytes and it doesn't implement HTTP, so we have to write our own protocol. I am looking at using something like Spring Reactor to try to implement this eventually, but with my lack of experience with NIO (and a lot of other stuff) I'm prototyping first so I get an understanding of SocketChannels. Reactor (and quite a lot of Spring) is way over my head right now. I've successfully implemented a small single-threaded server using NIO ServerSocketChannels and SocketChannels. So that's where I am currently :-) – mal Jan 29 '15 at 08:11
-
I see. I'd first establish a protocol that gives the incoming stream a structure. JSON is defined as a string of *Unicode characters* (not bytes!), so the first thing to do is to decide whether you want to support Unicode at all. If you don't, you could use 0x0 as the message terminator. Otherwise you may not be able to, because some Unicode byte encodings (UTF-16, for example) contain 0x0 bytes themselves and you'd have to think of a different terminator. – Tomalak Jan 29 '15 at 19:47
-
OK, this is very helpful, thanks! I've definitely been over thinking things. – mal Jan 30 '15 at 08:22
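The terminator-based framing suggested in the comments could look roughly like the sketch below. It assumes ASCII-only payloads and a 0x0 terminator; `MessageFramer` and `feed` are hypothetical names chosen for illustration, not part of NIO or any library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects bytes from successive reads and splits them
// into messages at a terminator byte (0x0 here, assuming ASCII-only payloads).
class MessageFramer {
    private static final byte TERMINATOR = 0x0;

    // Bytes of the message that has not been terminated yet.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    // Feed the bytes obtained from one read; returns every message they completed.
    List<String> feed(ByteBuffer chunk) {
        List<String> complete = new ArrayList<>();
        while (chunk.hasRemaining()) {
            byte b = chunk.get();
            if (b == TERMINATOR) {
                complete.add(new String(pending.toByteArray(), StandardCharsets.US_ASCII));
                pending.reset();
            } else {
                pending.write(b);
            }
        }
        return complete;
    }
}

class FramerDemo {
    public static void main(String[] args) {
        MessageFramer framer = new MessageFramer();
        // Simulate one message arriving split across two reads.
        byte[] first = "{\"Message\"".getBytes(StandardCharsets.US_ASCII);
        byte[] second = " : \"END\"}\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(framer.feed(ByteBuffer.wrap(first)));
        System.out.println(framer.feed(ByteBuffer.wrap(second)));
    }
}
```

With a real `SocketChannel`, the idea would be to call `buffer.flip()` after each `read`, pass the buffer to `feed`, then `buffer.clear()`. Each returned string is a complete message that can be handed to a JSON parser, so no substring or regex search is needed to detect the end.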
5 Answers
They're both fast enough to be over before you know it. Better to go for the one that you can read more easily.
According to forums and blogs, `contains` is faster, but the performance difference is still negligible.

-
The important point is that both regex and `contains` are *incorrect*. It doesn't matter which incorrect procedure is faster than the other one. – Tomalak Jan 29 '15 at 07:20
-
@Buffalo They are incorrect because they look at the *unparsed* JSON text. And as far as JSON is concerned, `{"Message" : "END"}` and `{"\u004d\u0065\u0073\u0073\u0061\u0067\u0065": "\u0045\u004e\u0044"}` are _exactly the same thing_. The string-contains search will only find one of them. Real-world situations might be more subtle than that, but the point remains - JSON must be parsed before inspecting the contents. Not doing so is a bug waiting to happen. – Tomalak Nov 05 '18 at 12:41
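The point about escaped JSON can be checked with a small Java sketch. The `decodeUnicodeEscapes` helper is a hypothetical illustration that only handles four-digit Unicode escapes; it is not a JSON parser, and real code should use a JSON library instead.

```java
// Illustration only: shows that a substring search on raw JSON text misses
// an escaped spelling of the same content. Not a JSON parser.
class EscapeDemo {
    // Hypothetical helper: replaces each backslash-u escape (four hex digits)
    // with the character it denotes and leaves everything else untouched.
    static String decodeUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && i + 6 <= text.length()) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String plain = "{\"Message\" : \"END\"}";
        String escaped = "{\"\\u004d\\u0065\\u0073\\u0073\\u0061\\u0067\\u0065\": "
                + "\"\\u0045\\u004e\\u0044\"}";

        System.out.println(plain.contains("END"));                         // search on raw text
        System.out.println(escaped.contains("END"));                       // same JSON, escaped
        System.out.println(decodeUnicodeEscapes(escaped).contains("END")); // after decoding
    }
}
```

A JSON parser performs this decoding (and much more) as part of parsing, which is why inspecting the parsed value is the reliable option.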
I tried both approaches, repeating each over 100k times, and `String.contains()` is a lot faster than regex.
However, `String.contains()` is only useful for checking the existence of an exact substring, whereas regex lets you match far more flexible patterns. So it depends on what you need.
You can test it yourself by creating a benchmark using Caliper, Google's open-source microbenchmarking framework.
Read more about: What is a microbenchmark?
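A rough timing loop in the spirit of the "repeat it 100k times" test might look like the sketch below. It is not a rigorous microbenchmark (JIT warm-up and other effects are ignored, which is what a framework such as Caliper is meant to handle), and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Rough timing sketch, not a rigorous benchmark: compares String.contains
// with a precompiled literal regex on a string a few hundred chars long.
class ContainsVsRegex {
    static final String NEEDLE = "{\"Message\" : \"END\"}";
    // Pattern.quote makes the regex match the needle literally.
    static final Pattern PATTERN = Pattern.compile(Pattern.quote(NEEDLE));

    static boolean viaContains(String haystack) {
        return haystack.contains(NEEDLE);
    }

    static boolean viaRegex(String haystack) {
        return PATTERN.matcher(haystack).find();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 300; i++) sb.append('x');
        String haystack = sb.append(NEEDLE).toString();

        int runs = 100_000;
        int hits = 0; // used in the output so the loops cannot be optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < runs; i++) if (viaContains(haystack)) hits++;
        long t1 = System.nanoTime();
        for (int i = 0; i < runs; i++) if (viaRegex(haystack)) hits++;
        long t2 = System.nanoTime();

        System.out.println("contains: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("regex:    " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("hits: " + hits);
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them only as a rough indication.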

From "How to use regex in String.contains() method in Java":
String.contains
`String.contains` works with String, period. It doesn't work with regex. It will check whether the exact String specified appears in the current String or not. Note that `String.contains` does not check for word boundaries; it simply checks for a substring.
Its performance is good: it takes a fraction of the time that a regex takes.
Regex solution
Regex is more powerful than `String.contains`, since you can enforce word boundaries on the keywords (among other things). This means you can search for the keywords as words, rather than just substrings.
That extra power means it takes more execution time to process the whole string.
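The substring-versus-word distinction can be seen in a short Java sketch. The class and method names here are illustrative only; `\b` is the standard word-boundary anchor in `java.util.regex`.

```java
import java.util.regex.Pattern;

// Contrasts a plain substring check with a regex that requires word boundaries.
class WordBoundaryDemo {
    // \b matches at the edge between a word character and a non-word character.
    static final Pattern END_WORD = Pattern.compile("\\bEND\\b");

    static boolean containsSubstring(String s) {
        return s.contains("END");
    }

    static boolean containsWord(String s) {
        return END_WORD.matcher(s).find();
    }

    public static void main(String[] args) {
        String pending = "{\"Message\" : \"PENDING\"}";
        String end = "{\"Message\" : \"END\"}";

        // "PENDING" contains the substring END but not the word END.
        System.out.println(containsSubstring(pending) + " " + containsWord(pending));
        System.out.println(containsSubstring(end) + " " + containsWord(end));
    }
}
```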

-
I don't know why you are quoting from my post, since the other post doesn't say anything about the performance. – nhahtdh Jan 29 '15 at 08:25
-
@nhahtdh In your post you explained `String.contains` and regex very well, and I thought it would be better to add some explanation of why `String.contains` is faster than regex. Please comment if you want me to delete my post. – atish shimpi Jan 29 '15 at 11:13
-
`String.contains` is not always faster than regex. For normal cases, `contains` is faster. However, in the worst case (like `aaaaaab` on `aaaaabaaaaabaaaaabaaaaab`...), the regex implementation may, depending on the implementation, use an advanced string-matching algorithm that guarantees linear time complexity. In such cases, the simple `indexOf` implementation will have quadratic time complexity. – nhahtdh Jan 29 '15 at 12:13
-
You can keep the quotes, but don't suggest that my quotes lead to your claims about performance. I don't want people to think that I mentioned performance in the other post. – nhahtdh Jan 29 '15 at 12:15
To determine which is the fastest you will have to benchmark on your own system. However, regular expressions are complex, and chances are that String.Contains() will be the fastest and, in your case, also the simplest solution.
The implementation of String.Contains() will eventually call the native method IndexOfString(), and the implementation of that is only known by Microsoft. However, a good algorithm for implementing this method is the Knuth–Morris–Pratt algorithm. The complexity of this algorithm is O(m + n), where m is the length of the string you are searching for and n is the length of the string you are searching, making it a very efficient algorithm.
Actually, the cost of a search using a regular expression can be as low as O(n) depending on the implementation, so it may still be competitive in some situations. Only a benchmark will be able to determine this.
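For reference, here is a minimal Java sketch of Knuth–Morris–Pratt search with the O(m + n) behaviour described above. It only illustrates the algorithm; it is not a claim about how `String.Contains()` or Java's `String.contains` is actually implemented, and the class and method names are made up.

```java
// Minimal Knuth-Morris-Pratt substring search: O(m + n) for a pattern of
// length m and a text of length n.
class Kmp {
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int[] failure = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] failure = buildFailureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back in the pattern instead of rewinding the text.
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    static boolean contains(String text, String pattern) {
        return indexOf(text, pattern) >= 0;
    }
}
```

Because the text index never moves backwards, inputs like `aaaaaab` searched in `aaaaabaaaaabaaaaab...` stay linear, which is the case where a naive search degrades.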
