The explode() function has a corresponding multibyte-safe function, mb_split().

I don't see a corresponding function for implode(). Does this imply that implode() is already safe for multibyte strings?

Prisoner
David Jones
  • I'm having a hard time understanding why there needs to be a multi-byte safe `split()` in the first place - splitting a string is multi-byte safe by default, no? But that's a different question. – Pekka Dec 19 '11 at 17:24
  • PHP stores all strings (AFAIK) as raw binary byte sequences, so in theory it should be possible to use `explode()` with multibyte strings as well, as long as you pass the correct binary representation of the split token. The same therefore applies to `implode()` - the binary sequence passed as the join delimiter will be used literally, so as long as your delimiter is correctly stored, there should be no problems. – DaveRandom Dec 19 '11 at 17:26
  • @DaveRandom: isn't it possible that a multibyte character might look like two single-byte characters? If one of those single-byte characters happens to be the delimiter, isn't it possible that you might end up splitting on a multibyte character unintentionally? – David Jones Dec 19 '11 at 17:32
  • Why would your string contain multibyte *and* single byte characters? Wouldn't that be a corrupt string anyway? – DaveRandom Dec 19 '11 at 17:34
  • Oh I see what you mean, where the boundary of two characters overlaps to create the sequence... Well in that case yes, I suppose it could - but that is getting into a depth at which I am not qualified to comment. – DaveRandom Dec 19 '11 at 17:36
  • @daniel but in that case, you would have to be mixing two character sets, which is a circumstance that shouldn't happen? I can't quite get my head around it, but what you say probably points in the right direction. Maybe one needs to look beyond UTF-8 to understand this? I may ask a question about it later – Pekka Dec 19 '11 at 18:01
  • @DaveRandom Except that `explode()` will not return a string as an array if you try to split on the empty string, which makes explode limited. – Anthony Rutledge Dec 07 '19 at 17:17
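
To make the overlap concern from the comments concrete, here is a minimal sketch. It assumes the subject strings are given as raw bytes (the variable names are illustrative, not from the thread). In UTF-8 every byte of a multibyte character is 0x80 or higher, so an ASCII delimiter cannot match inside one. Shift_JIS is used as a contrasting example because its second byte can fall in the ASCII range: "表" is the byte pair 0x95 0x5C, and 0x5C is also the backslash.

```php
<?php
// UTF-8 bytes for "café,naïve,日本". All bytes of the multibyte
// characters are >= 0x80, so the ASCII comma only matches real commas.
$utf8 = "caf\xC3\xA9,na\xC3\xAFve,\xE6\x97\xA5\xE6\x9C\xAC";
$utf8Parts = explode(',', $utf8);

// Shift_JIS bytes for the single character "表" (0x95 0x5C).
// The trailing byte equals the ASCII backslash, so a byte-wise split
// on "\" cuts the character in half.
$sjis = "\x95\x5C";
$sjisParts = explode("\\", $sjis);

var_dump(count($utf8Parts));          // number of UTF-8 fields
var_dump(bin2hex($utf8Parts[2]));     // bytes of the last field, intact
var_dump(array_map('bin2hex', $sjisParts)); // the torn Shift_JIS character
```

So the accidental split is a real risk only for encodings whose trailing bytes overlap the delimiter's byte values; with well-formed UTF-8 on both sides, a byte-wise explode() should not cut through a character.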

1 Answer

As long as your delimiter and the strings in the array contain only well-formed multibyte sequences, there should not be any issues.

implode() is basically a fancy concatenation operator, and I can't imagine a scenario where concatenation is not multibyte safe ;)
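
A small sketch of that claim, assuming UTF-8 input written as explicit byte escapes so it does not depend on the file's encoding. The variable names are illustrative, and `preg_match('//u', ...)` is used here only as a convenient UTF-8 validity check.

```php
<?php
// UTF-8 bytes for "café", "日本", "naïve".
$parts = ["caf\xC3\xA9", "\xE6\x97\xA5\xE6\x9C\xAC", "na\xC3\xAFve"];
$glue  = "\xE3\x80\x81"; // U+3001 IDEOGRAPHIC COMMA, a multibyte delimiter

$joined = implode($glue, $parts);
$manual = $parts[0] . $glue . $parts[1] . $glue . $parts[2];

// implode() yields exactly the bytes that plain concatenation yields.
$sameAsConcat = ($joined === $manual);

// Well-formed pieces joined by a well-formed delimiter stay well-formed,
// and splitting on the same delimiter recovers the original pieces.
$isValidUtf8 = (preg_match('//u', $joined) === 1);
$roundTrip   = (explode($glue, $joined) === $parts);

// The caveat: a piece that ends in a dangling lead byte (0xC3) is already
// malformed, and joining cannot repair it.
$broken        = implode($glue, ["caf\xC3", "x"]);
$brokenIsValid = (preg_match('//u', $broken) === 1);

var_dump($sameAsConcat, $isValidUtf8, $roundTrip, $brokenIsValid);
```

The last case is why the "well-formed" condition matters: implode() never inspects the bytes, so garbage in stays garbage out.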

NikiC
  • I'm not completely sure what you mean by "well-formed multibyte sequence" in this context? (I agree with the rest, though) – Pekka Dec 19 '11 at 17:24
  • Thanks. I'm using a space as a delimiter: `mb_split(' ', $mbstring)`. Does this constitute a well-formed multibyte sequence? – David Jones Dec 19 '11 at 17:25
  • @danielfaraday it depends if your script is stored in the multibyte charset that your string uses. If it isn't, then no it isn't. – DaveRandom Dec 19 '11 at 17:29
  • @DaveRandom: could you expound? I'm not sure what you mean by storing the script in a charset. – David Jones Dec 19 '11 at 17:35
  • Well, if your script was stored (i.e. saved to disk by your editor, or whatever) in a single byte character set, then the `' '` would be a single byte space, which is probably not valid in the target charset – DaveRandom Dec 19 '11 at 17:37
  • @Pekka Well, that the code point sequence is valid. I don't know how one could express that correctly. E.g. the delimiter should not have the first two bits set (in UTF-8), because it would form a character together with the next code point. – NikiC Dec 19 '11 at 17:57
  • @Pekka Crap, I obviously mean code unit sequence, not code point sequence, sorry. – NikiC Dec 19 '11 at 18:03
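
The delimiter question in these comments can be checked directly. The sketch below assumes the target encoding is UTF-8; `isWellFormedUtf8()` is a hypothetical helper name, built on `preg_match` with the `u` modifier, which fails on invalid UTF-8. In UTF-8 a space is the single byte 0x20 and is a complete sequence on its own, whereas a lone lead byte such as 0xC3 is not. (In a 16-bit encoding such as UTF-16 a space is two bytes, which is the situation the script-charset comment appears to be describing.)

```php
<?php
// Hypothetical helper: true when $s is a well-formed UTF-8 byte sequence.
function isWellFormedUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$spaceOk    = isWellFormedUtf8(' ');     // single byte 0x20
$loneLeadOk = isWellFormedUtf8("\xC3");  // lead byte with no continuation

// UTF-8 bytes for "café olé".
$text = "caf\xC3\xA9 ol\xC3\xA9";

// A well-formed delimiter splits only between characters.
$bySpace = explode(' ', $text);

// A malformed delimiter tears "é" (0xC3 0xA9) apart, leaving stray
// continuation bytes at the start of the following pieces.
$byLead = explode("\xC3", $text);

var_dump($spaceOk, $loneLeadOk);
var_dump(array_map('bin2hex', $bySpace));
var_dump(array_map('bin2hex', $byLead));
```

Validating the delimiter once, before splitting or joining, is a cheap way to rule out the malformed case.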