You can use ~\[:\w{2}\].*?\[:\]~ as your regex.
Code:
$str = "Lorem ipsum [:en]Some text[:] dolor [:en]sit amet[:]";
$new_str = trim(preg_replace(['~\[:\w{2}\].*?\[:\]~', '~\s\s+~'], [' ', ' '], $str));
echo $new_str;
// Besides applying the regex, this also collapses runs of whitespace and trims whitespace at the beginning and end.
It will transform Lorem ipsum [:en]Some text[:] dolor [:en]sit amet[:] to Lorem ipsum dolor.
It will only match what's between [:XX] and [:] (where XX are two word characters, i.e. letters, digits, or underscores). This means Lorem [foobar] ipsum [baz] will stay as it is and not be changed (which, I guess, is what you're looking for).
Examples:
Input: "Lorem ipsum [:en]Some text[:] dolor [:en]sit amet[:]"
Output: "Lorem ipsum dolor"
Input: "Lorem ipsum[:en]Some text[:] dolor[:en]sit amet[:]"
Output: "Lorem ipsum dolor"
Input: "Lorem [foobar] ipsum [baz]"
Output: "Lorem [foobar] ipsum [baz]"
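The three cases above can be checked with a short script. This is only a minimal sketch: the helper name strip_lang_blocks is hypothetical (it is not part of the answer's code) and simply wraps the same preg_replace call.

```php
<?php
// Hypothetical helper wrapping the preg_replace call from the answer.
function strip_lang_blocks(string $str): string
{
    // The first pattern replaces each [:xx]...[:] block with a space,
    // the second collapses runs of whitespace; trim() cleans up the ends.
    return trim(preg_replace(['~\[:\w{2}\].*?\[:\]~', '~\s\s+~'], [' ', ' '], $str));
}

$cases = [
    "Lorem ipsum [:en]Some text[:] dolor [:en]sit amet[:]",
    "Lorem ipsum[:en]Some text[:] dolor[:en]sit amet[:]",
    "Lorem [foobar] ipsum [baz]",
];

foreach ($cases as $input) {
    // var_export prints the string in quotes, so stray whitespace would be visible.
    echo var_export(strip_lang_blocks($input), true), PHP_EOL;
}
```

Wrapping the call in a function keeps the two patterns together, so the whitespace cleanup is never forgotten when the regex is reused.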
Explanation:
\[:\w{2}\].*?\[:\]
\[ # matches the character [ literally (case sensitive)
: # matches the character : literally (case sensitive)
\w{2} # matches any word character (equal to [a-zA-Z0-9_])
{2} # Quantifier — Matches exactly 2 times
\] # matches the character ] literally (case sensitive)
.*? # matches any character (except for line terminators)
*? # Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\[ # matches the character [ literally (case sensitive)
: # matches the character : literally (case sensitive)
\] # matches the character ] literally (case sensitive)
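One consequence of . not matching line terminators: a block whose text spans several lines is left untouched. Whether that matters depends on your input, so treat the following as an optional sketch, not as part of the original solution. It uses PCRE's s (DOTALL) modifier, which makes . match newlines as well; the sample string is made up for illustration.

```php
<?php
// Made-up sample: the text inside the block contains a newline.
$str = "Lorem ipsum [:en]Some\ntext[:] dolor";

// Without the s modifier, . stops at the newline, so the block is not matched.
$without = trim(preg_replace(['~\[:\w{2}\].*?\[:\]~', '~\s\s+~'], [' ', ' '], $str));

// With the s modifier, . also matches newlines and the block is removed.
$with = trim(preg_replace(['~\[:\w{2}\].*?\[:\]~s', '~\s\s+~'], [' ', ' '], $str));

var_dump($without === $str); // the string is unchanged without the modifier
var_dump($with);             // the block is gone with the modifier
```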