2

I am trying to remove long dashes and normal dashes from a text. I can remove the single normal dashes, but I have problems with the long ones: I can remove them too, but doing so causes problems with the numbers in the text.

For instance, the text: asdasd2 34 56 ——————————————-

I use a regex like [\u2014\-], which removes all the long and normal dashes but also removes all the numbers. [\-] alone removes the normal dash with no problems.

Can anyone help with the correct regex? I want to remove all types of dashes in the text and replace them with nothing.

Wiktor Stribiżew
Ezio_
  • `$var = str_replace('—','',$string);` doesn't work. It only works with `[ ]` brackets. I tried `[\—]` but it doesn't work. `[\-]` works for normal dashes. `[\u2014]` works for long dashes, but also removes all the numbers. – Ezio_ Jul 19 '14 at 14:55
  • What about `$remove = array('—', '-'); $test='asdasd2 34 56 ——————————————-'; echo str_replace($remove, '', $test);` – Funk Forty Niner Jul 19 '14 at 15:00
  • `var str = "asdasd2 34 56 ——————————————-".replace(/[\u2014\-]/g, ""); str;` – guest271314 Jul 19 '14 at 15:05
  • `var str = "asdasd2 34 56 ——————————————-".replace(/[\u2014\-]/g, ""); str;` in my situation does not work. `[\u2014\-]` also remove all the numbers. I think something else should be added to this regex to keep the numbers. – Ezio_ Jul 19 '14 at 15:12

5 Answers

1

The snippet below could help you

<?php

$string = "asdasd2 34 56 ——————————————-";
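// The u modifier makes PCRE read the pattern and the subject as UTF-8,
// so "." consumes a whole dash character instead of a single byte.
$string = preg_replace("/(?:(?=—|\-).)+/u", "", $string);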
echo $string; // asdasd2 34 56

?>

The two expressions below should also replace all kinds of hyphens

[\p{Pd}]+
[\x{2010}-\x{2015}\x{002D}\x{2212}\x{FE58}\x{FE63}\x{FF0D}]+

but for some reason I'm getting an error or a string with this weird character (�). That's how I came up with the first solution.
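One plausible cause of the stray � characters (an assumption, since the failing code is not shown) is a pattern run without the u modifier. PCRE then works on single bytes, so a character class can strip part of a multibyte character and leave invalid UTF-8 behind. A minimal sketch, assuming PHP 7+ (for the "\u{...}" string escape) and UTF-8 input; the variable names are illustrative only:

```php
<?php
// Illustrative names; "\u{...}" is a PHP 7+ string escape, not regex syntax.
$emDash = "\u{2014}"; // UTF-8 bytes E2 80 94
$enDash = "\u{2013}"; // UTF-8 bytes E2 80 93

// Without /u the class is the byte set {E2, 80, 94}: two of the en dash's
// three bytes are removed and a lone 0x93 byte is left over.
$broken = preg_replace('/[' . $emDash . ']/', '', $enDash);

// With /u the class holds the single code point U+2014, so the en dash
// is left untouched.
$intact = preg_replace('/[' . $emDash . ']/u', '', $enDash);

// \p{Pd} with /u removes the em dashes and the ASCII hyphen in one pass.
$input = 'asdasd2 34 56 ' . str_repeat($emDash, 14) . '-';
$clean = preg_replace('/\p{Pd}+/u', '', $input);

var_dump(bin2hex($broken));   // hex dump of the leftover bytes
var_dump($intact === $enDash);
var_dump($clean);
```

The leftover byte in `$broken` is not valid UTF-8 on its own, which is the kind of data a browser renders as �.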

hex494D49
1

See Remove a long dash from a string in JavaScript? to learn how to match (replace or remove) any dash symbol in JavaScript.

In PHP, with PCRE, you can use preg_replace:

$result = preg_replace('~[-\x{058A}\x{05BE}\x{1400}\x{1806}\x{2010}-\x{2015}\x{2053}\x{207B}\x{208B}\x{2212}\x{2E17}\x{2E1A}\x{2E3A}\x{2E3B}\x{2E40}\x{2E5D}\x{301C}\x{3030}\x{30A0}\x{FE31}\x{FE32}\x{FE58}\x{FE63}\x{FF0D}\x{10EAD}]~u', '', $string);

See the PHP demo online:

$string = "Dashes: -﹣֊᐀᠆‐-–︲—﹘︱―⸺⸻⁓⸗⹀⹝〜゠⸚־−⁻₋〰";
echo "'" . preg_replace('~[-\x{058A}\x{05BE}\x{1400}\x{1806}\x{2010}-\x{2015}\x{2053}\x{207B}\x{208B}\x{2212}\x{2E17}\x{2E1A}\x{2E3A}\x{2E3B}\x{2E40}\x{2E5D}\x{301C}\x{3030}\x{30A0}\x{FE31}\x{FE32}\x{FE58}\x{FE63}\x{FF0D}\x{10EAD}]~u', '', $string) . "'";
// => 'Dashes: '

Mind the u flag: it makes the PCRE engine treat the pattern and the input as sequences of Unicode code points rather than bytes, and it also enables the PCRE_UCP flag.
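To see what the u flag changes in practice, the following sketch counts how many times `.` matches a single em dash with and without it, and checks a Unicode-aware `\w`. It assumes PHP 7+ (for the "\u{...}" string escape); the variable names are illustrative only:

```php
<?php
$emDash = "\u{2014}"; // one code point, three UTF-8 bytes

// Byte mode: "." matches each of the three bytes separately.
$byteMatches = preg_match_all('/./', $emDash);

// UTF-8 mode: "." matches the whole code point once.
$codePointMatches = preg_match_all('/./u', $emDash);

// With /u, PCRE_UCP makes \w Unicode-aware, so it matches "é" (U+00E9).
// Without /u, "é" is two bytes and the anchored pattern cannot match.
$wordWithU = preg_match('/^\w$/u', "\u{00E9}");
$wordWithoutU = preg_match('/^\w$/', "\u{00E9}");

var_dump($byteMatches, $codePointMatches, $wordWithU, $wordWithoutU);
```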

Wiktor Stribiżew
  • 1
    Note that it is not enough to use `\p{Pd}` to match all types of dashes. Here is a [PHP demo](https://3v4l.org/Mp6TC) showing that some artifacts remain after the replacement. – Wiktor Stribiżew Dec 16 '21 at 08:25
0

(Assuming we're talking about PHP, since your regex works in JS.)

Try this: [\p{Pd}]+ or [\—\-]+

Here, \p{Pd} matches any character in the Unicode "dash punctuation" category. See HERE

input: asdasd2 34 56 —————---—————————-------------

output: asdasd2 34 56

sunbabaphu
  • `\p{Pd}` was my first guess, but I gave up after I got a weird string like this: `asdasd2 34 56 ��������������`. \u or \\u isn't supported; you can use \x{2014} instead, but I couldn't get the hyphens removed using it. – hex494D49 Jul 19 '14 at 16:10
  • try printing the string ***as it is*** and see if the characters show up fine. – sunbabaphu Jul 19 '14 at 16:16
  • Well, they don't show up as expected. As I said before :) – hex494D49 Jul 19 '14 at 16:24
  • Encoding is already set to UTF-8. Maybe I should get a new browser or try jQuery ;)) – hex494D49 Jul 19 '14 at 16:38
0

You can also use the characters themselves instead of their hex codes.

'asdasd2 -34 56 ——————————————-'.replace(/[—-]/g, "")
//output "asdasd2 34 56 "
Harpreet Singh
0

To match all long dashes and normal dashes in PHP:

[\x{2014}-]+

The problem is that PHP's regex engine (PCRE) does not support the \uFFFF syntax for matching Unicode code points. Use \x{FFFF} together with the u modifier instead.
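The difference between the two escape syntaxes can be sketched as follows. The sample text is the one from the question. The sketch assumes PHP 7+ (for the "\u{...}" string escape, which is a PHP string feature and unrelated to the regex syntax), and the variable names are illustrative only:

```php
<?php
// Sample text from the question: 14 em dashes followed by a hyphen.
$input = 'asdasd2 34 56 ' . str_repeat("\u{2014}", 14) . '-';

// \u2014 inside the pattern is rejected when the regex is compiled:
// preg_replace() raises a warning (silenced here with @) and returns null.
$failed = @preg_replace('/[\u2014\-]/', '', $input);

// \x{2014} plus the u modifier is the PCRE way to name the code point.
$clean = preg_replace('/[\x{2014}-]+/u', '', $input);

var_dump($failed);
var_dump($clean);
```

A null result printed as a string is empty, so a failed compilation can look as if the whole text, numbers included, had been removed.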

Andie2302