
I'm looking for a regular expression for use with PHP's preg_match_all() function, which will give me all of the px values from a CSS file.

For example, given the CSS below, the expected result would be the following array:

array ( "11px", "0.45px", "11.0005px", "1.1px", "888.888px" )

The $pattern string is what I have so far, but it doesn't appear to work.

The logic I was trying to use is: the number before the decimal can be up to 4 digits, the decimal symbol is optional, and the number after the decimal is optional, up to 4 digits, followed by "px".

$pattern = "/([0-9]{1,4}\.*[0-9]{1,4}*px)/";

$css = '
.some_class {
    font-size: 11px;
    margin-left: 0.45px;
    margin-top:11.0005px;
    border: 1.1px solid blue;
}
.another_class {
    background: rgba(0, 0, 0, 0.2);
    width: 100%;
    color: #012345;
    z-index: 12;
    font-size: calc(100% + 888.888px);
}
';
preg_match_all($pattern, $css, $matches, PREG_PATTERN_ORDER);
mickmackusa
Darren Gates

2 Answers

  • A capture group is unnecessary and will only slow down the regex engine.
  • \b prevents a match from starting partway through a longer run of digits (for example, it won't pull 2345px out of 12345px).
  • \d is a shorter syntax for [0-9].
  • To match zero or one of something, use ?. * means zero or more.
  • Since not all pixel values are floats (11px), you can make the decimal point and its one to four trailing digits optional by wrapping them in a non-capturing group and adding a zero-or-one quantifier (?).
  • Your pattern was breaking because you used two consecutive quantifiers: {1,4}* is like saying "match 1 to 4, 0 or more occurrences". The regex engine was like: "huh?" PCRE refuses to compile it, so preg_match_all() emits a warning and returns false.
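A minimal sketch of that last point (the one-line subject string is made up for illustration): in PCRE, a ? or + placed directly after a quantifier makes it lazy or possessive, but a * in that position does not follow a repeatable item, so the question's pattern fails to compile and the preg function returns false rather than 0.

```php
<?php
// Made-up subject string for illustration.
$subject = 'font-size: 11px;';

// The question's pattern: "{1,4}*" stacks two quantifiers, so PCRE cannot
// compile it. preg_match() raises a warning (silenced with @) and returns
// false, which is different from 0 ("compiled fine, no match").
$stacked = @preg_match('/([0-9]{1,4}\.*[0-9]{1,4}*px)/', $subject);

// "?" right after a quantifier is legal: it makes {1,4} lazy.
$lazy = preg_match('/\d{1,4}?px/', $subject, $lazyMatch);

// "+" right after a quantifier is legal too: it makes {1,4} possessive.
$possessive = preg_match('/\d{1,4}+px/', $subject, $possessiveMatch);

var_dump($stacked);     // bool(false)
var_dump($lazy);        // int(1)
var_dump($possessive);  // int(1)
```

Checking the return value with === false is the way to tell a broken pattern apart from a pattern that simply found nothing.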

Code:

$css = '
.some_class {
    font-size: 11px;
    margin-left: 0.45px;
    margin-top:11.0005px;
    border: 1.1px solid blue;
}
.another_class {
    background: rgba(0, 0, 0, 0.2);
    width: 100%;
    color: #012345;
    z-index: 12;
    font-size: calc(100% + 888.888px);
}';
$pattern = "/\b\d{1,4}(?:\.\d{1,4})?px/";

var_export(preg_match_all($pattern, $css, $matches) ? $matches[0] : 'fail');

Output:

array (
  0 => '11px',
  1 => '0.45px',
  2 => '11.0005px',
  3 => '1.1px',
  4 => '888.888px',
)

Patterns with greater validation:

  • Checks that the 1-4 digit number is preceded by a colon or a space (\K restarts the full-string match):

    /[: ]\K\d{1,4}(?:\.\d{1,4})?px/
    
  • Checks that the 1-4 digit number is not preceded by a digit:

    /\D\K\d{1,4}(?:\.\d{1,4})?px/
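A quick side-by-side sketch of the \b pattern and the two stricter variants, run on a made-up declaration (the sample string is an assumption, not taken from the question). It shows one trade-off of the colon-or-space version: it skips a value that directly follows an opening parenthesis, as in calc(10px ...).

```php
<?php
// Made-up declaration: 10px sits directly after "(" inside calc().
$css = 'padding: 4px 8px; width: calc(10px + 2.5px);';

$patterns = [
    'word boundary'     => '/\b\d{1,4}(?:\.\d{1,4})?px/',
    'colon or space'    => '/[: ]\K\d{1,4}(?:\.\d{1,4})?px/',
    'not after a digit' => '/\D\K\d{1,4}(?:\.\d{1,4})?px/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // \K drops the preceding character from the reported match,
    // so $m[0] holds only the number plus "px".
    preg_match_all($pattern, $css, $m);
    $results[$label] = $m[0];
}
var_export($results);
```

With this input, the word-boundary and not-after-a-digit patterns both return 4px, 8px, 10px and 2.5px, while the colon-or-space pattern returns only 4px, 8px and 2.5px. Whether that matters depends on how often px values appear inside calc() or other functions in your stylesheets.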
    

Your sample input uses zeros before decimal points. If the leading zero is optional (e.g. .5px), my pattern will need adjusting. These patterns allow floats without a leading digit while requiring that the dot is followed by a digit:

  1. /\D\K(?:\d{1,4}|\d{0,4}\.\d{1,4})px/

  2. /\D\K\d{0,4}(?:\.(?=\d))?\d{1,4}px/
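A small check of both patterns on a made-up string that contains a leading-dot value (.5px), two ordinary values, and a malformed 5.px that should be rejected. Under these assumptions both patterns return the same matches; the difference is mainly style (an alternation versus a lookahead).

```php
<?php
// Made-up sample: ".5px" has no leading zero; "5.px" has a dot with no
// digit after it and should not be matched.
$css = 'font-size: .5px; margin: 0.45px 11px; bad: 5.px;';

// Pattern 1: either plain digits, or optional digits + dot + digits.
$alternation = '/\D\K(?:\d{1,4}|\d{0,4}\.\d{1,4})px/';
// Pattern 2: the dot is only allowed when a digit follows it.
$lookahead   = '/\D\K\d{0,4}(?:\.(?=\d))?\d{1,4}px/';

preg_match_all($alternation, $css, $alt);
preg_match_all($lookahead, $css, $look);

var_export($alt[0]);
var_export($look[0]);
```

For this input, both print the array '.5px', '0.45px', '11px', and neither matches 5.px.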

mickmackusa
  • hi, I think that your answer is fantastic! Just one question: you noted that if zeros are optional, then the pattern will need adjusting. It turns out that there are some styles like: font-size: .5px Can you clarify how this might be adjusted to take this into account? Thanks!! – Darren Gates Mar 07 '18 at 07:57
  • Just a sec. I'll adjust. – mickmackusa Mar 07 '18 at 07:57
  • I'm guessing that the first "{1,4}" would be changed to "{0,4}", true? – Darren Gates Mar 07 '18 at 07:57
  • that will work, but it will also match `px` with no leading digits. This is a fringe case, though; perhaps there is no consequence for your project. – mickmackusa Mar 07 '18 at 07:59
  • This one seems to work the best for me: /\d{0,4}\.?\d{0,6}px/ – Darren Gates Mar 07 '18 at 08:14
  • I could also imagine a space between the digits and `px`, but that isn't mentioned by the OP. Very well explained and extensive answer. – bobble bubble Mar 07 '18 at 21:37
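To illustrate the fringe case raised in the comments (a match with no digits at all), here is the pattern from the comment thread next to the answer's second leading-dot pattern. The input is a made-up selector whose class name happens to contain "px"; whether such selectors occur in real stylesheets is an assumption.

```php
<?php
// Made-up sample: a class name that contains "px".
$css = '.px-wrap { width: 5px; }';

// Pattern from the comment thread: every part before "px" is optional.
$loose  = '/\d{0,4}\.?\d{0,6}px/';
// Second leading-dot pattern from the answer: at least one digit is required.
$strict = '/\D\K\d{0,4}(?:\.(?=\d))?\d{1,4}px/';

preg_match_all($loose, $css, $looseMatches);
preg_match_all($strict, $css, $strictMatches);

var_export($looseMatches[0]);   // also picks up ".px" from the selector
var_export($strictMatches[0]);
```

The loose pattern returns .px and 5px, while the strict one returns only 5px.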

I just made a correction to your pattern:

$pattern = "~([0-9]{1,4})px|([0-9]{1,4}?\.[0-9]{1,4})px~";

$css = '
.some_class {
    font-size: 11px;
    margin-left: 0.45px;
    margin-top:11.0005px;
    border: 1.1px solid blue;
}
.another_class {
    background: rgba(0, 0, 0, 0.2);
    width: 100%;
    color: #012345;
    z-index: 12;
    font-size: calc(100% + 888.888px);
}
';
preg_match_all($pattern, $css, $matches, PREG_PATTERN_ORDER);

Data in $matches:

array (
  0 => 
  array (
    0 => '11px',
    1 => '0.45px',
    2 => '11.0005px',
    3 => '1.1px',
    4 => '888.888px',
  ),
  1 => 
  array (
    0 => '11',
    1 => '',
    2 => '',
    3 => '',
    4 => '',
  ),
  2 => 
  array (
    0 => '',
    1 => '0.45',
    2 => '11.0005',
    3 => '1.1',
    4 => '888.888',
  ),
)

mickmackusa
Naga
  • @mickmackusa what did you find incorrect or inaccurate? – Naga Mar 07 '18 at 04:22
  • @mickmackusa I agree with you and updated the answer. – Naga Mar 07 '18 at 04:49
  • You are using too many capture groups which generate unnecessary output array bloat and cost more steps. Your pattern will match the last four digits of a 5-or-more-digit number. The lazy `?` after `{1,4}` doesn't make a difference. Pipes cost steps and your pattern isn't very DRY. (leaving my computer now) – mickmackusa Mar 07 '18 at 04:50
  • I can't see what is more correct in this answer; furthermore, it lacks any explanation. – bobble bubble Mar 07 '18 at 21:39
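A sketch of two points from mickmackusa's comment, using a made-up input with a five-digit value: the alternation pattern pulls the last four digits out of 12345px, and its two capture groups add two extra arrays to $matches. The \b pattern from the other answer is included for comparison.

```php
<?php
// Made-up sample with a five-digit value.
$css = 'width: 12345px; margin: 0.45px;';

$alternating = '~([0-9]{1,4})px|([0-9]{1,4}?\.[0-9]{1,4})px~';
$boundary    = '/\b\d{1,4}(?:\.\d{1,4})?px/';

preg_match_all($alternating, $css, $altMatches);
preg_match_all($boundary, $css, $boundaryMatches);

// Three arrays: full matches plus one per capture group; groups that did
// not take part in a match are filled with empty strings.
var_export($altMatches);
// One array: full matches only; 12345px is not matched at all.
var_export($boundaryMatches);
```

Here the alternation pattern reports 2345px and 0.45px, while the \b pattern reports only 0.45px.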