
I am using the following regex on text like the sample below.

Regex: [\s](rs|price)?([\s.]*\d+[\s\d.]*)(pkg|k|(?:la(?:c|kh|k)|crore|cr)s?|l)[\s.]

Text:

65 lac this is 55 lac. and more price 100 lac. randome text to test price45 crs. and 65 cr and to test its matching rs 3244 k like rs 55k. to its matchibg 65 cr. but not 65 cr als not matching price 123 lac more of it 55 crs.

It is not matching all of the prices in the text above, only a few. I want to match only prices that have whitespace both before and after the full regex match.

I added [\s.] at the end to match prices that end with a . and have a space after it, e.g. 55 crs. or 24 lac. Similarly, I added [\s] at the start to match only prices that have a space before them.

Output:

https://regex101.com/r/iHamwk/1/

Example output 2: https://regex101.com/r/h8NLhr/5

Example Output 3: https://regex101.com/r/h8NLhr/8

How should I modify the above regex?

Also, how can I extract only the matched prices, excluding the spaces before and after them?

Thanks.

iamabhaykmr
  • Clarify what you want to match from that string. – Paolo Sep 01 '18 at 13:30
  • Added output example link 2. The text in the link describes what I want to match. – iamabhaykmr Sep 01 '18 at 15:15
  • Try [this](https://regex101.com/r/h8NLhr/4) – Paolo Sep 01 '18 at 15:18
  • Thanks @UnbearableLightness. But I don't want to match prices that are not separated by spaces before and after. Also, I want to include the . at the end in the matched string. For example https://regex101.com/r/h8NLhr/5 – iamabhaykmr Sep 01 '18 at 15:43
  • Did you open the correct link? --> https://regex101.com/r/h8NLhr/6 – Paolo Sep 01 '18 at 15:43
  • It is matching only prices that have a . at the end. I want to match prices that don't have a . at the end as well. For example https://regex101.com/r/h8NLhr/8 – iamabhaykmr Sep 01 '18 at 15:47
  • Use: https://regex101.com/r/h8NLhr/9 – Paolo Sep 01 '18 at 15:48
  • The smallest modification to match all prices is to use a word boundary instead of \s at the beginning: `\b(rs|price)?([\s.]*\d+[\s\d.]*)(pkg|k|(?:la(?:c|kh|k)|crore|cr)s?|l)[\s.]` – SL5net Sep 01 '18 at 15:52

1 Answer


If you want to match the prices, you might use an alternation to match the different formats. To make sure that the leading digits and the values in the alternation are not part of a longer word, you could use a word boundary \b. To also match an optional dot, you could add \.?

\b\d+\s*(?:lac|crs?|k)\b\.?


That would match:

  • \b Word boundary
  • \d+ Match one or more digits
  • \s* Match zero or more whitespace characters (or use [ ]* to match zero or more spaces; the square brackets are not necessary and are only there for readability)
  • (?:lac|crs?|k) Alternation that matches either lac, cr, crs or k
  • \b Word boundary
  • \.? Match an optional dot
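
As a quick check, here is a minimal Python sketch that runs the pattern above against the sample text from the question. The names `PRICE_RE`, `SAMPLE` and `find_prices` are made up for this illustration, and Python's `re` module is just one engine you could try it in.

```python
import re

# The pattern from this answer: digits, optional whitespace, a unit,
# a word boundary, and an optional trailing dot.
PRICE_RE = re.compile(r"\b\d+\s*(?:lac|crs?|k)\b\.?")

# Sample text copied from the question (typos left as they are).
SAMPLE = (
    "65 lac this is 55 lac. and more price 100 lac. randome text to test "
    "price45 crs. and 65 cr and to test its matching rs 3244 k like rs 55k. "
    "to its matchibg 65 cr. but not 65 cr als not matching price 123 lac "
    "more of it 55 crs."
)


def find_prices(text):
    """Return every full match; surrounding spaces are never part of a match."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return PRICE_RE.findall(text)


if __name__ == "__main__":
    for price in find_prices(SAMPLE):
        print(repr(price))
```

Because the pattern itself starts at a digit and ends at the unit or the dot, the returned strings carry no leading or trailing whitespace. Note that `price45 crs.` is skipped: there is no word boundary between `e` and `4`.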
The fourth bird
  • It will match `\s*` also. I guess OP doesn't need all those spaces :) – JohnyL Sep 01 '18 at 14:12
  • @JohnyL Do you mean using an optional whitespace? [`\d+[ ]?(?:lac|crs?|k)`](https://regex101.com/r/h8NLhr/1) – The fourth bird Sep 01 '18 at 14:16
  • Yes, I mean optional whitespace – JohnyL Sep 01 '18 at 14:17
  • That would also work for the example data. But I noticed that in some parts of the text (not between the prices to match though) there are multiple consecutive whitespaces. Just in case this text is part of a larger text where that does occur between the prices I used `\s*`. – The fourth bird Sep 01 '18 at 14:27
  • Thanks @The fourth bird. There can be multiple whitespace characters before and after a price. It should not match any price in this example: https://regex101.com/r/h8NLhr/2 – iamabhaykmr Sep 01 '18 at 15:09
  • @AbhayKumar You could use word boundaries `\b` and match an optional dot `\.?` as in the regex of @UnbearableLightness. I have updated my answer. – The fourth bird Sep 01 '18 at 16:18
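
If the stricter requirement from the comments applies (a price counts only when it is delimited by whitespace or the string edges on both sides), one possible variant is to swap the word boundaries for lookarounds. This is a sketch under that assumption and is not taken from any of the linked regex101 demos; `SPACED_PRICE_RE` and `find_spaced_prices` are hypothetical names.

```python
import re

# (?<!\S) : the match must not be preceded by a non-whitespace character
# (?!\S)  : the match must not be followed by a non-whitespace character
# Neither lookaround consumes text, so the surrounding whitespace stays out
# of the match and remains available to delimit the next price.
SPACED_PRICE_RE = re.compile(r"(?<!\S)\d+\s*(?:lac|crs?|k)\.?(?!\S)")


def find_spaced_prices(text):
    """Return prices delimited by whitespace (or string edges) on both sides."""
    return SPACED_PRICE_RE.findall(text)


if __name__ == "__main__":
    # "a55 lac," and "(70 k)" touch other characters, so they are skipped.
    print(find_spaced_prices("a55 lac, 65 cr and (70 k) 80 lac."))
```

Compared with `\b`, this rejects prices that are glued to punctuation such as `(70 k)`, which may or may not be what you want for real-world text.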