-4

I need to match only the number after TOTAL SALES in this text. The match has to stop at the first dot and must not include any special characters.

The desired result is 800.

Update: it works on the first example, but it fails if there is a backtick in the number.

TOTAL SALES � EUR � 800.0013�000.001�700TOTAL PURCHASEEUR90.007�500.00250.001�444.84
MARGIN EUR710.00

I came up with this but can't figure out how to continue:

(?<=TOTAL SALES)([^.]+)

It works, but it also matches the EUR and the special character.

Any help is appreciated! Thanks

Jacob
    What programming language (or regex flavor) are you using? A generic solution for this would be `TOTAL SALES\D*\b(\d+)` ([demo](https://regex101.com/r/a2YVUq/1)) but the expected match will be captured in group #1. – 41686d6564 stands w. Palestine Aug 03 '20 at 22:24
  • @AhmedAbdelhameed I need it for UiPath, but I test it on regex101 – Jacob Aug 03 '20 at 22:26
  • @Jacob I haven't used UiPath before but based on a quick Google search it seems to be using the .NET regex engine. If that's the case, something like `(?<=TOTAL SALES\D*)\b\d+` should work and you wouldn't need a capturing group. See the [demo](http://regexstorm.net/tester?p=%28%3f%3c%3dTOTAL+SALES%5cD*%29%5cb%5cd%2b&i=TOTAL+SALES+%ef%bf%bd+EUR+%ef%bf%bd+800.0013%ef%bf%bd000.001%ef%bf%bd700TOTAL+PURCHASEEUR90.007%ef%bf%bd500.00250.001%ef%bf%bd444.84%0d%0aMARGIN+EUR710.00%0d%0a). – 41686d6564 stands w. Palestine Aug 03 '20 at 22:31
  • @AhmedAbdelhameed it works on one file, but I am scanning many files in a folder and extracting this data from them. Some files have `TOTAL SALES EUR 2’000.00`; as you can see, there is a `’` special character in 2000. Is there a way to take only the digits and stop at the first dot? Thank you for the help. – Jacob Aug 03 '20 at 22:35
  • If your expected match is `2’000` in this last example, then you may change the pattern to something like `(?<=TOTAL SALES\D*)\b[0-9’]+`. – 41686d6564 stands w. Palestine Aug 03 '20 at 22:37
  • Does https://regex101.com/r/wM2Yvi/1 work for you? – mjrezaee Aug 03 '20 at 22:38
  • @mjrezaee I need to match only the digits up to the dot. – Jacob Aug 03 '20 at 22:41
  • @AhmedAbdelhameed it's good, but I need to avoid any kind of special character; I need only the digits up to the first dot – Jacob Aug 03 '20 at 22:42
  • What language/tool are you using? I believe you’ll need code, not just regex; this is beyond what pure regex can do. – Bohemian Aug 03 '20 at 22:44
  • So, in your `TOTAL SALES EUR 2’000.00` example, you want to get `2000` as the matched group? I think this can't be done as a single group in regex, and you may need some post-replacement on the matched group to solve this problem – mjrezaee Aug 03 '20 at 22:45
  • @Bohemian UiPath, but @AhmedAbdelhameed made it work on one file. The only problem was the ` special character in other files: for 2`000 the desired result should be 2000 – Jacob Aug 03 '20 at 22:46
  • @Jacob _"its good but i need to avoid any kind of special char only the numbers"_ Both patterns that I suggested will only match digits. What's the problem? _"the desired result should be 2000"_ Well, you can't use regex to match something that isn't there. The best you can do is match `2’000` and then remove the `’` character later. – 41686d6564 stands w. Palestine Aug 03 '20 at 22:49
  • @AhmedAbdelhameed `https://regex101.com/r/h29hkj/1` it's not working – Jacob Aug 03 '20 at 22:55
  • @AhmedAbdelhameed please check here too: [regexstorm](http://regexstorm.net/tester?p=%28%3f%3c%3dTOTAL+SALES%5cD*%29%5cb%5cd%2b&i=TOTAL+SALES+%ef%bf%bd+EUR+%ef%bf%bd+800.0013%ef%bf%bd000.001%ef%bf%bd700TOTAL+PURCHASEEUR90.007%ef%bf%bd500.00250.001%ef%bf%bd444.84%0d%0aMARGIN+EUR710.00%0d%0a%0d%0aTOTAL+SALES+%ef%bf%bd+EUR+%ef%bf%bd+8%6000.0013%ef%bf%bd000.001%ef%bf%bd700TOTAL+PURCHASEEUR90.007%ef%bf%bd500.00250.001%ef%bf%bd444.84%0d%0aMARGIN+EUR710.00%0d%0a) – Jacob Aug 03 '20 at 22:57
  • @Jacob You keep repeating the same thing over and over. I did tell you in a previous comment to use `[0-9’]` instead of `\d` if you want to match `8’000`, right? – 41686d6564 stands w. Palestine Aug 03 '20 at 22:59
  • @AhmedAbdelhameed I don't want to get 8`000 but 8000 – Jacob Aug 03 '20 at 23:00
  • @Jacob Well, I already replied to that too. Your input string **does _not_ have** `8000` in it. How do you expect to match something that isn't there?! Either match `8’000` and then remove the extra char, or replace the extra char first before trying to match the number. – 41686d6564 stands w. Palestine Aug 03 '20 at 23:02
  • @AhmedAbdelhameed I gave 8000 as an example, but there will always be a number there: sometimes 800, sometimes 2000, and sometimes 2`000. I have to remove any kind of special character and take only the digits – Jacob Aug 03 '20 at 23:06
  • It's like what you had, but it needs a lookahead: `(?<=TOTAL SALES\D*)\d+(?=\.)` –  Aug 03 '20 at 23:38
  • Perhaps match the digits in group 1 with an optional part that matches the `’` and captures the digits after that in group 2 so the full number is group 1 and group 2 `(?<=TOTAL SALES\D*)(\d+)(?:’(\d+))?` https://regex101.com/r/AQ4Mum/1 – The fourth bird Aug 04 '20 at 08:32

1 Answer

-2

I really don't know why you are using a regular expression here; I find it neither straightforward nor particularly neat.

The string you provided contains markers that separate the segments. Since the replacement character (�) may not be its actual value, you may have to adjust the char value:

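        // Requires: using System; using System.Globalization; using System.Text.RegularExpressions;
        string test = "TOTAL SALES � EUR � 2'800.0013�000.001�700TOTAL PURCHASEEUR90.007�500.00250.001�444.84MARGIN EUR710.00";
        // Split on the replacement character (U+FFFD); adjust this if the real separator differs.
        string[] split = test.Split('\uFFFD');
        // Strip everything except digits and the dot, then parse independently of the machine's culture.
        double actualValue = Convert.ToDouble(Regex.Replace(split[2], "[^0-9.]", ""), CultureInfo.InvariantCulture);
        Console.WriteLine(actualValue);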

Outputs: 2800.0013 for culture en-US. Note that the trailing 13 appears to be the start of the next figure in the line (13�000.00), so if you only need 2800, cut the string at the first dot before converting.

This way, you can also work more easily with the proper data type, which is a double in this example.

Given that the format is fixed at this point, it's legitimate to rely on the index into the split array.

If, however, you want to make sure that you always get the total sales, or the layout may not always be fixed, you can use a simple Substring / IndexOf combination:

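        // Locate the label, then the first dot after it.
        int start = test.IndexOf("TOTAL SALES");
        int dotIndex = test.IndexOf('.', start);

        // Walk back from the dot to the nearest space; the number starts right after it.
        for (int i = dotIndex; i >= start; i--)
        {
            if (test[i] == ' ')
            {
                start = i + 1;
                break;
            }
        }

        // Keep only the digits between that space and the dot.
        string stringValue = test.Substring(start, dotIndex - start);
        stringValue = Regex.Replace(stringValue, "[^0-9]", "");
        Console.WriteLine(Convert.ToInt32(stringValue));

Outputs: 2800.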
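For comparison, the match-then-clean idea from the comments can be sketched in C# too. This is only a rough sketch: `TotalSalesSketch` and `ExtractTotalSales` are hypothetical names, and it assumes the number is always followed by a dot and that everything between `TOTAL SALES` and the number is a non-digit.

```csharp
using System;
using System.Text.RegularExpressions;

static class TotalSalesSketch
{
    // Hypothetical helper: returns the integer part of the first number after
    // "TOTAL SALES", or null when the label or the dot is missing.
    public static int? ExtractTotalSales(string text)
    {
        // Group 1: a digit followed by anything up to (not including) the first dot.
        Match m = Regex.Match(text, @"TOTAL SALES[^0-9]*([0-9][^.]*)\.");
        if (!m.Success)
        {
            return null;
        }

        // Drop thousands separators and any other non-digit characters.
        string digits = Regex.Replace(m.Groups[1].Value, "[^0-9]", "");
        return int.Parse(digits);
    }
}
```

Calling `TotalSalesSketch.ExtractTotalSales(test)` on the string above should give 2800, and the same call on the original `800.00` sample should give 800. The separator is never named in the pattern, so it should not matter whether a file uses `’`, a backtick, or the replacement character.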

maio290
  • I use UiPath, not JavaScript. – Jacob Aug 03 '20 at 23:07
  • I only know UiPath slightly and have never used it, but as far as I can recall you have the option to invoke some C#. The code I've posted is actually C#. Solving this problem with a regex is certainly possible, but since you already have trouble writing these expressions, you won't be able to maintain them if they get more complex. So I'd really recommend finding another way than a regex here. What you actually do is up to you. – maio290 Aug 03 '20 at 23:12
  • Indeed UIPath supports C#. It worked perfectly, thanks for the help :) – Jacob Aug 04 '20 at 09:18