I'm new to regular expressions. Please help me extract the necessary information from this text:

salespackquantity=1&itemCode=3760041","quantity_box_sales_uom"
&salespackquantity=1&itemCode=2313441","quantity_box

I need to extract the numbers 3760041 and 2313441, respectively. What should the regular expression look like?

haylem

2 Answers

1

If we're dealing with just line-based data, as you show, it could be as easy as:

.*itemCode=([0-9]+).*

This is brutal, but it would do the job. You'd extract the first matching group (group 1), which holds only the digits.

Your example seems inconsistent and truncated, though, so this may vary. Please provide more details if there are other conditions.

Example

>>> import re
>>> oneline = "salespackquantity=1&itemCode=3760041\",\"quantity_box_sales_uom\""
>>> match = re.search('.*itemCode=([0-9]+).*', oneline)
>>> match.group(0)
'salespackquantity=1&itemCode=3760041","quantity_box_sales_uom"'
>>> match.group(1)
'3760041'
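If the input holds several records, the same capturing group can be applied to all of them at once. The following is only a sketch, assuming the text looks like the two sample lines from the question; the variable names are made up for illustration. It also shows a lookbehind variant, in case the tool you use only gives you the whole match rather than the groups.

```python
import re

# The two sample lines from the question, joined into one text.
text = (
    'salespackquantity=1&itemCode=3760041","quantity_box_sales_uom"\n'
    '&salespackquantity=1&itemCode=2313441","quantity_box'
)

# With exactly one capturing group, findall returns the group contents only.
item_codes = re.findall(r'itemCode=([0-9]+)', text)
print(item_codes)  # ['3760041', '2313441']

# A lookbehind keeps "itemCode=" out of the match, so the whole match
# (group 0) is just the digits.
lookbehind_codes = [m.group(0) for m in re.finditer(r'(?<=itemCode=)[0-9]+', text)]
print(lookbehind_codes)  # ['3760041', '2313441']
```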

Do you really need regex?

Arguably, a regex seems an easy way to get what you want here, but it might be grossly inefficient, depending on your use case and input data.

Several other strategies might be easier:

  • remove unnecessary data first,
  • use a proper parser for your specific content (here this looks like a mix of a CSV and URL query strings),
  • don't even bother and cut on appropriate boundaries, if the format is fixed.
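As a rough sketch of the parser option: assuming the query-string part of each record ends at the first double quote (a guess based on the two sample lines, not a known format), Python's standard `urllib.parse.parse_qs` can pull out the value. The helper name `item_code_from_line` is hypothetical.

```python
from urllib.parse import parse_qs

def item_code_from_line(line):
    """Return the itemCode value from one line, or None if it is absent."""
    # Assumption: everything before the first double quote is the query string.
    query = line.split('"', 1)[0]
    values = parse_qs(query).get("itemCode")
    return values[0] if values else None

lines = [
    'salespackquantity=1&itemCode=3760041","quantity_box_sales_uom"',
    '&salespackquantity=1&itemCode=2313441","quantity_box',
]
codes = [item_code_from_line(line) for line in lines]
print(codes)  # ['3760041', '2313441']
```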

Regexes are powerful, and can be overkill for simple scenarios. They're totally fair for a one-off data extraction script, though, or if the cost/benefit analysis of the development effort favors them.
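For the "cut on appropriate boundaries" option, a minimal sketch without any regex could look like this. It assumes the value always sits between `itemCode=` and the next double quote, as in the sample lines; the function name and its parameters are invented for illustration.

```python
def cut_item_code(line, marker="itemCode=", terminator='"'):
    """Return the text between marker and terminator, or None if marker is missing."""
    # partition splits on the first occurrence; found is empty if marker is absent.
    _, found, rest = line.partition(marker)
    if not found:
        return None
    # Keep everything up to the terminator (or the whole rest if there is none).
    return rest.partition(terminator)[0]

first = cut_item_code('salespackquantity=1&itemCode=3760041","quantity_box_sales_uom"')
second = cut_item_code('&salespackquantity=1&itemCode=2313441","quantity_box')
print(first, second)  # 3760041 2313441
```

Unlike the regex, this does not check that the value is numeric, so it only fits if the format really is fixed.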

haylem
  • your example selects the entire text, and I only need these numbers (which are between ItemCode= and ") –  Oct 15 '21 at 15:22
  • 1
    @user461101: Nope. It depends how you use my example. Note that I said to extract the first matching group, which would be the numerical part. – haylem Oct 15 '21 at 15:28
  • @user461101: I've adjusted my answer to show you how it's done, with a live example from the python REPL. – haylem Oct 15 '21 at 15:33
  • It doesn't work in my case. Here is my expression: itemCode=([0-9]+). How do I modify it so that only the numbers are selected? Right now the whole string is selected (itemCode=3760041") –  Oct 15 '21 at 15:56
  • @user461101: if you don't show your code, I cannot help you. I gave you an actual working example, directly from a Python 3 REPL. I don't know your case, as I don't know your exact data, your code, and the way you run it. – haylem Oct 15 '21 at 16:01
0
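a = "example is the int and string 223576"
ext = []  # collects every digit character found in a
b = "1234567890"  # the characters that count as digits
for i in a:
    # keep the character only if it is one of the digits in b
    if i in b:
        ext.append(i)
print(ext)  # a list of single-character strings, one per digit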
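Note that the loop above collects individual digit characters from anywhere in the string. If consecutive digits should stay together as one number, a possible variation (a sketch only, not part of the original answer) groups adjacent characters with `itertools.groupby`:

```python
from itertools import groupby

a = "something2then234andthenagain76"
b = "1234567890"

# groupby yields runs of adjacent characters that share the same key,
# here: whether the character is a digit. Only the digit runs are kept.
runs = ["".join(group) for is_digit, group in groupby(a, key=lambda ch: ch in b) if is_digit]
print(runs)  # ['2', '234', '76']
```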
Nimantha
Ehsan Rahi
  • 1
    Hmm, that's cute and all, but doesn't quite work. 1/ You assume that positions in A and B are identical (they're not). 2/ It would also pick up non-sequential characters in the string (as in "something2then234andthenagain76"). – haylem Oct 15 '21 at 15:52
  • It will work and pick the digits up in string order, not by alphanumeric index. Maybe somebody has another idea. – Ehsan Rahi Oct 15 '21 at 16:13
  • Please don't post only code as an answer, but also provide an explanation of what your code does and how it solves the problem in the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes. – Mark Rotteveel Nov 07 '21 at 10:40