How to find the matching pattern in an input list and replace it with the corresponding converted value using Python

Question

Note that the trailing number of this pattern (for example, 048 in FBXASC048) is meant to be the ASCII code of a digit (0-9).

Input example list: ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
Result example: ['1009Car', '5002Toy', '2004Human']

What is the proper way to search for any of these patterns in an input list:

num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']

and then replace each pattern found with the matching item in conv_list? The replacement must not be random, because each element of the pattern list corresponds to exactly one element of conv_list:

conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

This is the solution I have in mind. It has two parts:

1st part --> find the ASCII codes [48, 49, 50, 51, 52, 53, 54, 55, 56, 57] and replace them with the matching decimal digits (0-9). This gives a new input list, called input_modi_list, with the ASCII codes replaced by digits.

2nd part --> a second pass that uses the replace function to strip the fixed prefix 'FBXASC0', collecting the results in new_list3:

new_list3 = []
for x in input_modi_list:
    y = x.replace('FBXASC0', '')  # strip the fixed prefix
    new_list3.append(y)

So new_list3 will hold the combined result of the two parts mentioned above.

I don't know whether there is a simpler or better solution, maybe using regex. Also note that I have no idea how to replace ASCII codes with decimal digits for a list of items.

I believe there is a mistake in your question: 'FBXASC048009Car' should be converted to '0009Car', not '1009Car', because 48 is the ASCII code for 0, not 1. — ahmadPH, Aug 21 '20 at 09:53

score 0 · Answer 1 · answered Aug 21 '20 at 08:37

This is how I would do it.

Make the regex pattern by simply joining the strings with |:

>>> num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
>>> conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

>>> regex_pattern = '|'.join(num_ascii)
>>> regex_pattern
'FBXASC048|FBXASC049|FBXASC050|FBXASC051|FBXASC052|FBXASC053|FBXASC054|FBXASC055|FBXASC056|FBXASC057'

Make a look-up dictionary by simply zipping the two lists:

>>> conv_table = dict(zip(num_ascii, conv_list))
>>> conv_table
{'FBXASC048': '0', 'FBXASC049': '1', 'FBXASC050': '2', 'FBXASC051': '3', 'FBXASC052': '4', 'FBXASC053': '5', 'FBXASC054': '6', 'FBXASC055': '7', 'FBXASC056': '8', 'FBXASC057': '9'}

Iterate over the data and replace the matched string with the corresponding digit:

>>> import re
>>> result = []
>>> for item in ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']:
...     m = re.match(regex_pattern, item)
...     matched_string = m[0]
...     digit = (conv_table[matched_string])
...     print(f'replacing {matched_string} with {digit}')
...     result.append(item.replace(matched_string, digit))
...
replacing FBXASC048 with 0
replacing FBXASC053 with 5
replacing FBXASC050 with 2
>>> result
['0009Car', '5002Toy', '2004Human']
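
As a hedged variation on the loop above, the same look-up table can be fed to `re.sub` through a callback, so that items without any match are returned unchanged instead of raising an error. The helper name `decode_item` and the extra sample `'PlainName'` are assumptions made for illustration; note also that `re.sub` replaces every occurrence in the string, not only one at the start.

```python
import re

# Same ten patterns and digits as above, generated instead of typed out
num_ascii = ['FBXASC%03d' % code for code in range(48, 58)]
conv_list = [str(d) for d in range(10)]

conv_table = dict(zip(num_ascii, conv_list))
# re.escape is not strictly needed for these strings, but keeps the join safe
pattern = re.compile('|'.join(re.escape(s) for s in num_ascii))

def decode_item(item):
    # Hypothetical helper: swap every matched pattern for its digit
    return pattern.sub(lambda m: conv_table[m.group(0)], item)

samples = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human', 'PlainName']
result = [decode_item(s) for s in samples]
print(result)  # ['0009Car', '5002Toy', '2004Human', 'PlainName']
```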

Please note that this is a quick and simple solution that works for the sample data, but will throw an error if one of the items doesn't match the regex pattern. — mportes, Aug 21 '20 at 08:41
Your solution sounds like it solves the problem. Don't worry, the input pattern will always be similar to the one provided. There is one issue, though: it works on the current build of Python, but on the older build 2.7.11 I get a TypeError: '_sre.SRE_Match' object has no attribute '__getitem__'. I think it has something to do with the re.match() function. How can we solve this, if you have any idea? @myrmica — RedBeard, Aug 21 '20 at 11:30
Thank you very much for trying to solve the problem. — RedBeard, Aug 21 '20 at 11:48
Yes, this feature was added in Python 3.6 ([docs](https://docs.python.org/3/library/re.html#re.Match.__getitem__)). In older versions, use `m.group(0)` as an equivalent for `m[0]`. — mportes, Aug 22 '20 at 06:14
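
The two caveats raised in these comments (an item that does not match, and `m[0]` being unavailable before Python 3.6) can be sketched together as follows. This is only a minimal illustration: the function name `replace_prefix` and the sample `'NoPrefixHere'` are assumptions, not part of the original answer.

```python
import re

# Rebuild the pattern and look-up table for the ASCII codes 48..57
regex_pattern = '|'.join('FBXASC%03d' % code for code in range(48, 58))
conv_table = {'FBXASC%03d' % code: chr(code) for code in range(48, 58)}

def replace_prefix(item):
    # re.match only looks at the start of the string
    m = re.match(regex_pattern, item)
    if m is None:
        return item  # no match: leave the item as it is
    matched_string = m.group(0)  # equivalent to m[0] on Python 3.6+
    return item.replace(matched_string, conv_table[matched_string])

converted = [replace_prefix(s) for s in ['FBXASC048009Car', 'NoPrefixHere']]
print(converted)  # ['0009Car', 'NoPrefixHere']
```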

ahmadPH · Accepted Answer · 2020-08-21T12:51:26.970

I think this should do the trick:

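import re

input_list = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']

# Raw string, so that \d reaches the regex engine as written
pattern = re.compile(r'FBXASC(\d{3,3})')

def decode(match):
    # group(1) holds the three captured digits, e.g. '048' -> 48 -> '0'
    return chr(int(match.group(1)))

result = [re.sub(pattern, decode, item) for item in input_list]

print(result)  # ['0009Car', '5002Toy', '2004Human']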

Now, there is some explanation due:

1- The pattern object is a regular expression that matches any part of a string consisting of 'FBXASC' followed by 3 digits (0-9). (\d means a digit, and {3,3} means it should occur at least 3 and at most 3 times, i.e. exactly 3 times.) Also, the parentheses around \d{3,3} mean that the three matched digits are captured for later use (explained in the next part).

2- The decode function receives a match object, uses .group(1) to extract the first matched group (which in our case are the three digits matched by \d{3,3}), then uses the int function to parse the string into an integer (for example, convert '048' to 48), and finally uses the chr function to find which character has that ASCII-code. (for example chr(48) will return '0', and chr(65) will return 'A')

3- The final part applies the re.sub function to every element of the list, which replaces each occurrence of the pattern you described (FBXASC followed by 3 digits) with its corresponding ASCII character.
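
To make the three steps concrete, here is a small sketch that walks a single match through them by hand. The variable names (`whole`, `digits`, `code`, `char`) are chosen only for illustration.

```python
import re

pattern = re.compile(r'FBXASC(\d{3,3})')
match = pattern.search('FBXASC053002Toy')

whole = match.group(0)   # the full matched text: 'FBXASC053'
digits = match.group(1)  # the captured group: '053'
code = int(digits)       # parsed as an integer: 53
char = chr(code)         # the character with that ASCII code: '5'

print(whole, digits, code, char)
```

Only 'FBXASC053' is consumed by the match, so the remaining '002Toy' is left in place and the final string becomes '5002Toy'.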

You can see that this solution is not limited only to your specific examples. Any number can be used as long as it has a corresponding ASCII character recognized by the chr function.

But, if you do want to limit it just to the 48-57 range, you can simply modify the decode function:

def decode(match):
    ascii_code = int(match.group(1))
    if ascii_code >= 48 and ascii_code <= 57:
        return chr(ascii_code)
    else:
        return match.group(0) # returns the entire string - no modification
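
An alternative sketch, offered as an assumption rather than as part of the original answer, pushes the 48-57 restriction into the regular expression itself, so the callback never sees codes outside that range. The names `digit_pattern` and `decode_digit` are hypothetical.

```python
import re

# 04[89] covers 048-049 and 05[0-7] covers 050-057, i.e. ASCII '0' to '9'
digit_pattern = re.compile(r'FBXASC(04[89]|05[0-7])')

def decode_digit(match):
    # Only codes 48..57 can reach this point
    return chr(int(match.group(1)))

names = ['FBXASC048009Car', 'FBXASC057Toy', 'FBXASC065Human']
restricted = [digit_pattern.sub(decode_digit, name) for name in names]
print(restricted)  # ['0009Car', '9Toy', 'FBXASC065Human']
```

Which version reads better is a matter of taste: the range check in the function is easier to adjust, while the narrower pattern keeps the callback trivial.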

Your solution is good, 9/10, but it has one problem: it breaks the rule that only what is in the pattern list should be converted, and ASCII codes above or below that range must be ignored. How do we limit it so that it only takes ASCII codes between 48 and 57? That is the key point. What you called a limitation is a very important case, and this would be the ultimate solution if we could limit the ASCII codes taken into account to 48 through 57. Either way, thank you very much for taking the time to solve the problem. — RedBeard, Aug 21 '20 at 11:26
@RedBeard I edited my code to also include the case where you only want the 48-57 range to be changed. It won't touch anything outside of that range. (look at the bottom of the answer) — ahmadPH, Aug 21 '20 at 11:50
