I have over 1,000 serial codes I need to enter into a database but they have to be completely numerical for conversion identification purposes. They all look similar to this format but contain different characters/numbers:

d47a3c06-r188-4203-n838-fefd32082fd9

I've been trying to figure out how to use regex to remove all letters and dashes but I'm now at a loss.

I need to know how to turn this: d47a3c06-a188-4203-b838-fefd32082fc9

Into this: 473061884203838320829

Using regex. Then possibly trim it down to a 5-digit number by keeping only the first 5 digits.

Thank you so much!

Zac

3 Answers


Depending on your programming language, you can easily filter digits and join them afterwards.
Here's an example in Python with the help of the re module and list comprehensions:

import re

serials = ['d47a3c06-r188-4203-n838-fefd32082fd9', 'e48a3c08-r199-4203-n838-fefd32082fd0']
corrected_serials = []
for serial in serials:
    numbers = re.findall(r'\d+', serial)
    corrected_serials.append(''.join(numbers))

corrected_abbreviated = [item[0:5] for item in corrected_serials]

print(corrected_serials)
print(corrected_abbreviated)

# output
# ['473061884203838320829', '483081994203838320820']
# ['47306', '48308']

See a demo on ideone.com
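As a variation on the loop above, the digits can also be extracted with a single substitution followed by a slice. This is only a minimal sketch: the helper names `digits_only` and `short_code` are made up for illustration and are not part of the original answer.

```python
import re

def digits_only(serial):
    """Return the serial with every character that is not 0-9 removed."""
    return re.sub(r'[^0-9]', '', serial)

def short_code(serial, length=5):
    """Return the first `length` digits of the serial (5 by default)."""
    return digits_only(serial)[:length]

serial = 'd47a3c06-r188-4203-n838-fefd32082fd9'
print(digits_only(serial))  # 473061884203838320829
print(short_code(serial))   # 47306
```

Slicing handles the trimming here, so the regex only has to strip the unwanted characters.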

Jan
  • Thank you all for your help so far. I am still struggling with this. I'm using the regex tool at http://regexr.com/v1/ with the expression [a-z/-], and I can now remove all the non-numerical characters, but I am still stumped on how to trim the result to 5 characters. Eventually this expression will go into a Drupal site that uses Feeds to import data from a CSV file; the serials are in that CSV file and will be parsed by the Drupal Feeds CSV parser. I am using the regex tool above to make sure I get the right expression. – Jeremy Womack Apr 18 '16 at 21:29

With a first s (search and replace) command, all non-digit characters can be removed: s/[^0-9]//g

The result is then passed to a second s command, which keeps only the first five characters and drops the rest: s/^\(.\{5\}\).*$/\1/

Use these with bash shell and the sed command.

If the serial numbers are in serials.txt file:

cat serials.txt
d47a3c06-r188-4203-n838-fefd32082fd9

sed -e "s/[^0-9]//g" -e "s/^\(.\{5\}\).*$/\1/" serials.txt
47306

Using printf:

printf d47a3c06-r188-4203-n838-fefd32082fd9 | sed -e "s/[^0-9]//g" -e "s/^\(.\{5\}\).*$/\1/"
47306
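
For readers without sed at hand, the same two substitutions can be sketched in Python with `re.sub`. The function name `sed_style_trim` is hypothetical; the patterns are direct translations of the two sed expressions above (Python does not need the backslashes before the group and the repetition braces).

```python
import re

def sed_style_trim(serial):
    # Step 1, like s/[^0-9]//g: delete everything that is not a digit.
    digits = re.sub(r'[^0-9]', '', serial)
    # Step 2, like s/^\(.\{5\}\).*$/\1/: capture the first five
    # characters and replace the whole string with that capture.
    return re.sub(r'^(.{5}).*$', r'\1', digits)

print(sed_style_trim('d47a3c06-r188-4203-n838-fefd32082fd9'))  # 47306
```

Note that the second pattern only matches when at least five characters are present, so a shorter string passes through unchanged.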
Jay jargot
  • The serials are in a CSV file, which is then imported into Drupal using the Feeds module. With a bit of your help I've come close to what I need by using [a-z/-], which removes all letters and dashes, but I'm still clueless about how to trim the result. – Jeremy Womack Apr 18 '16 at 21:50
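
If the tool only accepts one find-and-replace expression, the stripping and the trimming can in principle be folded into a single pattern: capture each of the first five digits while skipping any non-digits between them, then replace the whole string with the five captures. The sketch below shows the idea in Python. Whether the Drupal Feeds setup accepts a pattern with back-references in the replacement is an assumption that has not been checked here, and the name `first_five_digits` is hypothetical.

```python
import re

# Five repetitions of "skip non-digits, capture one digit", then
# consume the rest of the string so it is dropped by the replacement.
PATTERN = '^' + '[^0-9]*([0-9])' * 5 + '.*$'
REPLACEMENT = r'\1\2\3\4\5'

def first_five_digits(serial):
    # Strings with fewer than five digits do not match and are
    # returned unchanged.
    return re.sub(PATTERN, REPLACEMENT, serial)

print(PATTERN)
print(first_five_digits('d47a3c06-r188-4203-n838-fefd32082fd9'))  # 47306
```

The printed pattern is the expanded expression that could be pasted into a regex tester; the replacement string syntax (`\1` versus `$1`) varies between tools.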

Since you are using Drupal, you may want an answer in PHP. A PHP translation of @jay-jargot's answer looks like this:

$input = "d47a3c06-r188-4203-n838-fefd32082fd9";
$str = preg_replace("/[^0-9]/", "", $input);
$str = substr($str, 0, 5);
echo $str, "\n";        ## output: 47306
Mogsdad
Ray