Split text using any suggested method

Question

I have a plain text like this:

Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337
Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335 
Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336 
Original: A3N444 / Copy: A3N445 PO: 5000021340 Invoice: 3130021340
Original: A3N44C / Copy: A3N44D PO: 5000021341 Invoice: 3130021341
Original: A3N44G / Copy: A3N44H PO: 5000021342 Invoice: 3130021342
Cart ID: A3N3ZZ / Copy: A3N428 PO: A3N3ZZ01
Cart ID: A3N3ZQ / Copy: A3N3ZV PO: A3N3ZQ01
Cart ID: A3N336 / Copy: A3N337 PO: A3N33601
E3M49D / E3M49Q - PO: 4620028049
E3M49N / E3M49X
E3M49P / E3M49Y

And I need to split the cart IDs from that text, and end up in something like this:

A3N42M
A3N42P
A3N3ZW
...
E3M49N
...

These IDs are always 6-digit length, and always starts with A, E or P (AXXXXX, EXXXXX, PXXXXX, etc...).

Is there any way (using any script or any program language) that I can achieve this?

This plain text is currently on a spreadsheet table, and I need to separate these IDs to use in a SQL query for later, thanks!

Almost every language could perform this if it had too, what language do you know or would you prefer to work in? — tmthydvnprt, Jul 26 '16 at 14:33
If you use python look at [`split`](https://docs.python.org/2/library/stdtypes.html#str.split) or [`regex`](https://docs.python.org/2/library/re.html#re.split) — tmthydvnprt, Jul 26 '16 at 14:35
Actually I just have a very basic program language knowledge, little bit of Java ... but not enough to try to achieve this — ryoishikawa74, Jul 26 '16 at 14:36
You could also use a text editor with regex find and replace to perform splits. — tmthydvnprt, Jul 26 '16 at 14:38
I use Notepad++, I saw some tips on the internet on how to do it bookmarking lines etc, but I still did not manage to make it work, but I'll use the python code below that a fellow posted, thanks a lot buddy :-) — ryoishikawa74, Jul 26 '16 at 15:38
When you say that the plain text is currently in a spreadsheet, do you mean that all of that text is in one cell? Or is it spanned across multiple cells. — dustinroepsch, Jul 26 '16 at 15:38
@DustinRyan-Roepsch It's all in the same cell ... I could do a simple `=SPLIT` and use space as delimiter, but it would also give me work in order to delete the cells that does not have the content I want. — ryoishikawa74, Jul 26 '16 at 19:35

score 1 · Accepted Answer · edited May 23 '17 at 12:32

Regex Data-Munging

You can perform this via regex substitutions (or find and replace). I happened to do this in python, but you could use the same regex patterns ('^[^/]*?: ', '/.*') in any language or text editor that supports them.

Regex Explanation

'^[^/]*?: ' - start and beginning of string (first ^), match multiple non-\ characters in a non greedy way ([^/]*?]), a :, then a .
'/.*' - match all /, then multiple any character (.)

Processing Example (in Python)

import re

text = '''
Cart ID: A3N42M / Copy: A3N42P PO: 5000021337 Invoice: 3110021337
Cart ID: A3N3ZW / Copy: A3N3ZX/ PO: 5000021335 Invoice: 3110021335 
Cart ID: A3N3ZL / Copy: A3N3ZM PO: 5000021336 Invoice: 3110021336 
Original: A3N444 / Copy: A3N445 PO: 5000021340 Invoice: 3130021340
Original: A3N44C / Copy: A3N44D PO: 5000021341 Invoice: 3130021341
Original: A3N44G / Copy: A3N44H PO: 5000021342 Invoice: 3130021342
Cart ID: A3N3ZZ / Copy: A3N428 PO: A3N3ZZ01
Cart ID: A3N3ZQ / Copy: A3N3ZV PO: A3N3ZQ01
Cart ID: A3N336 / Copy: A3N337 PO: A3N33601
E3M49D / E3M49Q - PO: 4620028049
E3M49N / E3M49X
E3M49P / E3M49Y
'''

text = re.sub('^[^/]*?: ([]*?)', '', text, flags=re.MULTILINE)
text = re.sub('/.*', '', text)

print text

A3N42M 
A3N3ZW 
A3N3ZL 
A3N444 
A3N44C 
A3N44G 
A3N3ZZ 
A3N3ZQ 
A3N336 
E3M49D 
E3M49N 
E3M49P

EDIT

Updated Regex Explanation

Updated regex per author's request.

match any character (.*?) (non-greedy), followed by a capture group of ((...)) of a character class ([AEP]) followed by at least one character class ([0-9]+) followed by four word characters (\w{4}), followed by another capture group of any character ((.*))
All of this is matched and replaced with the capture group variables with a newline in between (\1\n\2), essentially splitting rows where IDs occur twice
Repeat steps with only the first capture group to handle the newlines containing the second ID

Updated Processing

text = re.sub(r'.*?([AEP][0-9]+\w{4})(.*)', r'\1\n\2', text, flags=re.MULTILINE)
text = re.sub(r'.*?([AEP][0-9]+\w{4}).*', r'\1', text, flags=re.MULTILINE)

print text

A3N42M
A3N42P
A3N3ZW
A3N3ZX
A3N3ZL
A3N3ZM
A3N444
A3N445
A3N44C
A3N44D
A3N44G
A3N44H
A3N3ZZ
A3N428
A3N3ZQ
A3N3ZV
A3N336
A3N337
E3M49D
E3M49Q
E3M49N
E3M49X
E3M49P
E3M49Y

Hello, I installed Python and tried to run the .py file using the command prompt, but I got this: `SyntaxError: Missing parentheses in call to 'print'`, am I doing anything wrong? Sorry never played with Python before, I just ran: `python file.py` in the command prompt. — ryoishikawa74, Jul 26 '16 at 15:50
Thanks! it almost did the trick, just some IDs missing after the `/` in the text, as you can see in this [Image](http://imgur.com/a/wSWLP) — ryoishikawa74, Jul 26 '16 at 16:00
Oh assumed you only wanted the first IDs. You will need to change your `regex` patterns to handle that. But now you have a recipe to start with. Have a look at [regex101.com](http://www.regex101.com) to play with developing useful patterns. — tmthydvnprt, Jul 26 '16 at 16:34
Thanks for the link, I did a little research and I could come up with a regex that found all my cart IDs in the list: `(a|e|p)[0-9]{1,}\w{4}` -- [IMAGE](http://imgur.com/a/WcImY) -- Now, is it possible in that python code to use this regex and delete everything that does not match it? — ryoishikawa74, Jul 26 '16 at 17:28
Remember to accept the answer and up vote if you approve, [we both get rep! (you get +2 reputation for accepting!)](http://meta.stackexchange.com/a/5235/292533) — tmthydvnprt, Jul 28 '16 at 04:36
Thank you very much! Worked like a charm, I accepted your post as the answer, I just cannot up vote due to my reputation, sorry for the delay! :-) — ryoishikawa74, Aug 01 '16 at 19:00