0

Goal:

I want a CSV file as my result. I want to change the space char to a comma on each line of data. BUT, I also need the data for the 3rd field (Description) to remain as is with original space chars. Each line of data is terminated with a newline char.

Flipping spaces to commas on every line is easy with regex. But how do 'bookend' the string of text which will then become the 3rd/Description field and preserve its spaces? Currently I manually change commas back to spaces just in that text string. Painful.

Example of Final result needed (including column names)

Transaction Date,Posting Date,Description,Reference Number,Account Number,Amount

12/23,12/24,GOOGLE*DOMAINS SUPPORT.GOOGLCA,7811,8550,12.00

My sample data:

12/23 12/24 GOOGLE*DOMAINS SUPPORT.GOOGLCA 7811 8550 12.00

01/02 01/04 CREPEVINE - OAKLAND OAKLAND 234567 CA 1087 8220 16.32

01/06 01/07 AB* ABEBOOKS.CO J6YDBX HTTPSWWW.ABEBWA 6289 85332 6.98

01/20 01/21 SQ *BAGEL STREET CAFE Oakland CA 2313 44444 24.43

A few of My Regex attempts

  • This cmd changes spaces to commas over all 5 lines by combining it with Join cmd. Easy.

And just fyi: "\n" would not work for some reason so I do the <Ctrl+Enter> keys to inject a newline char, ie the two lines. For now it orks fine.

=regexreplace(join("

",A1:A5)," ",",")


RESULT:
12/23,12/24,GOOGLE*DOMAINS,SUPPORT.GOOGLCA,7811,8550,12.00
...
01/02,01/04,CREPEVINE,-,OAKLAND,OAKLAND,CA,1087,8550,16.32
...
  • Here is my poor attemp to bookend the description field, then flip commas back to spaces, but no luck either.
=REGEXREPLACE(A1,"(,[A-Z]+[A-Z],)"," ")

How do I craft a regex to do this? cheers, Damon

Using Regex101 to reverse learn how you did it

  • Also, I am doing this in Google sheets. I've read a bit about 'lookaheads' in regex as a part of this solution. But I also read the Sheets does not support lookaheads. If true, then maybe post solutions that avoid this. I'm open to all advice cheers. – ManniStudio Feb 10 '23 at 21:34

1 Answers1

1

Can you try:

=index(if(len(A:A),regexreplace(A:A,"(?U)(.*) (.*) (.*) (\d[^A-Za-z]*) (\d.*) (\d.*)","$1,$2,$3,$4,$5,$6"),))

enter image description here

rockinfreakshow
  • 15,077
  • 3
  • 12
  • 19
  • I'm beside myself :-). I just read about "re.U" for the (?U). Have no clue what this means. The sample data is from a PDF bank statement (b/c they don't supply CSV files, go figure). And I need to get all 12 months of transactions into CSV so I can Import into Quickbooks. This will save me Time & money in not having to pay a bookkeeper for data entry. You're amazing. – ManniStudio Feb 10 '23 at 23:16
  • To try & understand your logic & regex, I ran it thru regex101 & included a screenshot. Use the link above or here [link](https://i.stack.imgur.com/a2RnF.png) – ManniStudio Feb 11 '23 at 02:10
  • What I don't get is groups 2 & 3. Did you break the string into 2 parts? Also, when you write out the groups (ie $1...$6) how does $2 & $3 not show a comma(,) between them even though its there in the regex field? – ManniStudio Feb 11 '23 at 04:51
  • the first 2 capturing groups `(.*) (.*)` capture `Transaction Date,Posting Date`; the 3rd one `(.*)` for `description` is warranted to extend & capture till the point it encounters the next group (`reference number`) which starts with a numbers & no alphabets detected beyond that `(\d[^A-Za-z]*)` – rockinfreakshow Feb 11 '23 at 11:11
  • Ah I follow now. Thanks so much for explaining. – ManniStudio Feb 11 '23 at 18:25
  • -Need a little help again for slightly diff use case. This time I want ONLY the 3rd (Description) field, and ignore all else. Also I now found numerics in field so need to handle that also Sample Data: ``` 12/23 12/24 GOOGLE*DOMAINS SUPPORT.GOOGLCA 7811 8550 12.00 01/04 01/06 KAISER CPP PHARM 256 8882186245 CA 2874 8550 20.00 ``` ** Wanted result ** ``` GOOGLE*DOMAINS SUPPORT.GOOGLCA KAISER CPP PHARM 256 8882186245 CA ``` ** My attempt ** ``` =index(if(len(B5:B16),regexreplace(B5:B16,"(?U)(.*) (.*) (.*?[A-Za-z0-9]{2}) (\d.*) (\d.*)","$3"),)) ``` – ManniStudio Feb 14 '23 at 19:45
  • https://i.imgur.com/IBec6Rv.png please try: `=index(if(len(A:A),regexextract(A:A,".*? .*? (.*) \d+ \d+ \d+"),))` – rockinfreakshow Feb 14 '23 at 19:59
  • You're amazing. You make it look so simple. I've tried so many extract & replaces. Drinks on me!! Thanks, Damon – ManniStudio Feb 14 '23 at 22:04