0

I am using the following long formula for extracting specific fields from a delimited string. The formula works fine, but is there a more efficient way, noting that it must support very large strings with 100s of delimited fields?

=IFERROR(MID($A1,FIND("®",SUBSTITUTE(A1,char(187),B1))+1,FIND("®",SUBSTITUTE(A1,CHAR(187),"®",B1+1))-(FIND("®",SUBSTITUTE(A1,CHAR(187),"®",B1))+1)),"")

A1 contains the string;
char(187) is the delimiter;
B1 contains the number of the field that I would like to extract

Any help would be greatly appreciated

a1 = »44130»Yes»44105»»Design & implement a workflow system for the end-to-end medical assessment process to improve customer experience, data accuracy & operational efficiency»

B1 = 3

Answer = 44105 (as text )

Data in the string could include anything able to be typed on a standard keyboard "®" has been selected in the substitute formula as it is not a standard character

Windows Excel 2016 (but would like it to be as broadly compatible as possible)

Katoomba
  • 9
  • 2
  • 2
    So, what is your data in `A1` cell then what output do you want? Post your sample data to the post. – Harun24hr Oct 29 '20 at 08:04
  • How large is "very large"? You may run into cell content limits and be better off doing this in Power Query or VBA, depending on the size. Need to know about your data to advise better. – Ron Rosenfeld Oct 29 '20 at 09:49
  • Also, what version of Excel? And is it Windows or MAC? – Ron Rosenfeld Oct 29 '20 at 10:01
  • Also, what is the pont of the `®` in your code? There must be something more about your data than just "*char(187) is the delimiter*". Entering a string with `char(187)` as the delimiter does not return anything. – Ron Rosenfeld Oct 29 '20 at 10:27
  • Thank you for your help. I have added in answers to your questions in edits to the original question – Katoomba Nov 05 '20 at 00:22

1 Answers1

0
  • If you have Windows Excel 2010+ or Office 365,
  • and if your string, with the substitutions, will be less than 32,768 characters,
  • and if "char(187) is the delimiter" completely defines the relevant characteristics of your string,
  • then you can use the FILTERXML function.

eg:

=FILTERXML("<t><s>" & SUBSTITUTE($A1,CHAR(187),"</s><s>") & "</s></t>","//s[" & $B1 &"]")

However, if there are numeric values, this may remove leading zero's.

Whether this is more efficient than your formula likely depends on your definition of efficient

Edit Note that if your string happens to include reserved characters, i.e. HTML entities you will need to replace them (using nested SUBSTITUTEs with the entity name. (They will come out as the proper character in the output string). If the link doesn't work, do a search for that term.

A common entity might be the ampersand &.

To deal with that, you could use this formula:

=FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE($A1,"&","&amp;"),CHAR(187),"</s><s>") & "</s></t>","//s["&B1&"]")

Also note that empty nodes will return a #Value! error.

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Hi Ron, Thank you very much for your response. I have tried your equation and it certainly looks like it is a lot simpler and uses fewer formula steps than mine. It works for some of my content but fails when there is an ampersand in the data. ie if A1 = »44130»Yes»44105»»Design & implement a workflow system for the end-to-end medical assessment process to improve customer experience, data accuracy & operational efficiency» – Katoomba Nov 04 '20 at 23:54
  • Hi Ron. Please also note that the largest string I have so far is 6000 characters with 1500 delimiters (ie char(187) - "»") – Katoomba Nov 05 '20 at 00:17
  • @Katoomba There are certain characters that need to be replaced since they are also used as part of html code. `&` is one of them. I'll edit my question to post the relevant code. – Ron Rosenfeld Nov 05 '20 at 01:18
  • Hi Ron - Thank you for the updated formula. I have tried it out on a couple of my longer strings and it seems to work so far, but I still have more testing to do. Are there any other limitations. eg I noticed that the FILTERXML function is not available in Excel for the web? This is not a problem for my master workbook, but I have a number (>50) of linked sheets that are still located on the network - but could potentially be remotely accessed by users on a smartphone or other device via Web Excel? ...and what other special characters may potentially trip up the formula? – Katoomba Nov 06 '20 at 07:10
  • @Katoomba Other than what I've already written in my response (which included it's availability in **Windows** excel), the returned values will be interpreted by Excel as if you type the data into a cell. So numeric values will drop leading zero's, and constructions that appear as dates or times will be parsed as such. – Ron Rosenfeld Nov 06 '20 at 11:39
  • The filterxml formula does work, but I am still looking for a solution that will work on all Excel versions (eg web excel). I have also found the following formula: TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",LEN(A1))), (N-1)*LEN(A1)+1, LEN(A1))). but it seems to fail on large strings > 256 chars – Katoomba Dec 07 '20 at 03:47