
I have checked the existing questions on Stack Overflow but couldn't find an answer that fits my case, so I need your help.

I have multiple strings that contain URLs in different formats, for example:

1:

<p><a href='https://abcd.com/sites/WG-ProductManagementTeam/FunctionalSpecs/Forms/AllItems.aspx?id=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0.pdf&amp;parent=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist&amp;p=true&amp;ga=1'>WG-Product Management Team - PA Peer Checklist (V2.3) -v10.0.pdf - All Documents (sharepoint.com)</a></p>

2:

https://abcd.com/sites/WG-ProductManagementTeam/FunctionalSpecs/Forms/AllItems.aspx?id=%2Fsites%2FWG%2DProductManagementTeam%2FFunctionalSpecs%2FDevDOC%2FEnhancements%20to%20PA%20Peer%20Checklist%2FPA%20Peer%20Checklist%20%28V2%2E3%29%20%2Dv10%2E0%2Epdf&parent=%2Fsites%2FWG%2DProductManagementTeam%2FFunctionalSpecs%2FDevDOC%2FEnhancements%20to%20PA%20Peer%20Checklist&p=true&ga=1

3:

https://abcd.com/:b:/r/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements%20to%20PA%20Peer%20Checklist/PA%20Peer%20Checklist%20(v2.0)%20-%20v3.0.pdf?csf=1&web=1&e=txs2Yq

I want to extract part of the URL, like this: /DevDOC/....../.pdf

As you can see, the three URL strings above are all formatted differently, and I have not been able to find an efficient way to handle all of them.

I need a solution that works for every type of URL string: even though the formats differ, it should extract the same part from each string in the same way.

Right now I am using the regex "./FunctionalSpecs(?!.\1)(.*?)(.pdf)". It works for URLs 2 and 3 above, but for URL 1 it returns:

/DevDOC/Enhancements to PA Peer Checklist&p=true&ga=1'>WG-Product Management Team - PA Peer Checklist (V2.3) -v10.0.pdf

which is incorrect. I wanted this:

/DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0.pdf

Please help me resolve this. It seems easy, but I have not been able to do it in an efficient way.

Also, I am trying to do it in Java.

Any help is highly appreciated. Thank you.

2 Answers


You can either decode and then use:

 `/DevDOC/[^\.]+\.pdf`

Or without decoding you might want to use:

DevDOC[^\.]+pdf

This relies on the first period after DevDOC being the one in .pdf, because [^\.]+ keeps matching only until the first period it meets. If that doesn't work, you might want to use [^"]+ instead.
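Since the question asks for Java, here is a minimal sketch of the "decode, then match" idea. It is not the code from this answer: the class name `DevDocPathExtractor` and the method `extract` are made up for illustration, and it swaps `[^\.]+` for a lazy `.*?` so that periods inside the file name (such as "V2.3") do not stop the match. It assumes Java 10 or later for the `URLDecoder.decode(String, Charset)` overload. Note that `URLDecoder` also turns `+` into a space, which may not be wanted for every input.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class DevDocPathExtractor {

    // Lazy match: starts at the first "/DevDOC/" and stops at the first ".pdf" after it.
    private static final Pattern DEVDOC_PDF =
            Pattern.compile("/DevDOC/.*?\\.pdf", Pattern.CASE_INSENSITIVE);

    static Optional<String> extract(String input) {
        String text;
        try {
            // Turns %2F, %20, %2E and so on back into plain characters.
            text = URLDecoder.decode(input, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed percent escape: fall back to the raw string.
            text = input;
        }
        Matcher m = DEVDOC_PDF.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the URL formats in the question.
        String encoded = "https://abcd.com/x.aspx?id=%2FFunctionalSpecs%2FDevDOC%2FDir%20A%2FFile%20%28V2%2E3%29%2Epdf&p=true";
        String plain = "https://abcd.com/:b:/r/FunctionalSpecs/DevDOC/Dir%20A/File%20(v2.0).pdf?csf=1";
        System.out.println(extract(encoded).orElse("no match"));
        System.out.println(extract(plain).orElse("no match"));
    }
}
```

Because `find()` returns the leftmost match and `.*?` is lazy, the result ends at the first `.pdf` after the first `/DevDOC/`, so a second `.pdf` later in the string (for example in the link text) is not pulled in.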

  • I tried both but it is not working, it is not what I was looking for. Though thanks for your time and help. – Faizan Shaikh Sarkar Nov 10 '22 at 11:00
  • I got one that's working: (DevDOC[^=]+Enhancements.*?pdf) – Shai Vashdy Nov 10 '22 at 11:50
  • Thanks for your time and reply, but this also doesn't work in all cases. What I did instead was fetch the link from the href when the string contained multiple .pdf occurrences and at least one href; for all other cases my previous regex was working, so I used that. Thanks, I really appreciate your time and help. – Faizan Shaikh Sarkar Nov 14 '22 at 09:43

You can use decodeURIComponent to decode your URL and then extract the value as shown below.

var url = decodeURIComponent("your encoded url string");
console.log(url.match(/DevDOC[\s\S]*\.pdf/i));
Tiwari
  • Not working, I get this output: DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0pdf&parent=/sites/WG-ProductManagementTeam/FunctionalSpecs/DevDOC/Enhancements to PA Peer Checklist&p=true&ga=1">WG-Product Management Team - PA Peer Checklist (V2.3) -v10.0.pdf what I want is this: DevDOC/Enhancements to PA Peer Checklist/PA Peer Checklist (V2.3) -v10.0pdf Thanks for reply. – Faizan Shaikh Sarkar Nov 10 '22 at 11:03
  • You may add **FunctionalSpecs** to the regex to filter for your desired string, as below: `url.match(/FunctionalSpecs\/DevDOC[\s\S]*\.pdf/i)` – Tiwari Nov 10 '22 at 12:15
  • I did, and tried it in multiple ways, but it didn't work the way I wanted. No worries though: I used my previous regex for all other cases and wrote a new regex to fetch the link from the href when the string contained multiple .pdf occurrences and at least one href, so it is now working for all cases. Thank you for your time and help, I really appreciate it. – Faizan Shaikh Sarkar Nov 14 '22 at 09:45
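
The workaround described in the last comment (take the href value first when there is one, then extract the path) could look roughly like the Java sketch below. This is a guess at the shape of that approach, not the asker's actual code: the class name `HrefFirstExtractor`, the method `extract`, and both patterns are assumptions made for illustration. It again assumes Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class HrefFirstExtractor {

    // Captures the value of the first href attribute, in single or double quotes.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*(['\"])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Lazy match from the first "/DevDOC/" to the first ".pdf" after it.
    private static final Pattern DEVDOC_PDF =
            Pattern.compile("/DevDOC/.*?\\.pdf", Pattern.CASE_INSENSITIVE);

    // Returns the extracted path, or null when nothing matches.
    static String extract(String input) {
        // Step 1: if the string is an HTML fragment, keep only the href value,
        // so the link text (which may repeat ".pdf") is out of the picture.
        Matcher href = HREF.matcher(input);
        String candidate = href.find() ? href.group(2) : input;

        // Step 2: undo percent-encoding; keep the raw text if it is malformed.
        String decoded;
        try {
            decoded = URLDecoder.decode(candidate, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            decoded = candidate;
        }

        // Step 3: pull out the /DevDOC/...pdf part.
        Matcher path = DEVDOC_PDF.matcher(decoded);
        return path.find() ? path.group() : null;
    }
}
```

Narrowing to the href value first means the same path pattern can be reused for all three input formats, instead of keeping one regex for HTML fragments and another for bare URLs.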