
I have an xlsx/csv file whose contents I am trying to modify with Notepad++. Specifically, I want to change the URL inside each href attribute. For example:

href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7/7521_Datasheet--de.pdf""
href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7609_Datasheet--de.pdf""
href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/6/7981_Datasheet--de.pdf""
etc...

After replace, I want them to look like this:

href=""/docs/7521_Datasheet--de.pdf""
href=""/docs/7609_Datasheet--de.pdf""
href=""/docs/7981_Datasheet--de.pdf""

Right now, I have this pattern in the Find field:

(?<=href=(""|''))[^"']+(?=(.pdf""|.pdf''))

EDIT: After trying the suggested patterns, nothing matches. Here is the full cell text:

"<table cellspacing=""0"" width=""100%"" border=""0"" cellpadding=""10""><tbody><tr>
 <td align=""left"" valign=""top"">
 <table cellspacing=""0"" width=""100%"" border=""0"" cellpadding=""0""><tbody><tr>
 <td>
 <table cellspacing=""0"" width=""100%"" border=""0"" cellpadding=""0""><tbody><tr>
 <td align=""left"" valign=""top"" class=""DocRepCell1""><img src=""/catalog/pdf.gif"" alt="" "" border=""0""></td>
 <td align=""left"" width=""97%"" valign=""middle"" class=""DocRepCell2""><span class=""NavigationButtonMoreInfos"">Produktinformation breite</span> </td>
 <td align=""right"" width=""1%"" nowrap=""nowrap"" valign=""middle"" class=""DocRepCell3"">0,1 MB</td>
 <td align=""right"" width=""1%"" nowrap=""nowrap"" valign=""middle"" class=""DocRepCell4"">
  <a class=""NavigationButtonMoreInfos"" target=""_blank"" href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7/7521_Datasheet--de.pdf"">herunterladen</a></td></tr>
  </tbody></table></td></tr></tbody>
  </table></td></tr>
  </tbody></table></td></tr>
  </tbody></table>"
Xhevat Ziberi

2 Answers


You can try the following find and replace in regex mode:

Find:

^href=""/.*?(\d+_Datasheet.*\.pdf"")$

Replace:

href=""/docs/$1

Note that the find pattern could be made more general if it fails on other parts of your data. In general, though, we need some concrete way of identifying where the suffix you wish to retain begins. If this answer doesn't work for you, state where it fails and describe the logic that identifies the suffix.
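To check the pattern outside Notepad++, here is a minimal Python sketch that applies the same find and replace to the three sample lines from the question. The function name `shorten_hrefs` and the variable `sample` are made up for illustration. Python's `re` module stands in for Notepad++'s regex engine here, so `re.MULTILINE` is needed for per-line anchors and the replacement uses `\1` where Notepad++ uses `$1`.

```python
import re

# Same pattern as in the Find field above. re.MULTILINE makes ^ and $
# anchor at the start and end of every line rather than the whole text.
FIND = re.compile(r'^href=""/.*?(\d+_Datasheet.*\.pdf"")$', re.MULTILINE)

# Python writes the first capture group as \1 where Notepad++ uses $1.
REPLACE = r'href=""/docs/\1'


def shorten_hrefs(text: str) -> str:
    """Keep only the file name of each href line and put it under /docs/."""
    return FIND.sub(REPLACE, text)


# The three sample lines from the question, one href per line.
sample = "\n".join([
    'href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7/7521_Datasheet--de.pdf""',
    'href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7609_Datasheet--de.pdf""',
    'href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/6/7981_Datasheet--de.pdf""',
])

print(shorten_hrefs(sample))
```

Because of the `^` and `$` anchors, the pattern only matches when the href attribute is the entire line. A line that has other text before or after the attribute is left unchanged.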

Tim Biegeleisen

Here's a way to match only the part you want to replace with the path /docs:

Find what :

^href=["']+\K(/.*?)(?=/\d+_[\w-]+\.pdf["']+$)

Replace with :

/docs

Search mode : Regular expression (best with ". matches newline" unchecked)
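The question's edit shows the href sitting in the middle of a longer HTML line, where the `^` and `$` anchors cannot match. The following Python sketch tries an unanchored variant of the same idea. It is an assumption about what the data needs, not a tested Notepad++ recipe. Python's `re` module has no `\K`, so a capture group keeps the `href=` prefix instead. The name `to_docs_path` is hypothetical.

```python
import re

# Hypothetical unanchored variant: no ^ or $, so the href may sit anywhere
# in a line. Group 1 keeps "href=" plus its opening quote characters.
# [^"']* stays inside the attribute value, and the lookahead requires a
# "<digits>_<name>.pdf" file name followed by a closing quote.
FIND = re.compile(r"""(href=["']+)/[^"']*/(?=\d+_[\w-]+\.pdf["'])""")


def to_docs_path(cell: str) -> str:
    """Replace the directory part of each matching href with /docs/."""
    return FIND.sub(r"\1/docs/", cell)


# The anchor line from the full cell text in the question.
cell_line = (
    '<a class=""NavigationButtonMoreInfos"" target=""_blank"" '
    'href=""/xs_db/DOKUMENT_DB/www/Datenblaetter/de/7/7521_Datasheet--de.pdf"">'
    'herunterladen</a></td></tr>'
)

print(to_docs_path(cell_line))
```

The `src=""/catalog/pdf.gif""` attribute elsewhere in the cell is not touched, since the pattern requires `href=` and a `.pdf` file name that starts with digits and an underscore.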

LukStorms