Following on from my previous question (How to replace all anchor tags with a different anchor using regex in ColdFusion), I would like to use JSoup to manipulate the content of an argument that has come in from a form, before inserting the manipulated content into a database.

Here is an example of what is sent to the server from the Form:

<form>
   <div id="Description" contenteditable="true">
   <p>
      Terminator Genisys is an upcoming 2015 American 
      science fiction action film directed by Alan Taylor. 

      <img id="Img0" src="http://www.moviepics.com/terminator1.jpg" />
      <img id="Img1" src="http://www.moviepics.com/terminator2.jpg" />
      <img id="Img2" src="http://www.moviepics.com/terminator2.jpg" />

      You can find out more by <a href="http://www.imdb.com">clicking here</a>
   </p>
   </div>
</form>

Here is how my CFC would deal with it currently (basic idea):

<cfquery>
INSERT INTO MyTable (Value1, Description)
VALUES
(
   <cfif structKeyExists(ARGUMENTS, "Value1")>
      <cfqueryparam value="#ARGUMENTS.Value1#" cfsqltype="cf_sql_nvarchar" />
   <cfelse>
      NULL
   </cfif>

   ,
   <!--- 
    Before the below happens, I need to replace the src 
    attributes of the img tags of Arguments.Description 
   --->
   <cfif structKeyExists(ARGUMENTS, "Description")>
       <cfqueryparam value="#ARGUMENTS.Description#" cfsqltype="cf_sql_nvarchar" />
   <cfelse>
       NULL
   </cfif>
)
</cfquery>

I know <div> is not a form element, but not to worry: it's still submitted to CF11 as if it were one, using some jQuery serialize() trickery.

When CF11 processes this form, it gets the data in ARGUMENTS.Description. What I want to do is parse the contents of this argument, find the <img> tags, and extract the src attribute of each.

I'll then do some more processing, but eventually I need to replace the src value in each of the img tags with a different value that is created by CF11 on the server side. Only then can I insert the form value into the database.

Can JSoup assist with this kind of task? It feels like a simple find-and-replace job, but I'm very lost as to how to go about it.

volume one
    jSoup is perfectly suited for something like this. You could use RegEx, but will likely encounter fringe cases where the process will fail. – Scott Stroz Feb 16 '15 at 15:21

1 Answer

First, there is an error in your markup: the src attributes of the image tags have no closing quotes. Make sure you fix that before you attempt to use this:

<cfsavecontent variable="samform">
    <form>
    <div id="Description" contenteditable="true">
    <p>Terminator Genisys is an upcoming 2015 American science fiction action film directed by Alan Taylor. 

    <img id="Img0" src="http://www.moviepics.com/terminator1.jpg" />
    <img id="Img1" src="http://www.moviepics.com/terminator2.jpg" />
    <img id="Img2" src="http://www.moviepics.com/terminator2.jpg" />

    You can find out more by <a href="http://www.imdb.com">clicking here</a></p>
    </div>
    </form>
</cfsavecontent>

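<cfscript>
// Load the jsoup entry point (the jsoup JAR must be on ColdFusion's class path).
jsoup = CreateObject("java", "org.jsoup.Jsoup");
alterform = jsoup.parse(samform);

// Select every img inside the element with id="Description".
imgs = alterform.select("##Description img");

// Rewrite each src based on the file name at the end of the current URL.
for (img in imgs) {
    img.attr("src", "betterthan#listlast(img.attr("src"),"/")#");
}

// Override the second image's src (CFML indexes from 1).
imgs[2].attr("src", "TheyShouldHaveStoppedAtT2.gif");

// Show the markup before and after the changes.
writeOutput('<textarea rows="10" cols="100">#samform#</textarea><br>');
writeOutput('<textarea rows="10" cols="100">#alterform#</textarea>');
</cfscript>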

If you're familiar with CSS selectors or jQuery selectors, jSoup's selector syntax will feel like second nature.

This loops over every img inside #Description (the # has to be doubled in the code because CFML treats a single # as an expression delimiter). It changes each src to a value based on the current URL. Then, just to demonstrate, I override the second img's src with something else and output the before and after in textareas.
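
If you only need to read the current src values first (the "extract" step in the question), the same selector approach can be sketched roughly like this. It assumes the jsoup JAR is on ColdFusion's class path; the `html` sample string and the `srcList` variable are made up for illustration and stand in for ARGUMENTS.Description.

```cfml
<cfscript>
// Assumption: the jsoup JAR is available on ColdFusion's class path.
jsoup = createObject("java", "org.jsoup.Jsoup");

// Hypothetical sample input standing in for ARGUMENTS.Description.
html = '<p>Intro <img id="Img0" src="http://www.moviepics.com/terminator1.jpg" /> text</p>';

// parseBodyFragment treats the string as a piece of body markup
// rather than as a complete HTML page.
doc = jsoup.parseBodyFragment(html);

// "img[src]" matches only img elements that actually carry a src attribute.
srcList = [];
for (img in doc.select("img[src]")) {
    arrayAppend(srcList, img.attr("src"));
}

// Inspect the collected URLs.
writeDump(srcList);
</cfscript>
```

Collecting the values into a plain CFML array keeps the later processing independent of jsoup's own collection type.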

Regular Jo
  • I actually need to change the content of the Argument itself so that it now contains different src attributes. I want to then insert that into the database. Is that possible? I edited my question to provide more code and a better idea of what I mean. – volume one Feb 16 '15 at 18:24
  • @volumeone Yes, you can say `imgs[1].attr("src",arguments.value1)` etc. The loop is not necessary; I was just showing you different techniques. – Regular Jo Feb 16 '15 at 18:33
  • Can I parse just #ARGUMENTS.Description# rather than the whole form? – volume one Feb 17 '15 at 23:21
  • @volumeone You can parse whatever variable you want, as long as it contains valid markup. I only used the whole form because you said you were retrieving it through "JS Trickery", so I just grabbed the sample right from your post. No changes should need to be made to the code as long as your HTML syntax is correct (it had that missing quote in what you posted here, but that could have been isolated to the post). – Regular Jo Feb 17 '15 at 23:25
  • I can't thank you enough for showing me this. It works beautifully! It's so quick as well. It's really helped me out tons. – volume one Feb 18 '15 at 17:50
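
Pulling the comment thread together, here is a minimal sketch of parsing just the Description value and getting back a string that could then go into the cfqueryparam. It is not the code either poster actually used: `rewriteImageSources` and `newSrcFor` are hypothetical names, the "/images/" prefix is an invented placeholder for whatever URL the server really generates, and closures require ColdFusion 10 or later.

```cfml
<cfscript>
// Hypothetical helper: returns the fragment with every img src rewritten.
// newSrcFor is a placeholder for the server-side logic that builds the new URL.
function rewriteImageSources(required string html, required any newSrcFor) {
    var jsoup = createObject("java", "org.jsoup.Jsoup");
    var doc = jsoup.parseBodyFragment(arguments.html);

    for (var img in doc.select("img[src]")) {
        img.attr("src", arguments.newSrcFor(img.attr("src")));
    }

    // body().html() returns only the fragment's markup, without the
    // html/head/body wrapper that jsoup adds while parsing.
    return doc.body().html();
}

// Usage with made-up input; in the CFC this would be ARGUMENTS.Description.
cleaned = rewriteImageSources(
    '<p><img id="Img0" src="http://www.moviepics.com/terminator1.jpg" /></p>',
    function(oldSrc) { return "/images/" & listLast(oldSrc, "/"); }
);

writeOutput(encodeForHTML(cleaned));
</cfscript>
```

Note that jsoup normalizes the markup it serializes, so the returned string may differ from the input in whitespace and tag style even where no src was changed.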