I'm currently trying to develop an API, and the stage I'm at right now is populating a table with a full data set (ID, first name, last name, DOB, etc.).

The way I've written this is to use a cfloop from 1 to 500,000 (as I don't know the range the IDs cover), and on each iteration I call a function that makes a cfhttp request to the server and retrieves the content.

I then deserialize the returned JSON, call a function to query my table to see if the current item ID already exists and, if not, call a function to insert the record.

However, the cfloop seems to stop around the 300-request mark, so I was wondering whether there is a better way to do this. Perhaps by using the CFTHREAD tag, which I have no experience with?

The section of code for this is as follows:

<cfset Variables.url = "someurl.html" />
<cfloop from="100000" to="500000" index="itemNo">
    <cfset Variables.itemID = itemNo />
    <!--- Fetch the item via cfhttp; getItemData is assumed to hold the cfhttp result --->
    <cfset getItemData = Application.cfcs.Person.getPersonData(Variables.url,Variables.itemID) />
    <!--- Only continue when the response has a body that parses as JSON --->
    <cfif IsDefined("getItemData.FileContent") AND IsJSON(getItemData.FileContent)>
        <cfset Variables.getPersonData = DeserializeJSON(getItemData.FileContent)>
        <cfscript>
            // CHECK IF PERSON ALREADY IN DATABASE
            Variables.getPerson = Application.cfcs.Person.getPersonRecord(Variables.itemID);
            // INSERT ITEM IN TO TABLE
            // Assumes DateOfBirth holds numeric year/month/day parts, so they go straight into CreateDate
            Variables.DOB = CreateDate(Variables.getPersonData.Item.DateOfBirth.Year,Variables.getPersonData.Item.DateOfBirth.Month,Variables.getPersonData.Item.DateOfBirth.Day);
            Variables.insPerson = Application.cfcs.Person.insPerson(Variables.getPersonData.personID,Variables.getPersonData.Item.FirstName,Variables.getPersonData.Item.LastName,Variables.getPersonData.Item.CommonName,Variables.DOB);
        </cfscript>
    </cfif>
</cfloop>
CPB07
  • "The cfloop seems to stop around the 300 request mark" - Is it giving an error message? Anything in the logs? – imthepitts Jun 24 '13 at 20:54
  • From where does cfcs.Person.getPersonData() get its data? – Dan Bracuk Jun 24 '13 at 21:27
  • If the error is out of memory, you've got to do it in small batches because, AFAIK, memory is not released until the request ends in CF. – Henry Jun 24 '13 at 21:44
  • First of all you need to look up what an API is. http://en.wikipedia.org/wiki/Application_programming_interface -- All you appear to be doing is making lots of calls for data that may or may not exist and trying to load it into a database. This is no better than screen scraping. Put down the ColdFusion hammer and think of an elegant solution like bulk loading. – Butifarra Jun 24 '13 at 23:11
  • I agree with @Claude: if there's any way to avoid making half a million HTTP requests, then take that approach. If there's really no way around it, here are some pointers. Your process may (or will) crash, so you'll have to maintain state on your side to keep track of what you have and haven't tried. If you care about how fast this process runs, don't use CFHTTP; it's not performant for high traffic and, due to the memory issues you've seen, it won't hold up. Instead use one of the Java HTTP libraries which support keep-alives: http://hc.apache.org/httpcomponents-client-ga/index.html – barnyr Jun 25 '13 at 05:55
  • @Claude. In my API I'm going to have pre-populated drop-downs but surely for them to be pre-populated I need them to have data first, no? – CPB07 Jun 25 '13 at 06:43
  • @barnyr - Thanks, great advice! This function that I'm trying to create at present will be a one-off import into the table, but everything else in the application uses CFHTTP to make requests back and forth. – CPB07 Jun 25 '13 at 06:45
  • Can I call a Java HTTP library within my CF page? If so the actual request function I have is http://jsfiddle.net/s3uV6/. So do I then just set each header request as a parameter? – CPB07 Jun 25 '13 at 06:48
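The batching and state-tracking advice in the comments above could look roughly like the following. This is only a sketch in JavaScript: `createProgress`, its parameters, and the storage key are hypothetical names, and `store` is assumed to be anything with `getItem`/`setItem` (for example `window.localStorage` in a browser), so that the last attempted ID survives a crash or reload.

```javascript
// Hypothetical progress tracker (not part of the original thread).
// `store` is any object with getItem/setItem, e.g. window.localStorage.
function createProgress(store, key, firstId, lastId) {
  return {
    // First ID that has not been attempted yet.
    nextId() {
      const saved = store.getItem(key);
      return saved === null ? firstId : Number(saved) + 1;
    },
    // Record that `id` has been attempted (whether it succeeded or not).
    markDone(id) {
      store.setItem(key, String(id));
    },
    // Inclusive { from, to } range of the next batch, or null when finished.
    nextBatch(batchSize) {
      const from = this.nextId();
      if (from > lastId) return null;
      return { from: from, to: Math.min(from + batchSize - 1, lastId) };
    }
  };
}
```

With something like this, each run asks for `nextBatch(100)`, processes that range, calls `markDone` after every ID, and a restarted run picks up where the previous one stopped instead of starting from the first ID again.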

1 Answer

Yes, it is possible. You need to split up the call. Create a simple HTML page which makes an XMLHttpRequest in JavaScript. I haven't tested the example below, but it should work.

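<script>
var itemNo = 1;

// Request getdata.cfm for the current itemNo; when the response arrives,
// move on to the next item so only one request is in flight at a time.
function download()
{
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "getdata.cfm?itemNo=" + itemNo, true);
  xhr.onload = function (e) {
    if (xhr.readyState === 4) {
      if (xhr.status === 200)
      {
        itemNo++;
        if (itemNo <= 500000) download();
      }
      else
      {
        // Error handling: skip this item and carry on with the next one
        itemNo++;
        if (itemNo <= 500000) download();
      }
    }
  };
  xhr.onerror = function (e) {
    // Error handling: skip this item and carry on with the next one
    itemNo++;
    if (itemNo <= 500000) download();
  };
  xhr.send(null);
}

download(); // start the import
</script>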

On the requested page, make the call to the object that makes the cfhttp request.

<!--- getdata.cfm --->
<cfset Variables.url = "someurl.html" />
<!--- itemNo arrives as a query-string parameter from the XMLHttpRequest --->
<cfset Variables.itemID = URL.itemNo />
<!--- getItemData is assumed to hold the cfhttp result returned by the component --->
<cfset getItemData = Application.cfcs.Person.getPersonData(Variables.url,Variables.itemID) />
<!--- Only continue when the response has a body that parses as JSON --->
<cfif IsDefined("getItemData.FileContent") AND IsJSON(getItemData.FileContent)>
    <cfset Variables.getPersonData = DeserializeJSON(getItemData.FileContent)>
    <cfscript>
        // CHECK IF PERSON ALREADY IN DATABASE
        Variables.getPerson = Application.cfcs.Person.getPersonRecord(Variables.itemID);
        // INSERT ITEM IN TO TABLE
        // Assumes DateOfBirth holds numeric year/month/day parts, so they go straight into CreateDate
        Variables.DOB = CreateDate(Variables.getPersonData.Item.DateOfBirth.Year,Variables.getPersonData.Item.DateOfBirth.Month,Variables.getPersonData.Item.DateOfBirth.Day);
        Variables.insPerson = Application.cfcs.Person.insPerson(Variables.getPersonData.personID,Variables.getPersonData.Item.FirstName,Variables.getPersonData.Item.LastName,Variables.getPersonData.Item.CommonName,Variables.DOB);
    </cfscript>
</cfif>

On the requested page you could use cfthread to make multiple HTTP requests simultaneously. You can look here for more information about using cfthread together with cfhttp: http://www.bennadel.com/blog/749-Learning-ColdFusion-8-CFThread-Part-II-Parallel-Threads.htm
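As a client-side alternative to cfthread (a different technique from the one in the linked article), several requests to getdata.cfm can be kept in flight at once from the HTML page. The following is a minimal sketch, not tested against ColdFusion: `runPool` and `requestItem` are hypothetical names, and it assumes a browser that supports Promises and async functions.

```javascript
// Hypothetical helper: call task(itemNo) for every item number in
// [first, last], keeping at most `limit` tasks in flight at any time.
// `task` must return a Promise.
function runPool(first, last, limit, task) {
  let next = first;
  const results = { ok: 0, failed: 0 };

  // Each worker takes the next unclaimed item number until none are left.
  async function worker() {
    while (next <= last) {
      const itemNo = next++;
      try {
        await task(itemNo);
        results.ok++;
      } catch (e) {
        results.failed++; // a failed item is counted and skipped
      }
    }
  }

  const workers = [];
  for (let i = 0; i < limit; i++) workers.push(worker());
  return Promise.all(workers).then(function () { return results; });
}

// Assumed Promise wrapper around the same request to getdata.cfm.
function requestItem(itemNo) {
  return new Promise(function (resolve, reject) {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "getdata.cfm?itemNo=" + itemNo, true);
    xhr.onload = function () {
      if (xhr.status === 200) resolve(xhr.responseText);
      else reject(new Error("HTTP " + xhr.status));
    };
    xhr.onerror = function () { reject(new Error("network error")); };
    xhr.send(null);
  });
}
```

A call such as `runPool(1, 500000, 3, requestItem).then(function (r) { console.log(r); });` would keep three requests running at a time. Because each request is still a separate page hit, the memory used by cfhttp is released after every item, as with the single-request version.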

Nebu
  • So should the html page that contains the download function be making the request to the 3rd party website? – CPB07 Jun 28 '13 at 13:10
  • No, the getdata.cfm page that is called should contain the cfhttp function. This ensures that the memory used for creating the HTTP request is released after the page is done processing. (BTW, I assumed that the actual cfhttp request is located inside the Application.cfcs.Person.getPersonData component.) On a side note, it is advisable to create a simple container (an element with the id "counter") that tracks which ID was called last: document.getElementById("counter").innerHTML = rownr; – Nebu Jun 28 '13 at 13:28
  • Yeah that's right. So within the getdata.cfm page I do a cfinclude template on the html page that contains the script and directly after I call the download function in a set of script tags, before I set itemID? – CPB07 Jun 28 '13 at 13:45
  • No, the HTML page is called by you in your browser. The sequence is as follows. Open the HTML page. Make a call to the download function. The download function makes an HTTP request to getdata.cfm with a variable named itemNo. On the getdata.cfm page you call the component. After all code has executed on the getdata.cfm page, it returns the page content (HTML) back to the download function. Then the download function increases the itemNo variable by 1 and makes another call to the getdata.cfm page. This process continues until the rownr 500001 is reached. – Nebu Jun 28 '13 at 14:14
  • Made some minor changes to the original answer to make it easier to understand. – Nebu Jun 28 '13 at 14:15
  • Thanks so much, really fascinating performance! I am having one small issue in that if I hard-code the URL http://cdn.content.easports.com/fifa/fltOnlineAssets/2013/fut/items/web/175243.json in to my getPersonData function and just call that function not using the script you provided then the cfhttp response file content contains json but using the script you so kindly provided doesn't seem to provide json. Any ideas? – CPB07 Jun 28 '13 at 14:52
  • I am not sure what you mean, but you can debug your function by just calling getdata.cfm?itemNo=1 in your browser. – Nebu Jun 28 '13 at 19:27
  • Sorry that didn't even make sense to me when I read it back! What I meant was is there a way to get back any feedback from the getdata.cfm to say if the record has been inserted or even to output the returned content from the cfhttp request? – CPB07 Jun 30 '13 at 10:24
  • Sure any data(html/json/plain text/etc) you output on the getdata.cfm page is available in the XMLHttpRequest, for example: .... if (xhr.status === 200) { alert(xhr.responseText); itemNo++; if(itemNo<=500000) download(); }.... – Nebu Jun 30 '13 at 18:05
  • Fantastic! You mentioned using CFTHREAD to make multiple http requests simultaneously but how is this achievable using XMLHTTPRequest? Do I use multiple xhr.open lines or have multiple 'download' functions? – CPB07 Jul 05 '13 at 08:53
  • CFTHREAD is server-side scripting, which I would not recommend in your situation. Although it is very powerful, it can also lead to all kinds of problems. You can make asynchronous calls in JavaScript. Google "asynchronous xmlhttprequest" for more information about this topic. – Nebu Jul 10 '13 at 12:27
  • Are you aware of a way I can limit there to be a maximum of 3 http requests a second? I've put a setInterval function round the download function but even when I set the interval to call the getData.cfm page to every 5 seconds it can still result in the http requests queuing and more than 3 sending in the same second which results in me going over the max requests per second and timing out my session :( – CPB07 Jul 11 '13 at 08:44
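One possible way to stay under a limit such as 3 requests per second, sketched here rather than taken from the thread, is to record when each request started and delay the next one until the oldest start time has aged out of a one-second window. Unlike setInterval, the next request is only scheduled after the previous response has arrived, so requests cannot queue up and leave in a burst. `delayBeforeNext`, `throttledDownload`, and the `send(itemNo, done)` callback shape are all hypothetical names.

```javascript
// Returns how many milliseconds to wait before another request may start,
// given the ascending start times (ms) of earlier requests. 0 means "send now".
function delayBeforeNext(startTimes, now, maxPerWindow, windowMs) {
  // Keep only the requests that started inside the current window.
  const recent = startTimes.filter(function (t) { return now - t < windowMs; });
  if (recent.length < maxPerWindow) return 0;
  // A slot frees up once the oldest of the last `maxPerWindow` starts ages out.
  const oldest = recent[recent.length - maxPerWindow];
  return oldest + windowMs - now;
}

// Sends items first..last one at a time, never starting more than
// `maxPerSecond` requests within any one-second window.
// `send(itemNo, done)` is assumed to call `done` when the response arrives.
function throttledDownload(first, last, send, maxPerSecond) {
  const startTimes = [];
  let itemNo = first;

  function step() {
    if (itemNo > last) return;
    const now = Date.now();
    const wait = delayBeforeNext(startTimes, now, maxPerSecond, 1000);
    if (wait > 0) {
      setTimeout(step, wait); // try again when a slot is free
      return;
    }
    startTimes.push(now);
    if (startTimes.length > maxPerSecond) startTimes.shift(); // keep the list short
    send(itemNo++, step);
  }

  step();
}
```

In the browser, `send` would wrap the XMLHttpRequest to getdata.cfm and call `done` from both `onload` and `onerror`. This only limits how often the page calls getdata.cfm; it assumes each call triggers exactly one cfhttp request to the third-party server.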