Current situation
I have been using the RSiteCatalyst package for a while now. For those who do not know it, it makes it easier to obtain data from Adobe Analytics through the API.
Until now, my workflow was as follows:
- Make a request, for instance:
key_metrics <- QueueOvertime(clientId, dateFrom4, dateTo,
metrics = c("pageviews"), date.granularity = "month",
max.attempts = 500, interval.seconds = 20)
- Wait for the response, which will be saved as a data.frame (example structure):
> head(key_metrics, 1)
    datetime      name year month day pageviews
1 2015-07-01 July 2015 2015     7   1     45825
- Do some data transformations, for example:
key_metrics$datetime <- as.Date(key_metrics$datetime)
The problem with this workflow is that sometimes, because of request complexity, we can wait a long time until the response finally arrives. If the R script contains 40-50 API requests of similar complexity, that means waiting 40-50 times for data to come back before the next request can even be sent. This is clearly creating a bottleneck in my ETL process.
Target
There is, however, an enqueueOnly parameter
in most of the package's functions, which tells Adobe to process the request in the background and return only a report ID as the response:
key_metrics <- QueueOvertime(clientId, dateFrom4, dateTo,
metrics = c("pageviews"), date.granularity = "month",
max.attempts = 500, interval.seconds = 20,
enqueueOnly = TRUE)
> key_metrics
[1] 1154642436
I can obtain the "real" response (the one with the data) at any time using the following function:
key_metrics <- GetReport(key_metrics)
In each request I am adding the parameter enqueueOnly = TRUE
while building a list of report IDs and report names:
queueFromIds <- c(queueFromIds, key_metrics)
queueFromNames <- c(queueFromNames, "key_metrics")
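Pulling those fragments together, the enqueue phase can be sketched as a small helper. The helper name enqueueAll and its injected enqueue argument are my own inventions for illustration; in practice enqueue would wrap QueueOvertime(..., enqueueOnly = TRUE):

```r
# Sketch (names are hypothetical): collect report IDs and names for a
# batch of requests. The actual enqueue call is injected so the
# bookkeeping is independent of the Adobe API.
enqueueAll <- function(specs, enqueue) {
  ids <- c()
  nms <- c()
  for (name in names(specs)) {
    ids <- c(ids, enqueue(specs[[name]]))  # enqueue() returns a report ID
    nms <- c(nms, name)
  }
  list(ids = ids, names = nms)
}

# In the real script, assuming clientId, dateFrom4 and dateTo as above:
# queue <- enqueueAll(
#   list(key_metrics = "pageviews"),
#   function(metric) QueueOvertime(clientId, dateFrom4, dateTo,
#                                  metrics = c(metric),
#                                  date.granularity = "month",
#                                  enqueueOnly = TRUE))
```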
The most important difference with this approach is that all my requestes are being processed by Adobe at the same time, and therefore the waiting time is considerably decreased.
Problem
I am having problems, however, obtaining the data efficiently. I am trying a while
loop that removes the report ID and report name from the previous vectors once the data is obtained:
while (length(queueFromNames) > 0) {
  assign(queueFromNames[1], GetReport(queueFromIds[1],
                                      max.attempts = 3,
                                      interval.seconds = 5))
  queueFromNames <- queueFromNames[-1]
  queueFromIds <- queueFromIds[-1]
}
However, this only works as long as the requests are simple enough to be processed in seconds. When a request is complex enough that it cannot be processed in 3 attempts with an interval of 5 seconds, the loop stops with the following error:
Error in ApiRequest(body = toJSON(request.body), func.name = "Report.Get", : ERROR: max attempts exceeded for https://api3.omniture.com/admin/1.4/rest/?method=Report.Get
Which functions could help me ensure that all the API requests are processed correctly and, ideally, that requests needing extra time (i.e. the ones raising this error) are skipped and requested again at the end of the loop?
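One approach I am considering, sketched under the assumption that GetReport simply raises an R error when its attempts are exhausted: wrap the call in tryCatch and, on failure, move the report to the end of the queue instead of stopping. The function drainQueue and its injected fetch argument are hypothetical names so the requeue logic can be tried without the API; in practice fetch would be GetReport:

```r
# Sketch: drain a queue of enqueued report IDs, requeueing reports that
# are not ready yet at the end instead of aborting the loop.
# 'fetch' is injected; in the real script it would call GetReport.
drainQueue <- function(ids, names, fetch, max.rounds = 100) {
  results <- list()
  rounds <- 0
  while (length(ids) > 0 && rounds < max.rounds) {
    rounds <- rounds + 1
    report <- tryCatch(
      fetch(ids[1]),
      error = function(e) NULL  # e.g. "max attempts exceeded"
    )
    if (is.null(report)) {
      # Not ready yet: move this request to the end of the queue
      ids   <- c(ids[-1], ids[1])
      names <- c(names[-1], names[1])
    } else {
      results[[names[1]]] <- report
      ids   <- ids[-1]
      names <- names[-1]
    }
  }
  results
}

# In the real script, something like:
# reports <- drainQueue(queueFromIds, queueFromNames,
#                       function(id) GetReport(id, max.attempts = 3,
#                                              interval.seconds = 5))
```

Collecting the results into a named list instead of using assign() also keeps all the reports in one object, which may be easier to handle later in the ETL.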