I have the following code that recurses and does its work over the network. Since it goes over the network, I would like to add some optimizations, the very first one being to avoid repeating network calls for elements I have already tried.
For example, in the code below I call a URL, extract the hrefs found in the response, call those URLs in turn, and report their status. Since some URLs might be fetched more than once, I would like to add the ones that failed to a global state, so that the next time I encounter such a URL I can skip the network call.
Here is the code:
def callURLWithCache(url: String): Task[HttpResult] = {
  Task {
    Http(url).timeout(connTimeoutMs = 1000, readTimeoutMs = 3000).asString
  }.attempt.map {
    case Left(err) =>
      println(s"ERR happened ----------------- $url ************************ ${err.getMessage}")
      // Add the failed result to the cache
      val httpResult = HttpResult(source = url, isSuccess = false, statusCode = 1000, errorMessage = Some(err.getMessage))
      val returnnnn: Try[Any] = httpResultErrorCache.put(url)(httpResult)
      httpResult
    case Right(doc) =>
      if (doc.isError) {
        HttpResult(source = url, isSuccess = doc.isSuccess, statusCode = doc.code)
      } else {
        val hrefs = (browser.parseString(doc.body) >> elementList("a[href]") >?> attr("href"))
          .distinct.flatten.filter(_.startsWith("http"))
        HttpResult(source = url, isSuccess = doc.isSuccess, statusCode = doc.code, elems = hrefs)
      }
  }
}
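For context, Http is from scalaj-http, Task is from Monix, and the parsing is done with scala-scraper. The supporting definitions look roughly like this (field names are the ones used in the calls above; the defaults are just so the shorter constructor calls compile):

// Rough shapes of the supporting definitions used above; browser is a
// scala-scraper JsoupBrowser.
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

val browser = JsoupBrowser()

case class HttpResult(
  source: String,
  isSuccess: Boolean,
  statusCode: Int,
  errorMessage: Option[String] = None,
  elems: Seq[String] = Seq.empty
)

case class ParserFilter(url: String, recursionDepth: Int)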
You can see in the case Left(...) block that I add the failed result to the cache, which I define globally on the class enclosing this function as:
val underlyingCaffeineCache: cache.Cache[String, Entry[HttpResult]] = Caffeine.newBuilder().maximumSize(10000L).build[String, Entry[HttpResult]]
implicit val httpResultErrorCache: Cache[HttpResult] = CaffeineCache(underlyingCaffeineCache)
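For completeness, these are roughly the imports involved in the cache setup (scalacache with the Caffeine backend; the synchronous Try mode matches the Try[Any] return of put and the .toOption on get above):

// Cache-related imports: scalacache + Caffeine, using the Try mode, so
// put returns Try[Any] and get returns Try[Option[HttpResult]].
import com.github.benmanes.caffeine.cache
import com.github.benmanes.caffeine.cache.Caffeine
import scalacache.{Cache, Entry}
import scalacache.caffeine.CaffeineCache
import scalacache.modes.try_._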
Here is the function where I do the recursive operation:
def parseSimpleWithFilter(filter: ParserFilter): Task[Seq[HttpResult]] = {
  def parseInner(depth: Int, acc: HttpResult): Task[Seq[HttpResult]] = {
    import cats.implicits._
    if (depth > 0) {
      val batched = acc.elems.collect {
        case elem if httpResultErrorCache.get(elem).toOption.exists(_.isEmpty) =>
          callURLWithCache(elem).flatMap(newElems => parseInner(depth - 1, newElems))
      }.sliding(30).toSeq
        .map(chunk => Task.parSequence(chunk))
      Task.sequence(batched).map(_.flatten).map(_.flatten)
    } else Task.pure(Seq(acc))
  }

  callURLWithCache(filter.url).map(elem => parseInner(filter.recursionDepth, elem)).flatten
}
You can see that I check whether the URL I currently have as elem is already in the cache, which would mean I have already tried it and it failed, so I want to avoid making the HTTP call for it again.
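To spell out what that guard is intended to do (a small sketch of my understanding; shouldFetch is just an illustrative name):

// get(elem) returns Try[Option[HttpResult]] under the Try mode, so the guard
// in the collect above behaves like this:
import scala.util.{Failure, Success, Try}

def shouldFetch(lookup: Try[Option[HttpResult]]): Boolean =
  lookup.toOption.exists(_.isEmpty)

shouldFetch(Success(None))                                        // true  -> never tried, do the HTTP call
shouldFetch(Success(Some(HttpResult("http://example.com", isSuccess = false, statusCode = 1000))))
                                                                  // false -> known failure, skip it
shouldFetch(Failure(new RuntimeException("cache lookup failed"))) // false -> skip as well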
But what happens is that httpResultErrorCache always turns up empty. I'm not sure if the Task chunking is causing this behavior. Any ideas on how to get the cache to work?
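In case it matters, this is roughly how I run the whole thing (placeholder URL and depth; the Scheduler comes from my application's entry point):

import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Await
import scala.concurrent.duration._

// Run the crawl and print the URLs that failed.
val results: Seq[HttpResult] =
  Await.result(
    parseSimpleWithFilter(ParserFilter("https://www.example.com", recursionDepth = 2)).runToFuture,
    5.minutes
  )
results.filterNot(_.isSuccess).foreach(r => println(s"${r.source} -> ${r.statusCode}"))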