How to get an artifact file's URI using Artifactory's checksum API when multiple artifacts have the same SHA-1 / SHA-256 values (i.e. the same file content)

Question

Artifactory Version: 5.8.4

In Artifactory, binaries are stored by their SHA-1 checksum (checksum-based storage), and the SHA-256 checksum is useful at retrieval time, e.g. for verifying that a file is intact.

Read this first: https://www.jfrog.com/confluence/display/RTF/Checksum-Based+Storage

Let's say there are 2 Jenkins jobs, each of which creates a few artifacts/files (rpm/jar/etc.). In my case, I'll take a simple .txt file that stores the date in MM/DD/YYYY format, plus some other jobA/B-specific build result files (jars/rpms etc.).

If we focus only on the text file (as I mentioned above), then:

Jenkins_jobA > generates jobA.date_mm_dd_yy.txt

Jenkins_jobB > generates jobB.date_mm_dd_yy.txt

Jenkins jobA and jobB run multiple times per day in no particular order. Sometimes jobA runs first, sometimes jobB.

As the content of the file is mostly the same for both jobs (on a given day), the SHA-1 of jobA's .txt file and jobB's .txt file will be the same, i.e. in Artifactory both files map to a single binary, stored in the folder named after the first 2 characters of the SHA-1 (as per the checksum-based storage mechanism).

Basically, running sha1sum and sha256sum on both files in Linux would return exactly the same checksums.
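To illustrate the point (a small sketch, not part of the original question): identical bytes produce identical SHA-1 and SHA-256 checksums no matter what the files are called. The file contents and the helper name hexDigest are made up for the example; the two-character folder mirrors the checksum-based storage layout mentioned above.

```groovy
import java.security.MessageDigest

// Hex-encode the digest of some bytes with the given algorithm ("SHA-1" or "SHA-256").
String hexDigest(String algorithm, byte[] data) {
    return MessageDigest.getInstance(algorithm).digest(data).encodeHex().toString()
}

// Hypothetical content of jobA's and jobB's date files on the same day.
def jobAContent = "08/21/2018\n".getBytes("UTF-8")
def jobBContent = "08/21/2018\n".getBytes("UTF-8")

def sha1A = hexDigest("SHA-1", jobAContent)
def sha1B = hexDigest("SHA-1", jobBContent)
def sha256A = hexDigest("SHA-256", jobAContent)
def sha256B = hexDigest("SHA-256", jobBContent)

// Same bytes -> same checksums, no matter what the files are called.
println "SHA-1   jobA=${sha1A} jobB=${sha1B}"
println "SHA-256 jobA=${sha256A} jobB=${sha256B}"

// With checksum-based storage, the one shared binary is kept in a folder
// named after the first two hex characters of its SHA-1.
println "storage folder: ${sha1A.substring(0, 2)}/${sha1A}"
```

This is also why the checksum search alone cannot tell jobA's file from jobB's: the checksum is a property of the content, not of the name.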

Over time, these artifacts (.txt, etc.) get promoted from one repository to another (promotion process, i.e. snapshot -> stage -> release repo), so my current Groovy logic finds the URI of the artifact sitting behind a "VIRTUAL" repository (which contains a set of physical local repositories in some order). It is listed below:

// Groovy code
import groovy.json.JsonSlurper

jsonSlurper = new JsonSlurper()   // script binding variable (no def), so the function below can see it

// The following function takes artifact.SHA_256 as its parameter and finds the URI of the artifact.
// artifactoryURL, username and password are defined elsewhere in the script.
def checkSumBasedSearch(artifactSha) {
  def virt_repo = "jar-repo"    // this virtual may have many physical repos release/stage/snapshot for jar (maven), or it can be a YUM repo (rpm) or a generic repo (.txt file)
  // Note: Virtual repos don't span different repo types (i.e. a virtual repository in Artifactory for "Maven" artifacts (jar/war/etc) can NOT see YUM/PyPi/Generic physical repos).

  // The command is passed as a list, so each argument is its own element and no shell quoting is needed.
  // -sS silences curl's progress meter (which goes to stderr) but still prints real errors.
  def checkSum_URL = artifactoryURL + "/api/search/checksum?sha256="
  def aqlCmd = ["curl", "-sS", "-u", username + ":" + password, checkSum_URL + artifactSha + "&repos=" + virt_repo]

  def procedure = aqlCmd.execute()
  def standardOut = new StringBuilder(), standardErr = new StringBuilder()
  procedure.waitForProcessOutput(standardOut, standardErr)

  // Fail early if curl failed or wrote anything to stderr
  if (procedure.exitValue() != 0 || standardErr) {
    println "\n\n-- checkSumBasedSearch() - standardErr exists ---\n" + standardErr + "\n\n-- Exiting with error 12!!!\n\n"
    System.exit(12)
  }

  def obj = jsonSlurper.parseText(standardOut.toString())
  def results = obj.results
  def uri = results[0].uri   // Only correct if each SHA-256 belongs to a single file (or at least sits in a different repo); the order of results is not guaranteed.
  return uri

  // To get the download URL, I can then call the storage API on that URI:
  // aqlCmd = ["curl", "-sS", "-u", username + ":" + password, uri]
  // procedure = aqlCmd.execute()
  // standardOut = new StringBuilder(); standardErr = new StringBuilder()
  // procedure.waitForProcessOutput(standardOut, standardErr)
  // def url = jsonSlurper.parseText(standardOut.toString()).downloadUri
  //
  // and then download the artifact into some directory as some file name:
  // aqlCmd = ["curl", "-sS", "-u", username + ":" + password, url, "-o", somedirectory + "/" + variableContainingSomeArtifactFilenameThatIWant]
  // procedure = aqlCmd.execute()
  // procedure.waitForProcessOutput(standardOut, standardErr)
  // Now the artifact is downloaded into some directory as some file name.
}
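A side note, not from the original post: the commented-out step above makes a second request only to read downloadUri. In the standard URL layout, downloadUri appears to be the storage URI with /api/storage removed, so that step could be done with string handling alone. A minimal sketch under that assumption; storageUriToDownloadUrl and the example URL are hypothetical, and the storage API call remains the safer option if your instance uses a different layout.

```groovy
// Hypothetical helper: turn a checksum-search result URI (storage API form)
// into the plain download URL, ASSUMING the standard layout
//   <base>/api/storage/<repoKey>/<path>  ->  <base>/<repoKey>/<path>
String storageUriToDownloadUrl(String storageUri) {
    def marker = "/api/storage/"
    int idx = storageUri.indexOf(marker)
    if (idx < 0) {
        throw new IllegalArgumentException("Not a storage API URI: " + storageUri)
    }
    // Keep everything before the marker, then the repo key and path after it.
    return storageUri.substring(0, idx) + "/" + storageUri.substring(idx + marker.length())
}

// Made-up example URI.
def uri = "https://artifactory.example.com/artifactory/api/storage/generic-release-local/dates/jobA.08_21_18.txt"
println storageUriToDownloadUrl(uri)
```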

My concern: both files (despite having different names, e.g. file-<versioned-timestamp>.txt) have the same content and are generated multiple times per day, so how can I download a specific versioned file for jobA or jobB?

In Artifactory, the SHA_256 property of all files with the same content will be the same! (Artifactory uses the SHA-1 to store such files only once to save space; new uploads of existing content are just minimal database-level transactions, transparent to the user.)

Questions:

Will the above logic return jobA's .txt file, jobB's .txt file, whichever job's .txt file was uploaded first, or the latest one according to lastModified (i.e. the last upload time)?
How can I download jobA's .txt file and jobB's .txt file for a given timestamp?
Do I need to add more properties to my REST API call?

If I were only concerned with the file content, it wouldn't matter (the SHA-1/256 is the same) whether it comes from jobA's or jobB's .txt file; but in a more complex case, the file name may carry meaningful information, and one would want to know which file (A or B) was downloaded!

galusben · Answer 1 · 2018-08-21T15:02:14.160

You can use AQL (Artifactory Query Language):

https://www.jfrog.com/confluence/display/RTF/Artifactory+Query+Language

curl -u<username>:<password> -XPOST https://repo.jfrog.io/artifactory/api/search/aql -H "Content-Type: text/plain" -T ./search

The content of the file named search is:

items.find(
  {
    "artifact.module.build.name":{"$eq":"<build name>"},
    "artifact.sha1":"<sha1>"
  }
)

The above logic (in the original question) will return one of them arbitrarily, since you are taking the first result returned and there is no guarantee of the order.
Since your text file contains the timestamp in its name, you can add the name to the AQL query above so that it also filters by name.
The AQL search API is more flexible than the checksum search; use it and customise your query with the parameters you need.
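A sketch of the same idea driven from Groovy, so it could replace the checksum search in the question's script. Assumptions: buildAql and runAql are hypothetical helper names; the query filters on the item's own actual_sha1 and name fields rather than on build info (the artifact.module.build.name criterion above only works if build info was published to Artifactory); and sorting by modified with limit(1) is one way to keep only the latest upload when several names still match. Since both files have identical content, filtering on SHA-1 selects the same files as filtering on SHA-256.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper: build an AQL query that matches the checksum AND a file-name
// pattern, newest first, returning only the most recently modified item.
String buildAql(String sha1, String namePattern) {
    def criteria = [
        actual_sha1: sha1,                    // the item's own SHA-1
        name       : ['$match': namePattern]  // AQL wildcard match on the file name
    ]
    // include() selects the returned fields, sort() orders newest first, limit(1) keeps one.
    return "items.find(" + JsonOutput.toJson(criteria) + ")" +
           '.include("repo","path","name","modified")' +
           '.sort({"$desc":["modified"]})' +
           '.limit(1)'
}

// Hypothetical helper: POST the query to /api/search/aql, like the curl example above,
// and return the parsed "results" list. Needs a live server, so it is not called below.
def runAql(String artifactoryURL, String username, String password, String aql) {
    def conn = (HttpURLConnection) new URL(artifactoryURL + "/api/search/aql").openConnection()
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Content-Type", "text/plain")
    def token = (username + ":" + password).getBytes("UTF-8").encodeBase64().toString()
    conn.setRequestProperty("Authorization", "Basic " + token)
    conn.outputStream.withWriter("UTF-8") { it << aql }
    return new JsonSlurper().parseText(conn.inputStream.getText("UTF-8")).results
}

// Example SHA-1 value (the SHA-1 of "abc"), used only to show the generated query.
def aql = buildAql("a9993e364706816aba3e25717850c26c9cd0d89d", "jobA.*.txt")
println aql
// def results = runAql(artifactoryURL, username, password, aql)
```

With a pattern such as "jobA.*.txt", only jobA's files are considered; replacing the wildcard with the full timestamped name pins the query to one specific version.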

edited Aug 21 '18 at 15:02

answered Aug 21 '18 at 14:50

galusben


Yes, that makes sense. I'm currently using the checksum search (to get the URI), then the URI (to get the download URL), and then, using AQL, download the artifact. It seems that another field, such as the artifact name, is also required to narrow the results down to a single file. – AKS Aug 21 '18 at 20:10
I used AQL in the past, but the reason we didn't use it here was: an artifact can change repos within the same instance (AQL can still find it, but then you need to specify which repo). I could specify a "virtual repo" to deal with that, but in our case we were moving artifacts from one Artifactory instance to another instance, into a repo with a different name (think nonsecure/secure side). So the checksum approach was better, as the checksum search can tell exactly where the file with that SHA-256 is, independent of whichever repo or instance you deal with in future. But returning results[0].uri (taking the first index) was incorrect. – AKS Aug 24 '18 at 17:41
To fix that, I did what my answer says. – AKS Aug 24 '18 at 17:41

score 0 · Answer 2 · answered Aug 24 '18 at 17:48

So I ended up doing this instead of always returning the [0]th element of the results array:

  // Do NOT blindly return the [0] element: several files can share the same SHA-1/256,
  // so return the [Nth].uri whose file name matches the artifact we want.
  // def uri = results[0].uri

  // Compare against "/" + name at the end of the URI, so e.g. "xjobA.txt" does not match "jobA.txt".
  def match = results.find { r ->
    println "> " + r.uri + " artifact: " + artFullName
    r.uri.toString().endsWith("/" + artFullName)
  }

  if (match) {
    println "- OK - Found artifact: " + artFullName + " at results[" + results.indexOf(match) + "] index."
    return match.uri
  } else {
    // Fail early: results were found for the SHA-256, but only for other file names with the same SHA-256
    println "\n\n-- checkSum_Search() - SHA-256 unwanted situation occurred !!! -- the results array had values BUT didn't contain the artifact (" + artFullName + ") that we were looking for\n\n-- !!! Artifact NOT FOUND in the results array during checkSum_Search() ---\n\n-- Exiting with error 17!!!\n\n"
    System.exit(17)
  }
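The checksum search can also return the same file name more than once: if an artifact was copied rather than moved during promotion, the virtual repo sees it in several local repos. A possible extension of the name matching above, as a sketch: parseStorageUri, selectUri, the repo names and the URI layout <base>/api/storage/<repoKey>/<path> are all assumptions. It compares the last path segment exactly and then prefers repos in a given order.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: split a checksum-search URI of the ASSUMED form
// <base>/api/storage/<repoKey>/<path> into its repo key, path and file name.
Map parseStorageUri(String uri) {
    def marker = "/api/storage/"
    int idx = uri.indexOf(marker)
    if (idx < 0) {
        throw new IllegalArgumentException("Not a storage API URI: " + uri)
    }
    def rest = uri.substring(idx + marker.length())
    def repo = rest.substring(0, rest.indexOf('/'))
    def path = rest.substring(repo.length() + 1)
    return [repo: repo, path: path, name: path.substring(path.lastIndexOf('/') + 1)]
}

// Hypothetical helper: keep only results whose file name equals artFullName exactly,
// then return the one from the most preferred repo (repos not listed rank last).
// Returns null when no result has that name.
String selectUri(List results, String artFullName, List<String> repoPreference) {
    def uris = results.collect { it.uri as String }
    def matches = uris.findAll { parseStorageUri(it).name == artFullName }
    if (!matches) {
        return null
    }
    def rank = { String u ->
        int i = repoPreference.indexOf(parseStorageUri(u).repo)
        i < 0 ? Integer.MAX_VALUE : i
    }
    return matches.min(rank)
}

// Hypothetical checksum-search response: same content under several names and repos.
def response = '''{"results":[
  {"uri":"https://artifactory.example.com/artifactory/api/storage/generic-stage-local/dates/jobB.08_21_18.txt"},
  {"uri":"https://artifactory.example.com/artifactory/api/storage/generic-stage-local/dates/jobA.08_21_18.txt"},
  {"uri":"https://artifactory.example.com/artifactory/api/storage/generic-release-local/dates/xjobA.08_21_18.txt"},
  {"uri":"https://artifactory.example.com/artifactory/api/storage/generic-release-local/dates/jobA.08_21_18.txt"}
]}'''
def results = new JsonSlurper().parseText(response).results
println selectUri(results, "jobA.08_21_18.txt", ["generic-release-local", "generic-stage-local"])
```

Here the release copy of jobA's file wins over the stage copy, and xjobA's file is ignored even though its name contains the wanted one.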
