Artifactory Version: 5.8.4
In Artifactory, files are stored in the internal database by their checksum (SHA-1); for retrieval purposes, SHA-256 is useful for verifying that a file is intact.
Read this first: https://www.jfrog.com/confluence/display/RTF/Checksum-Based+Storage
Let's say there are 2 Jenkins jobs, each of which creates a few artifacts/files (rpm/jar/etc.). In my case, I'll take a simple .txt file that stores the date in MM/DD/YYYY format, plus some other jobA/jobB-specific build result files (jars/rpms etc.).
If we focus only on the text file (as I mentioned above), then:
Jenkins_jobA > generates jobA.date_mm_dd_yy.txt
Jenkins_jobB > generates jobB.date_mm_dd_yy.txt
Jenkins jobA and jobB run multiple times per day in no given order. Sometimes jobA runs first and sometimes jobB.
As the content of the file is usually the same for both jobs (per day), the SHA-1 value of jobA's .txt file and jobB's .txt file will then be the same, i.e. in Artifactory both files will be stored under the same directory, named after the first 2 characters of the checksum (as per the checksum-based storage mechanism).
Basically, running sha1sum and sha256sum on both files in Linux would return the exact same hash values.
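To make the point above concrete, here is a minimal Groovy sketch showing that two files with identical bytes hash to the same SHA-1 and SHA-256 regardless of their names. The date string and the `hexDigest` helper are hypothetical, chosen only for illustration, and the printed "filestore dir" only mirrors the first-2-characters layout described above; it is not taken from a real Artifactory instance.

```groovy
import java.security.MessageDigest

// Hex digest of a byte array for the given algorithm ("SHA-1" or "SHA-256").
def hexDigest(byte[] data, String algorithm) {
    MessageDigest.getInstance(algorithm).digest(data).encodeHex().toString()
}

// Hypothetical file contents: both jobs write the same date string.
def jobAContent = "03/15/2018\n".getBytes("UTF-8")
def jobBContent = "03/15/2018\n".getBytes("UTF-8")

def sha1A   = hexDigest(jobAContent, "SHA-1")
def sha1B   = hexDigest(jobBContent, "SHA-1")
def sha256A = hexDigest(jobAContent, "SHA-256")
def sha256B = hexDigest(jobBContent, "SHA-256")

// The file name is not part of the hash input, so the digests match.
println "SHA-1 equal:   ${sha1A == sha1B}"
println "SHA-256 equal: ${sha256A == sha256B}"

// Checksum-based storage keys the binary by its SHA-1; the first two
// hex characters name the directory that holds the single stored copy.
println "filestore dir: ${sha1A[0..1]}/${sha1A}"
```

Since the name never enters the hash, a search by checksum alone cannot tell jobA's file from jobB's.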
Over time, these artifacts (.txt, etc.) get promoted from one repository to another (promotion process, i.e. from snapshot -> stage -> release repo). My current Groovy logic to find the URI of an artifact sitting behind a "VIRTUAL" repository (which contains a set of physical local repositories in some order) is listed below:
// Groovy code
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
jsonSlurper = new JsonSlurper()
// The following function takes artifact.SHA_256 as its parameter to find the URI of the artifact
def checkSumBasedSearch(artifactSha) {
virt_repo = "jar-repo" // this virtual may have many physical repos release/stage/snapshot for jar(maven) or it can be a YUM repo for (rpm) or generic repo for (.txt file)
// Note: Virtual repos don't span different repo types (i.e. a virtual repository in Artifactory for "Maven" artifacts (jar/war/etc) can NOT see YUM/PyPi/Generic physical Repos).
// To run aqlCmd on Linux via execute(), every distinct word of the command line must be its own list element ("...", "..", "...").
checkSum_URL = artifactoryURL + "/api/search/checksum?sha256="
aqlCmd = ["curl", "-u", username + ":" + password, "${checkSum_URL}" + artifactSha + "&repos=" + virt_repo]
def procedure = aqlCmd.execute()
def standardOut = new StringBuilder(), standardErr = new StringBuilder()
procedure.waitForProcessOutput(standardOut, standardErr)
// Fail early
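// curl writes its progress meter to stderr even on success, so test the exit code instead of standardErr
if (procedure.exitValue() != 0) {
println "\n\n-- checkSumBasedSearch() - curl failed, standardErr:\n" + standardErr + "\n\n-- Exiting with error 12!!!\n\n"
System.exit(12)
}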
def obj = jsonSlurper.parseText(standardOut.toString())
def results = obj.results
def uri = results[0].uri // This would work if a file's SHA-1/SHA-256 were always different, or if the file at least sat in a different repo.
return uri
// to get the URL, I can use:
//aqlCmd = ["curl", "-u", username + ":" + password, "${uri}"]
//def procedure = aqlCmd.execute()
//def obj = jsonSlurper.parseText(standardOut.toString())
//def url = obj.downloadUri
//return url
//aqlCmd = [ "curl", "-u", username + ":" + password, "${url}", "-o", somedirectory + "/" + variableContainingSomeArtifactFilenameThatIWant ]
//
// def procedure = aqlCmd.execute()
//def standardOut = new StringBuilder(), standardErr = new StringBuilder()
//procedure.waitForProcessOutput(standardOut, standardErr)
// Now, I'll get the artifact downloaded in some Directory as some Filename.
}
My concern is: as both files (even though they have different names, or file-<versioned-timestamp>.txt) have the same content and are generated multiple times per day, how can I get a specific versioned file downloaded for jobA or jobB?
In Artifactory, the SHA_256 property for all files containing the same content will be the same! (Artifactory uses SHA-1 to store these files efficiently and save space; new uploads are just minimal database-level transactions, transparent to the user.)
Questions:
Will the above logic return jobA's .txt file, jobB's .txt file, or whichever job's .txt file was uploaded first or last (according to LastModified, aka last upload time)?
How can I get jobA's .txt file and jobB's .txt file downloaded for a given timestamp?
Do I need to add more properties to my REST API call?
If I were only concerned with the file content, it wouldn't matter much (being SHA-1/SHA-256 dependent) whether it came from jobA's .txt file or jobB's .txt file, but in a more complex case the file name may contain meaningful info that one would like to use to tell which file was downloaded (A / B)!
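One possible direction, sketched here in Groovy, is to stop relying on `results[0]` and instead filter the returned URIs by file name. This assumes the checksum search response holds one `results` entry per artifact path sharing the checksum, as the `obj.results` access in the code above implies. The sample response, host, repository names, paths and the `urisForJob` helper are all hypothetical.

```groovy
import groovy.json.JsonSlurper

// Hypothetical response body; host, repo name and paths are made up.
def sampleResponse = '''
{ "results": [
  { "uri": "http://artifactory.example/api/storage/generic-snapshot/txt/jobB.date_03_15_18.txt" },
  { "uri": "http://artifactory.example/api/storage/generic-snapshot/txt/jobA.date_03_15_18.txt" }
] }
'''

// Return the URIs whose last path segment (the file name) starts with namePrefix.
def urisForJob(String json, String namePrefix) {
    def results = new JsonSlurper().parseText(json).results ?: []
    results*.uri.findAll { uri ->
        uri.tokenize('/').last().startsWith(namePrefix)
    }
}

def jobAUris = urisForJob(sampleResponse, "jobA.")
println jobAUris
```

This makes no assumption about the order in which entries come back; if the same name can exist in several repositories behind the virtual repo, more than one URI may still match and a further rule (for example, by repository) would be needed.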