
I have this zsh function that finds the README of GitHub projects from their root link (e.g., https://github.com/Mortennn/Dozer -> https://github.com/Mortennn/Dozer/blob/master/README.md). Because the casing of this file name is inconsistent (e.g., readme, Readme, READMe), I currently try every possible case permutation, which is very slow.

Is there a more efficient solution?

# There is a lot of stuff here; you can ignore this code completely. The question is self-contained.
# Probe each case permutation with .md, then .rst, and stop at the first URL that exists.
for readme in "${(0@)$(permute-case readme)}"
do
    i2="${i}/blob/master/${readme}.md"
    silence wget --spider "$i2" && break
    i2="${i}/blob/master/${readme}.rst"
    silence wget --spider "$i2" && break
done
HappyFace
  • Why don't you get the file listing, parse it yourself and then request the correct file? – paddy Aug 12 '19 at 19:23
  • @paddy Is that a general solution, or does it work just with GitHub? If you mean that I should scrape GitHub's page for the link, I'd rather go with this slow method. I asked this question mainly to see whether there is a way to query HTTP servers, not to parse front ends. – HappyFace Aug 12 '19 at 19:28
  • The question appears to be specific to GitHub. Are you looking for a general solution? It is entirely up to the webserver to determine whether to treat a URI as case-sensitive. If you just want something for GitHub, and want an efficient solution, then I don't understand why you prefer the non-efficient method that may potentially get you blocked due to systematically probing non-existent URLs. – paddy Aug 12 '19 at 19:37
  • Note also that on the first link you provided (the project main page), the HTML-ified contents of the readme are inside an element with the id `readme`. – paddy Aug 12 '19 at 19:40
  • @paddy I actually want the raw Markdown to feed into pandoc, but as I said, I am really interested in a way to query HTTP servers for case-insensitive URLs. – HappyFace Aug 13 '19 at 06:39
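
For the GitHub-specific case, paddy's suggestion of asking the server which file exists instead of guessing its name could be sketched with GitHub's REST API, which has a "get a repository README" route that returns the README whatever its casing or extension. This is only a sketch: the function names `readme_api_url` and `fetch_readme` are made up for illustration, the link is assumed to have the form https://github.com/OWNER/REPO, and the `Accept` media type is the one I understand the GitHub documentation to list for raw contents, so check it against the current docs. It does not answer the general question; as paddy notes, case sensitivity is up to each web server.

```shell
# Sketch only: function names are hypothetical; the endpoint is GitHub's
# "get a repository README" REST route.

# Turn a root link such as https://github.com/OWNER/REPO into the API URL.
# The body runs in a subshell "( ... )" so its variables do not leak.
readme_api_url() (
    slug="${1#*github.com/}"   # drop scheme and host -> OWNER/REPO[/...]
    slug="${slug%/}"           # drop one trailing slash, if any
    owner="${slug%%/*}"        # text before the first slash
    rest="${slug#*/}"          # text after the first slash
    repo="${rest%%/*}"         # first component of the remainder
    repo="${repo%.git}"        # tolerate clone-style links ending in .git
    printf 'https://api.github.com/repos/%s/%s/readme\n' "$owner" "$repo"
)

# Print the raw README to stdout with a single request.
# -f makes curl return non-zero on HTTP errors such as 404 (no README).
fetch_readme() {
    curl -fsSL -H 'Accept: application/vnd.github.raw+json' \
        "$(readme_api_url "$1")"
}
```

Something like `fetch_readme https://github.com/Mortennn/Dozer | pandoc -f markdown -t html` would then replace the whole permutation loop with one request. If the README might be reStructuredText, dropping the `Accept` header returns JSON whose `name` field gives the actual file name, so the extension can be checked before choosing the pandoc input format. Unauthenticated API requests are rate-limited, so heavy use would need a token.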

0 Answers