
I want to get the extensions of a few files from their download links.

The download links do not contain the file extensions. For example, a link looks like this:

http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDNfNYI4FFesWV5zeSPtcfpyHzKZI7dHjkluwtIYNkXOGmjh43Ktdn0VeBWhQ-9l2kheOPt5N2TM3yPEW4tTrtFFqniatwxxhbqsc78IU2pBaqWwyEVLeQx64zSda2CNGmUpSxyte_tamVoIk3y4zXisQ-vjmMp6n1BAB3nbUVlwWg/

I tried to get the file's extension using myHttpUrlConnection.getContentType(), but the result was not what I wanted.

Some download links return a value like "text/plain", "application/octet-stream", or "multipart/form-data". But I just want a correct and clear type, like rar, mp4, txt, jpeg, mkv, zip, png, apk, or mp3.

halfer
Hadi

1 Answer


You cannot do that. The getContentType() method simply:

Returns the value of the content-type header field.

This value is usually, though not always, related to the file extension or file type. For example, application/pdf would mean there is a PDF file under that URL.

Each of the file types you have listed (rar, mp4, txt, jpeg, mkv, zip, png, apk, mp3) has a different internal structure. To do what you want reliably, you would first have to download the whole file and then determine its type from its contents.

A good example of a library you could use is Apache Tika.
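To illustrate what "checking the type based on the contents" can look like, here is a minimal hand-rolled sketch. It is not Tika's API: the class `MagicSniffer` and its methods are hypothetical names made up for this example, and it only knows a handful of well-known leading byte signatures. A match is a hint, not proof, since unrelated data can begin with the same bytes, and container formats need more than this (an apk, for instance, starts with the same bytes as any zip).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; a real detector such as
// Apache Tika covers far more formats and edge cases.
final class MagicSniffer {

    // Reads up to n bytes from the stream; returns fewer if the stream ends early.
    static byte[] readHead(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(buf, off, n - off);
            if (r < 0) {
                break; // end of stream
            }
            off += r;
        }
        return Arrays.copyOf(buf, off);
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... sig) {
        if (data.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a guessed extension, or null if no known signature matches.
    static String guessExtension(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(head, 'R', 'a', 'r', '!', 0x1A, 0x07)) return "rar";
        // Same signature for every zip-based container (apk, jar, docx, ...).
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) return "zip";
        // EBML header: Matroska, but WebM starts the same way.
        if (startsWith(head, 0x1A, 0x45, 0xDF, 0xA3)) return "mkv";
        // Only catches MP3 files that begin with an ID3v2 tag.
        if (startsWith(head, 'I', 'D', '3')) return "mp3";
        return null;
    }
}
```

With an open connection, something like `MagicSniffer.guessExtension(MagicSniffer.readHead(myHttpUrlConnection.getInputStream(), 16))` would inspect only the first 16 bytes of the response. Plain text files have no signature at all, so a `null` result has to be handled.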

syntagma
  • Thanks @syntagma. Can I download just a few bytes of the file (for example, 5 bytes) and then check its type? Or do I have to download the whole file? – Hadi Dec 23 '17 at 22:08
  • In *some* cases, you could detect file type based on N first bytes, see for example Tika's `MagicDetector`: https://tika.apache.org/1.1/detection.html (*By looking for special ("magic") patterns of bytes near the start of the file, it is often possible to detect the type of the file. For some file types, this is a simple process. For others, typically container based formats, the magic detection may not be enough. (More detail on detecting container formats below)*) – syntagma Dec 23 '17 at 22:16
  • @Hadi That depends on the file type. Some have headers that identify them (.class files and .png do), but even with these headers, it might actually just be different data that happens to have that specific bit pattern – phflack Dec 23 '17 at 22:17