32

I want to make sure that the file my script downloads has the extension I expect.

The file will not be at URLs like:

http://example.com/this_url_will_download_a_file

Or maybe it will, but I think I will only use URLs like this:

http://example.com/file.jpg

I do not want to check it with Url.Substring(Url.LastIndexOf(".") - 3, 3), because that is a very poor approach.

So, what do you recommend?

Jaymin
z3nth10n
  • You could try to get last position of ?. If found, find last position of . before that and return everything in between. If no ? is found, whatever comes after last position of . will be your file extension. – Crono Apr 22 '14 at 19:24
  • Substring should work, just make sure you account for extensions with length greater than 3. – Victor Zakharov Apr 22 '14 at 19:24
  • are you the one providing URLs? Are they to your own site or 3rd party sites? – Yuriy Galanter Apr 22 '14 at 19:25
  • I think you have to substring it somehow, unless it is acceptable to download the file first, and then use FileSystemObject `GetExtensionName` or similar. – David Zemens Apr 22 '14 at 19:25
  • possible duplicate of [Check if url leads to a file or a page](http://stackoverflow.com/questions/18828971/check-if-url-leads-to-a-file-or-a-page) – Victor Zakharov Apr 22 '14 at 19:37
  • Check the above link and let me know if it answers your question. – Victor Zakharov Apr 22 '14 at 19:37
  • I think I've got it... I only need to check whether the content-type is equal to "application/zip". I will test it... Let me check :) – z3nth10n Apr 22 '14 at 19:42
  • 1
    I've modified your title so it no longer says "in VB.NET". It's frowned upon to put tags in the title. And added the tag ".net" because any .NET developer (VB, C#, IronPython, etc) should be able to assist. – mason Apr 22 '14 at 19:42
  • Thanks!!! I have finished it ;) http://pastebin.com/PMJQyu4B – z3nth10n Apr 22 '14 at 19:58
  • 1
    If _content-type_ is the answer then question could be _Is there any way to get the file type from a URL_. – Software Engineer Apr 22 '14 at 20:16
  • 4
    Do you realize that a URL may not have a "file extension", and that any "extension" may have nothing at all to do with the content of the file? You want to care about the content type, not about a "file extension". Those are specific to particular operating systems, and do not in general apply to the web. – John Saunders Apr 22 '14 at 20:36
  • Well, and what if the URL is a file? For example, I want to get the content-type of Dropbox files, and, for now, it works ;) – z3nth10n Apr 23 '14 at 13:04
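
The comments above converge on checking the Content-Type header rather than the URL text. A minimal sketch of that idea follows, assuming the server answers HEAD requests (not all do) and that HttpClient is available. The class and method names are made up for illustration and are not from the asker's pastebin.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: asks the server for the headers only and reports the media type.
public static class ContentTypeProbe
{
    private static readonly HttpClient Client = new HttpClient();

    // Sends a HEAD request, so no body is transferred, and returns the media type
    // (for example "application/zip"), or null if no Content-Type header was sent.
    public static async Task<string> GetMediaTypeAsync(string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Head, url))
        using (var response = await Client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return response.Content?.Headers.ContentType?.MediaType;
        }
    }

    // Media types are case-insensitive, so compare them that way.
    public static bool IsMediaType(string actual, string expected)
    {
        return string.Equals(actual, expected, StringComparison.OrdinalIgnoreCase);
    }
}
```

Usage would look something like `ContentTypeProbe.IsMediaType(await ContentTypeProbe.GetMediaTypeAsync(url), "application/zip")`. Keep in mind that the header is whatever the server chooses to send, so it can be missing or generic (for example application/octet-stream).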

7 Answers

20

It is weird, but it works:

string url = @"http://example.com/file.jpg";
string ext = System.IO.Path.GetExtension(url);
MessageBox.Show(this, ext);

However, as Crono remarked in the comments on the question, it does not handle query-string parameters:

string url = @"http://example.com/file.jpg?par=x";
string ext = System.IO.Path.GetExtension(url);
MessageBox.Show(this, ext);

result: ".jpg?par=x"

Lee Taylor
heringer
  • I think that is because Microsoft allows the Unix/Linux "/" as a directory separator character, not just "\". I discovered this a few years ago on a team I was working with. It was surprising, because I think the old-style Win32 APIs don't work with that. Maybe it is because Microsoft was porting things to .NET Core? – John Foll Apr 14 '23 at 23:07
  • If [this is the real implementation](https://github.com/dotnet/runtime/blob/c1b7a9feb6f3b4d9ca27dc4f74d8260e4edb73e8/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L195) of the method GetExtension, then it will work anyway, because it scans the string backwards until it finds an occurrence of the dot character (.). So the slashes and backslashes do not really matter in most cases. – heringer Apr 17 '23 at 11:02
18

Here's a simple one I use. It works with query-string parameters and with both absolute and relative URLs.

// Requires: using System.Linq; (for Last())
public static string GetFileExtensionFromUrl(string url)
{
    url = url.Split('?')[0];      // drop the query string
    url = url.Split('/').Last();  // keep only the last path segment
    return url.Contains('.') ? url.Substring(url.LastIndexOf('.')) : "";
}

A unit test, if you like:

[TestMethod]
public void TestGetExt()
{
    Assert.AreEqual(".js", Helpers.GetFileExtensionFromUrl("../wtf.js?x=wtf"));
    Assert.AreEqual(".js", Helpers.GetFileExtensionFromUrl("wtf.js"));
    Assert.AreEqual(".js", Helpers.GetFileExtensionFromUrl("http://www.com/wtf.js?wtf"));
    Assert.AreEqual("", Helpers.GetFileExtensionFromUrl("wtf"));
    Assert.AreEqual("", Helpers.GetFileExtensionFromUrl(""));
}

Tune it for your own needs.

P.S. Do not use Path.GetExtension on the raw URL, because it does not work with query-string parameters.

Alex from Jitbit
  • this does not work for absolute URL like `http://www.com/` as it will return `.com` as extension. – Joe Dec 15 '20 at 13:55
  • @Joe yep, except it's not an "absolute" URL, it's a "root" url. You might want to add an extra check, that the URL actually points to a file. – Alex from Jitbit Dec 16 '20 at 10:58
  • @Alex what if we receive url like http://example.com/file, without extension at the end how can we determine file type? – Roxy'Pro Jan 07 '21 at 13:20
  • @Roxy'Pro use magic numbers https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files – Alex from Jitbit Mar 22 '21 at 20:26
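
The magic-number suggestion in the last comment can be sketched as follows. This is only an illustration: it assumes you have already downloaded the first few bytes of the file, it knows just three well-known signatures (JPEG, PNG, ZIP), and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper: guesses an extension from the leading bytes of a file.
public static class MagicNumbers
{
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns ".jpg", ".png" or ".zip" when the header starts with a known
    // signature, and an empty string otherwise.
    public static string GuessExtension(byte[] header)
    {
        if (StartsWith(header, Jpeg)) return ".jpg";
        if (StartsWith(header, Png)) return ".png";
        if (StartsWith(header, Zip)) return ".zip";
        return "";
    }

    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Note that the ZIP signature is shared by other ZIP-based formats (for example .docx and .jar), so a match only narrows the type down. A real implementation would need a much larger signature table.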
5

I know that this is an old question, but this may be helpful to people who come across it.

One approach for getting the extension of a filename inside a URL, including URLs with parameters, is a regular expression.

You can use this pattern (it is not limited to URLs):

.+(\.\w{3})\?*.*

Explanation:

.+       Matches any character, one or more times
(...)    Creates a capture group, so you can retrieve the text matched inside the parentheses
\.       Matches the character '.'
\w{3}    Matches exactly three word characters ([a-zA-Z0-9_] for ASCII input)
\?*      Matches the character '?' zero or more times
.*       Matches any character, zero or more times

Example:

http://example.com/file.png
http://example.com/file.png?foo=10

But if you have a URL like this:

http://example.com/asd

the pattern takes '.com' as the extension.

So for URLs you can use a stricter pattern like this:

.+\/{2}.+\/{1}.+(\.\w+)\?*.*

Explanation:

.+        Matches any character, one or more times
\/{2}     Matches two '/' characters
.+        Matches any character, one or more times
\/{1}     Matches one '/' character
.+        Matches any character, one or more times
(\.\w+)   Captures a '.' followed by one or more word characters ([a-zA-Z0-9_] for ASCII input)
\?*       Matches the character '?' zero or more times
.*        Matches any character, zero or more times

Example:

http://example.com/file.png          (Match .png)
https://example.com/file.png?foo=10  (Match .png)
http://example.com/asd               (No match)
C:\Foo\file.png                      (No match, only urls!)

http://example.com/file.png

    http:        .+
    //           \/{2}
    example.com  .+
    /            \/{1}
    file         .+
    .png         (\.\w+)
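
For completeness, here is a rough sketch of how the stricter pattern above could be applied from C# with System.Text.RegularExpressions. The class and method names are invented for illustration, and the pattern is copied unchanged from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the URL-only pattern from the answer above.
public static class UrlExtensionRegex
{
    private static readonly Regex Pattern = new Regex(@".+\/{2}.+\/{1}.+(\.\w+)\?*.*");

    // Returns the captured extension (including the dot), or "" when the pattern does not match.
    public static string GetExtension(string url)
    {
        Match match = Pattern.Match(url);
        return match.Success ? match.Groups[1].Value : "";
    }
}
```

One caveat worth knowing: because the quantifiers are greedy, the group captures the last dot followed by word characters, so a dot inside the query string wins. For http://example.com/file.png?v=1.2 the capture is ".2", not ".png". Stripping the query string first avoids that.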
Jaymin
stfno.me
4

If you just want to get the .jpg part of http://example.com/file.jpg then just use Path.GetExtension as heringer suggests.

// The following evaluates to ".jpg"
Path.GetExtension("http://example.com/file.jpg")

If the download link is something like http://example.com/this_url_will_download_a_file, then the filename is carried in Content-Disposition, an HTTP header that suggests a filename to browsers that display a "save file" dialog. To get this filename, you can use the technique suggested in "Get filename without Content-Disposition": initiate the download, read the HTTP headers, and then cancel the download without actually downloading any of the file.

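// Requires: using System.IO; using System.Linq; using System.Net;
// Assumes `url` and `defaultFileName` are defined by the caller.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
string fileName;
// Only the headers are inspected; the response body is never read.
using (HttpWebResponse res = (HttpWebResponse)request.GetResponse())
{
    string disposition = res.Headers["Content-Disposition"];
    string location = res.Headers["Location"];
    string urlFileName = Path.GetFileName(url);

    if (disposition != null)
        fileName = disposition.Replace("attachment; filename=", "").Replace("\"", "");
    else if (location != null)
        fileName = Path.GetFileName(location);
    else if (urlFileName.Contains('?') || urlFileName.Contains('='))
        fileName = Path.GetFileName(res.ResponseUri.ToString());
    else
        fileName = defaultFileName;
}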
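
The string replacement above only copes with the exact form `attachment; filename=...`. As a hedged alternative, the header value could be handed to System.Net.Mime.ContentDisposition, which parses the parameters for you. The wrapper class below is hypothetical and is not part of the linked answer.

```csharp
using System;
using System.IO;
using System.Net.Mime;

// Hypothetical helper: extracts the extension of the filename suggested
// by a Content-Disposition header value.
public static class DispositionFileName
{
    // Returns the extension (including the dot), or "" when the header is
    // missing, malformed, or carries no filename.
    public static string GetExtension(string contentDisposition)
    {
        if (string.IsNullOrWhiteSpace(contentDisposition)) return "";
        try
        {
            var disposition = new ContentDisposition(contentDisposition);
            return Path.GetExtension(disposition.FileName ?? "");
        }
        catch (FormatException)
        {
            // The header value could not be parsed.
            return "";
        }
    }
}
```

You would pass in `res.Headers["Content-Disposition"]` from the snippet above. This sketch does not try to handle the RFC 5987 `filename*=` form.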
Community
Justin
4

Here is my solution:

if (Uri.TryCreate(url, UriKind.Absolute, out var uri)){
    Console.WriteLine(Path.GetExtension(uri.LocalPath));
}

First, I verify that the string is a valid absolute URL; then I get the file extension from the URI's local path, which does not include the query string.
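
If relative URLs also need to be handled, the same idea can be stretched a little. The following is a sketch under one assumption of mine: a relative URL is resolved against a made-up placeholder base (http://placeholder.invalid/) purely so that Uri can parse it. The class name is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper built on the Uri approach from the answer above.
public static class UriExtension
{
    // Placeholder base used only to resolve relative URLs; it is never contacted.
    private static readonly Uri PlaceholderBase = new Uri("http://placeholder.invalid/");

    // Returns the extension (including the dot) of the URL's path, or "" if there is none.
    public static string GetExtension(string url)
    {
        if (Uri.TryCreate(url, UriKind.Absolute, out Uri uri) ||
            Uri.TryCreate(PlaceholderBase, url, out uri))
        {
            // LocalPath excludes the query string and fragment.
            return Path.GetExtension(uri.LocalPath);
        }
        return "";
    }
}
```

Because only the path is inspected, a root URL such as http://www.com/ yields an empty string rather than ".com".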

Cedric Arnould
3

Some have suggested requesting the file from the URL and checking the headers. In my opinion that's overkill for something so simple, so...

heringer's answer fails if parameters are present in the URL. The fix is simple: split on the query-string character ? and keep the first part.

string url = @"http://example.com/file.jpg?par=x";
string ext = System.IO.Path.GetExtension(url.Split('?')[0]); // ".jpg"
Sean T
0

VirtualPathUtility.GetExtension(yourPath) (in the System.Web namespace) returns the file extension from the specified path, including the leading period.

sdgfsdh
roxl