45

I am in need of a regular expression that can remove the extension of a filename, returning only the name of the file.

Here are some examples of inputs and outputs:

myfile.png     -> myfile
myfile.png.jpg -> myfile.png

I can obviously do this manually (i.e. by removing everything from the last dot onward), but I'm sure there is a regular expression that can do this by itself.

Just for the record, I am doing this in JavaScript.

Andreas Grech

9 Answers

97

Just for completeness: How could this be achieved without Regular Expressions?

var input = 'myfile.png';
var output = input.substr(0, input.lastIndexOf('.')) || input;

The || input takes care of the case where lastIndexOf() returns -1 (the substr() result is then an empty string, which is falsy). You see, it's still a one-liner.
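For anyone who wants to reuse this, here is a minimal sketch of the same idea wrapped in a function, with the fallback written out explicitly. The name `stripExtension` is made up for illustration, and `slice` stands in for `substr`; both are my assumptions rather than part of the answer above.

```javascript
// Hypothetical helper (the name is illustrative only).
function stripExtension(input) {
  var dot = input.lastIndexOf('.');
  // dot === -1: no period at all; dot === 0: leading period (e.g. ".htaccess").
  // In both cases there is nothing sensible to cut, so return the input as-is.
  return dot > 0 ? input.slice(0, dot) : input;
}

console.log(stripExtension('myfile.png'));     // myfile
console.log(stripExtension('myfile.png.jpg')); // myfile.png
console.log(stripExtension('file'));           // file
console.log(stripExtension('.htaccess'));      // .htaccess
```

The explicit `dot > 0` check behaves the same as the `|| input` trick for these inputs, it just makes the two fallback cases visible.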

Boldewyn
  • 3
    I liked it very much what you did with input/output! :) – Dimitris Damilos Dec 19 '12 at 13:31
  • 1
    Nice solution! I had tunnel vision on RegEx, but this works as well – Jareish Jun 10 '13 at 08:06
  • Similarly, using [underscore.string](https://github.com/epeli/underscore.string): `var output = _s.strLeftBack(input, '.'); // 'myfile'` – Tommy Stanton Jan 22 '14 at 19:43
  • Would this be faster than with regex? – Jo Smo Dec 18 '14 at 07:42
  • 2
    Yes: http://jsperf.com/file-extension-extraction. However, it is a micro-optimization. So, if you do this only once in your code, take whatever solution you want. If you do this 1000 times every second (e.g. during scrolling) or inside a heavily used library, the gain will be noticeable. – Boldewyn Dec 18 '14 at 09:22
  • 1
    50 years in the future, when file names are Mb's long, people will look up to you like a god. I mean people, not IE users. – Jack G Mar 26 '17 at 23:56
  • Even shorter still with indexing... var output = input[0: input.lastIndexOf('.')] – Gavin Ray Nov 13 '22 at 04:40
  • Is indexing with the `[X:Y]` notation now in JS? I know it only from Python. – Boldewyn Nov 13 '22 at 20:54
50
/(.*)\.[^.]+$/

Result will be in that first capture group. However, it's probably more efficient to just find the position of the rightmost period and then take everything before it, without using regex.
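A short sketch of reading that first capture group in JavaScript. The function name `nameFromCapture` and the fallback to the original string when nothing matches are my own assumptions, not part of the answer.

```javascript
var re = /(.*)\.[^.]+$/;

// Hypothetical helper: returns capture group 1, or the input if there is no match.
function nameFromCapture(filename) {
  var m = re.exec(filename);
  return m ? m[1] : filename;
}

console.log(nameFromCapture('myfile.png'));     // myfile
console.log(nameFromCapture('myfile.png.jpg')); // myfile.png
console.log(nameFromCapture('file'));           // file (no match, input returned)
console.log(nameFromCapture('.htaccess'));      // empty string: group 1 matches nothing
```

The last case is the `.htaccess` trap discussed in the comments: the pattern treats the whole name as an extension.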

Amber
  • 1
    Also think about things like `a.longthingrighthereattheend` or `.ext` & `file.`. Perhaps check with `/(.*)\.[^.]{1,10}$/`. Actually, just use `pathinfo()`. – Xeoncross Oct 18 '12 at 19:25
  • 3
    An almost perfect solution, but if you want to avoid the trap of filenames like .htaccess (starts with a period, but has no extension), as exemplified by Roger Pate, just replace the first asterisk with a plus sign, as follows: (.+)\.[^.]+$ – aldemarcalazans Apr 04 '17 at 21:05
  • 1
    `"xxx.xxx.xxx".replace(/(.*)\.[^.]+$/,'')` => `""` – vsync Jun 18 '19 at 09:49
18

The regular expression to match the pattern is:

/\.[^.]*$/

It finds a period character (\.), followed by 0 or more characters that are not periods ([^.]*), followed by the end of the string ($).

console.log( 
  "aaa.bbb.ccc".replace(/\.[^.]*$/,'')
)
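To see how this pattern behaves on a few edge cases, here is a small sketch. The wrapper name `dropLastExtension` is hypothetical and only there to keep the calls short.

```javascript
// Hypothetical wrapper around the replace() call shown above.
function dropLastExtension(name) {
  return name.replace(/\.[^.]*$/, '');
}

console.log(dropLastExtension('aaa.bbb.ccc')); // aaa.bbb
console.log(dropLastExtension('myfile.png'));  // myfile
console.log(dropLastExtension('file'));        // file (nothing to replace)
console.log(dropLastExtension('file.'));       // file ([^.]* may match zero characters)
console.log(dropLastExtension('.htaccess'));   // empty string
```

Because the pattern matches only the extension, replacing the match with an empty string leaves the rest of the name untouched.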
vsync
Igor
  • 1
    /.\w*$/.exec("myfile.png") => [".png"] – Andreas Grech Nov 30 '09 at 07:35
  • That regex is returning the extension, whereas I need to remove the extension – Andreas Grech Nov 30 '09 at 07:35
  • @AndreasGrech well, regexes don't remove things. They match things. If you use a program like SED, then to remove it you match it and replace it with empty string. Of course other option is to match all that is not the extension. – barlop Nov 08 '11 at 05:46
  • 1
    This is the only answer that worked with `.replace` for me – Dominic Apr 26 '16 at 14:32
  • It is a simple and reasonable solution, but it fails on some of the exceptions given by Roger Pate, e.g. "send to mrs." and "version 1.2 of project". A more precise version of this code would be: \.[^(\.|\s)]+$ – aldemarcalazans Apr 04 '17 at 20:53
  • Works well with 'sed', e.g. `echo "vacation.pictures.2020.01.01.zip.bak" | sed -e 's/\.[^.]*$//g'` prints `vacation.pictures.2020.01.01.zip` – Torbjörn Österdahl Feb 26 '20 at 13:07
12
/^(.+)(\.[^ .]+)?$/

Test cases where this works and others fail:

  • ".htaccess" (leading period)
  • "file" (no file extension)
  • "send to mrs." (no extension, but ends in abbr.)
  • "version 1.2 of project" (no extension, yet still contains a period)

The common thread above is, of course, "malformed" file extensions. But you always have to think about those corner cases. :P

Test cases where this fails:

  • "version 1.2" (no file extension, but "appears" to have one)
  • "name.tar.gz" (if you view this as a "compound extension" and wanted it split into "name" and ".tar.gz")

How to handle these is problematic and best decided on a project-specific basis.

5
/^(.+)(\.[^ .]+)?$/

The above pattern is wrong: it will always include the extension too, because of how the JavaScript regex engine works. The (\.[^ .]+) group is optional and (.+) is greedy, so the engine successfully matches the entire string with (.+) alone. http://cl.ly/image/3G1I3h3M2Q0M
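The greedy behaviour is easy to check in a console. The second pattern below, with a lazy `(.+?)`, is only my illustration of what changes when the first group is not greedy; it is an assumption, not something proposed in either answer.

```javascript
// Greedy first group: (.+) swallows the whole string, the optional group matches nothing.
var greedy = /^(.+)(\.[^ .]+)?$/.exec('myfile.png');
console.log(greedy[1]); // myfile.png
console.log(greedy[2]); // undefined

// Lazy first group (illustrative variant): the engine tries the extension group first.
var lazy = /^(.+?)(\.[^ .]+)?$/.exec('myfile.png');
console.log(lazy[1]); // myfile
console.log(lazy[2]); // .png
```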


Here's my tested regexp solution.

The pattern will match filenameNoExt with/without extension in the path, respecting both slash and backslash separators

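var path = "c:\\some.path/subfolder/file.ext";  // the backslash must be escaped in a string literal
var m = path.match(/([^:\\/]*?)(?:\.([^ :\\/.]*))?$/);
var fileName = (m === null) ? "" : m[1];          // group 1: file name without extension (m[0] is the whole match)
var fileExt  = (m === null) ? "" : (m[2] || "");  // group 2: extension, undefined when there is none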

dissection of the above pattern:

([^:\\/]*?)  // match any character, except slashes and colon, 0-or-more times,
             // make the token non-greedy so that the regex engine
             // will try to match the next token (the file extension)
             // capture the file name token to subpattern \1

(?:\.        // match the '.' but don't capture it
([^ :\\/.]*) // match file extension
             // ensure that the last element of the path is matched by prohibiting slashes
             // capture the file extension token to subpattern \2
)?$          // the whole file extension is optional

http://cl.ly/image/3t3N413g3K09

http://www.gethifi.com/tools/regex

This is meant to cover the cases mentioned by @RogerPate while also handling full paths.
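A small sketch for checking the pattern against your own inputs. The function name `splitName` and the returned object shape are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: splits the last path element into name and extension.
function splitName(path) {
  var m = path.match(/([^:\\/]*?)(?:\.([^ :\\/.]*))?$/);
  return {
    name: m === null ? '' : m[1],
    ext: m === null ? '' : (m[2] || '') // group 2 is undefined without an extension
  };
}

console.log(splitName('c:\\some.path/subfolder/file.ext')); // { name: 'file', ext: 'ext' }
console.log(splitName('dir/file'));                         // { name: 'file', ext: '' }
console.log(splitName('name.tar.gz'));                      // { name: 'name.tar', ext: 'gz' }
console.log(splitName('.htaccess'));                        // { name: '', ext: 'htaccess' }
```

Note the last line: a leading-period name such as `.htaccess` comes back with an empty name, so that case may need separate handling depending on the project.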

Steven Pribilinskiy
3

Another no-regex way of doing it (the "opposite" of @Rahul's version, not using pop() to remove).

It doesn't require referring to the variable twice, so it's easier to inline.

filename.split('.').slice(0,-1).join('.') // join('.') restores the periods; a bare join() would insert commas
Daniel
0

This will do it as well :)

'myfile.png.jpg'.split('.').reverse().slice(1).reverse().join('.');

I'd stick to the regexp though... =P

Ro Marcus Westin
0
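  var parts = filename.split('.'); if (parts.length > 1) parts.pop(); // pop() drops the last segment (the extension)
  return parts.join('.');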

It will make your wish come true, but not the regular-expression way.

Rahul
-2

In JavaScript you can call the replace() method, which replaces text based on a regular expression.

This regular expression matches the whole line and captures everything before the last period, so replacing the match with the first capture group removes the last period and everything after it.

/^(.*)\..*$/

How to implement the replace can be found in this Stack Overflow question:

Javascript regex question
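For reference, a minimal sketch of what that replace could look like, assuming the match is replaced with the first capture group via `'$1'`. The function name `removeAfterLastPeriod` is hypothetical.

```javascript
// Hypothetical helper: keeps capture group 1, i.e. everything before the last period.
function removeAfterLastPeriod(filename) {
  return filename.replace(/^(.*)\..*$/, '$1');
}

console.log(removeAfterLastPeriod('myfile.png'));     // myfile
console.log(removeAfterLastPeriod('myfile.png.jpg')); // myfile.png
console.log(removeAfterLastPeriod('file'));           // file (no period, so no match)
```

Since `(.*)` is greedy, it runs up to the last period in the string, which is why only the final extension is dropped.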

Community
Tinidian
  • Actually, as you currently have it written, it will remove anything after and including the *first* period, since you have your capture group set to be non-greedy, and your latter `.*` can match anything, including periods. – Amber Nov 30 '09 at 07:39
  • Yeah I realized that after I initially posted and then updated it. Thanks – Tinidian Nov 30 '09 at 07:42
  • @Amber What makes his capture group set to non greedy? I thought .* is always greedy. .*? would be lazy. And that pattern seems to work ^(.*)\..*$ – barlop Nov 08 '11 at 06:04