73

How can I extract the extension of a file, given its path as a character string? I know I can do this with a regular expression, regexpr("\\.([[:alnum:]]+)$", x), but I'm wondering whether there's a built-in function for this.

zx8754
Suraj

9 Answers

100

This is the sort of thing that is easily found with R's basic tools, e.g. ??path.

Anyway, load the tools package and read ?file_ext.
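
As a minimal sketch of what that looks like in practice (the file names below are made-up examples), file_ext can be called through the tools:: prefix without attaching the package, and it is vectorised over its argument:

```r
# Call file_ext() from the 'tools' package without attaching it.
# The paths are hypothetical examples, not taken from the question.
paths <- c("report.txt", "data/archive.tar.gz", "README")

exts <- tools::file_ext(paths)
print(exts)
# [1] "txt" "gz"  ""
```

Only the part after the last dot is returned, so "archive.tar.gz" gives "gz", and a name without a dot gives an empty string.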

Carl Witthoft
  • 8
    It doesn't show up with `??"extensions"` although one would have expected that it would. – IRTFM Oct 15 '11 at 16:41
  • 1
    @DWin: "patience, grasshopper" :-). I would also recommend package:sos . It's very cool. – Carl Witthoft Oct 15 '11 at 19:16
  • 4
    Witthoft: Color me puzzled on two counts: how does pkg:sos address the failure of tools::file_ext to appear with ??() when a reasonable person would expect it to; and one would certainly need patience to obtain value from a search strategy that delivers 20 pages with 400 hits? – IRTFM Oct 15 '11 at 19:43
  • `sos` does a full text search. `??` only searches metadata (title, keywords, etc.) Furthermore, it's not **that** hard to skim the results. (I tried `findFn("{file extension}")`, `"extract {file extension}"`, and `"{extract file extension}"`, the first was best.) – Ben Bolker Oct 15 '11 at 20:47
  • 3
    This would be more useful with an actual code sample – user5359531 Jul 11 '17 at 20:33
  • @user5359531 Did you read `?file_ext`? A one-line code sample is kind of silly – Carl Witthoft Jul 12 '17 at 11:13
  • Note well: tools::file_ext('foo.csv.gz') returns 'gz', not 'csv.gz' – Zachary Ryan Smith Nov 11 '18 at 03:46
  • @ZacharyRyanSmith Well, that's because the **extension** in your example is "gz" so the code is doing the correct thing. There's no such concept as "double extensions" – Carl Witthoft Nov 12 '18 at 14:10
  • @CarlWitthoft, that's true. I was just warning people who confound path suffixes with a file's extension (people like me). – Zachary Ryan Smith Nov 12 '18 at 15:17
26

Let me expand a little on the great answer from https://stackoverflow.com/users/680068/zx8754

Here is a simple code snippet:

  # 1. Load library 'tools'
  library("tools")

  # 2. Get extension for file 'test.txt'
  file_ext("test.txt")

The result should be 'txt'.

Andrii
  • 2
    Please scroll up and read the accepted answer to this question. – Rich Scriven Nov 04 '17 at 20:57
  • 3
    Thank you, Rich! I read this comment and added this code just to show how it looks as a simple code snippet. Maybe it will be helpful for someone. – Andrii Nov 05 '17 at 06:21
  • 2
    The other comment may have been first and accepted, but it is nice to see the solution written out. The accepted answer just tells you where you find the answer. This one actually answers the question. – Dannid Aug 16 '19 at 15:24
  • 1
    Don't use `library(tools)` when you can simply use `tools::file_ext`, such as in `tools::file_ext("test.txt")`. – bers Jul 16 '21 at 11:09
11

A simple function with no package to load:

getExtension <- function(file){ 
    ex <- strsplit(basename(file), split="\\.")[[1]]
    return(ex[-1])
} 
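
Note that ex[-1] keeps every piece after the first dot, and [[1]] looks only at the first file name. A possible variant, offered as a sketch rather than a replacement (the name getLastExtension is made up for illustration), returns just the last piece for each element of a character vector, and an empty string when there is no dot:

```r
# Hypothetical variant: last dot-separated piece of each file name.
getLastExtension <- function(files) {
  # Split each base name on literal dots (fixed = TRUE avoids regex).
  pieces <- strsplit(basename(files), split = ".", fixed = TRUE)
  vapply(pieces, function(parts) {
    # A single piece means the name contained no dot-separated extension.
    if (length(parts) > 1) parts[length(parts)] else ""
  }, character(1))
}

getLastExtension(c("test.txt", "dir/foo.csv.gz", "noext"))
# [1] "txt" "gz"  ""
```

Unlike tools::file_ext, this sketch treats a hidden file such as ".bashrc" as having the extension "bashrc".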
Miguel Vazq
  • 1
    Nice basic function! Could be one line:`getExtension <- function(file) strsplit(file, ".", fixed=T)[[1]][-1]`. To avoid regex and increase performance `fixed = TRUE` can be used. – Adrià Nov 23 '22 at 20:30
4

The regexpr above fails if the extension contains non-alphanumeric characters (see e.g. https://en.wikipedia.org/wiki/List_of_filename_extensions). As an alternative, one may use the following function:

getFileNameExtension <- function(fn) {
  # remove the path (or use .Platform$file.sep instead of '/')
  splitted <- strsplit(x = fn, split = '/')[[1]]
  fn       <- splitted[length(splitted)]
  ext      <- ''
  splitted <- strsplit(x = fn, split = '\\.')[[1]]
  l        <- length(splitted)
  # the extension must be the suffix of a non-empty name
  if (l > 1 && sum(splitted[1:(l - 1)] != '')) ext <- splitted[l]
  ext
}

Pisca46
2

Extract the file extension without the dot:

tools::file_ext(fileName)

Extract the file extension with the dot:

paste0(".", tools::file_ext(fileName))
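
One caveat: for a name with no extension, tools::file_ext returns an empty string, so the paste0 version yields a bare ".". A small sketch that adds the dot only when there is an extension (the helper name ext_with_dot is hypothetical):

```r
# Hypothetical helper: prepend "." only when an extension was found.
ext_with_dot <- function(fileName) {
  ext <- tools::file_ext(fileName)
  # nzchar() is TRUE for non-empty strings, so empty extensions stay empty.
  ifelse(nzchar(ext), paste0(".", ext), "")
}

ext_with_dot(c("test.txt", "README"))
# [1] ".txt" ""
```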

ARAT
1

If you don't want to use any additional package, you could try:

file_extension <- function(filenames) {
    sub(pattern = "^(.*\\.|[^.]+)(?=[^.]*)", replacement = "", filenames, perl = TRUE)
    }

If you like to be cryptic, you could use it as a one-line expression: sub("^(.*\\.|[^.]+)(?=[^.]*)", "", filenames, perl = TRUE) ;-)

It works for zero (!), one, or more file names (as a character vector or list) with an arbitrary number of dots, and also for file names without any extension, for which it returns the empty string "".

Here are the tests I tried:

> file_extension("simple.txt")
[1] "txt"
> file_extension(c("no extension", "simple.ext1", "with.two.ext2", "some.awkward.file.name.with.a.final.dot.", "..", ".", ""))
[1] ""     "ext1" "ext2" ""     ""     ""     ""    
> file_extension(list("file.ext1", "one.more.file.ext2"))
[1] "ext1" "ext2"
> file_extension(NULL)
character(0)
> file_extension(c())
character(0)
> file_extension(list())
character(0)

By the way, tools::file_ext() does not recognise "strange" extensions that contain non-alphanumeric characters:

> tools::file_ext("file.zi_")
[1] ""
0

This function uses pipes:

library(magrittr)

file_ext <- function(f_name) {
  f_name %>%
    strsplit(".", fixed = TRUE) %>%
    unlist %>%
    extract(2)
}

file_ext("test.txt")
# [1] "txt"
Enrique Pérez Herrero
  • 2
    Can you comment how this is an improvement over `tools::file_ext`? – Roman Luštrik Oct 09 '18 at 14:52
  • 1
    You'd better use the `tools` function – Enrique Pérez Herrero Oct 09 '18 at 20:37
  • 1
    The proposed function works incorrectly if the file contains dots in the filename. The function splits the filename and outputs the second element, while it should output the last one. For the following filename 'file.name.txt' the output is 'name', not 'txt'. `tools::file_ext` works fine. – Serhii Jun 30 '21 at 11:38
0

Simplest way I've found with no additional packages:

FileExt <- function(filename) {
  nameSplit <- strsplit(x = filename, split = "\\.")[[1]]
  return(nameSplit[length(nameSplit)])
}
Ben Ernest
0

One way is to use sub.

s <- c("test.txt", "file.zi_", "noExtension", "with.two.ext2",
       "file.with.final.dot.", "..", ".", "")

sub(".*\\.|.*", "", s, perl=TRUE)
#[1] "txt"  "zi_"  ""     "ext2" ""     ""     ""     ""    

If you can assume there is a dot, a simpler pattern works, but it returns the whole name when there is no extension:

sub(".*\\.", "", s)
#[1] "txt"         "zi_"         "noExtension" "ext2"        ""           
#[6] ""            ""            ""           

For comparison, here are tools::file_ext(s) and the regex it uses internally:

tools::file_ext(s)
#[1] "txt"  ""     ""     "ext2" ""     ""     ""     ""    

pos <- regexpr("\\.([[:alnum:]]+)$", s)
ifelse(pos > -1L, substring(s, pos + 1L), "")
#[1] "txt"  ""     ""     "ext2" ""     ""     ""     ""    
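
Wrapping the first pattern in a function gives a base-R alternative that also accepts non-alphanumeric extensions. This is only a sketch: the name file_ext_any is made up here, and the basename() call is an added assumption so that dots in directory names are ignored.

```r
# Hypothetical wrapper around the sub() pattern shown above.
file_ext_any <- function(x) {
  # Strip everything up to the last dot; if there is no dot, strip it all.
  # basename() is an addition so that "my.dir/README" has no extension.
  sub(".*\\.|.*", "", basename(x), perl = TRUE)
}

file_ext_any(c("test.txt", "file.zi_", "noExtension", "my.dir/README"))
# [1] "txt" "zi_" ""    ""
```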
GKi