I am trying to interact with an SFTP server from inside R. The curl package came highly recommended (curl, not RCurl).

One of the things I am trying to do is get a list of directories/files at an address. This is the code I have working so far:

library(curl)

# create a new curl handle
han <- new_handle()

# print verbose connection output while debugging
handle_setopt(han, verbose = TRUE)

# execute the request
result <- curl_fetch_memory(url = "{SFTP URL here}", handle = han)

# convert the raw response body to a character string
response <- rawToChar(result$content)

The SFTP server at this URL does not require a password. The remote uses SFTP protocol version 3.

The above code almost does what I am looking for. curl_fetch_memory(url = "{SFTP URL here}", handle = han) returns a list that contains, among other things, result$content. That element holds the list of directories/files, but as raw bytes, with file names, dates and permission data all mixed together in one string once converted to characters.

  1. How can I customize the request/handle to get the list of files in a cleaner form? Just a plain list of files, akin to ls on SFTP servers, if this is at all possible. (Copies of result and response are attached below.)

  2. If customizing the request is not possible, is there a way to make curl objects a bit more human-readable?

Output for result

$url
[1] "sftp://data.cyverse.org/shared/"

$status_code
[1] 0

$type
[1] NA

$headers
raw(0)

$modified
[1] "2020-02-20 16:05:33 CST"

$times
     redirect    namelookup       connect   pretransfer starttransfer 
     0.000000      0.000029      0.000000      0.230600      0.000000 
        total 
     0.230608 

$content
  [1] 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20
 [26] 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 44 65 63 20 33 31 20
 [51] 20 31 39 36 39 20 2e 0a 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30
 [76] 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30
[101] 20 44 65 63 20 33 31 20 20 31 39 36 39 20 2e 2e 0a 64 72 77 78 72 2d 78 72
[126] 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20
[151] 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 61 6c
[176] 69 67 6e 6d 65 6e 74 73 5f 61 6e 64 5f 74 72 65 65 73 0a 64 72 77 78 72 2d
[201] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[226] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[251] 67 65 6e 65 5f 66 61 6d 69 6c 79 5f 65 76 6f 6c 75 74 69 6f 6e 0a 64 72 77
[276] 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20
[301] 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30
[326] 32 30 20 6d 61 70 73 5f 73 63 72 69 70 74 73 0a 64 72 77 78 72 2d 78 72 2d
[351] 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20
[376] 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 74 72 61
[401] 6e 73 63 72 69 70 74 5f 61 73 73 65 6d 62 6c 69 65 73 0a 64 72 77 78 72 2d
[426] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[451] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[476] 77 68 6f 6c 65 5f 67 65 6e 6f 6d 65 5f 64 75 70 6c 69 63 61 74 69 6f 6e 73
[501] 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20 20 31 20 30 20 20 20 20 20 20 20
[526] 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 36 36 39 20 4f 63 74 20 31 32
[551] 20 20 32 30 31 39 20 67 65 6e 65 5f 66 61 6d 69 6c 69 65 73 5f 6f 72 74 68
[576] 6f 66 69 6e 64 65 72 2e 74 78 74 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20
[601] 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20
[626] 31 32 37 33 20 4f 63 74 20 31 32 20 20 32 30 31 39 20 72 65 61 64 6d 65 2e
[651] 74 78 74 0a

Output for response, i.e. rawToChar(result$content)

'drwxr-xr-x    1 0        0               0 Dec 31  1969 .\ndrwxr-xr-x    1 0        0               0 Dec 31  1969 ..\ndrwxr-xr-x    1 0        0               0 Nov 7  2020 curated\n'
Sudoh
  • Q1: if supported, I suspect this would be vendor-specific; which sftp vendor/version is being used on the remote end? Q2: please provide a sample of the output; we don't have access to your sftp server, and it's possible (even likely) that different sftp servers have varying output. – r2evans May 12 '23 at 12:43

1 Answer

You can set CURLOPT_DIRLISTONLY (dirlistonly in the curl package) to list only names. Alternatively, you can parse the default response as regular tabular text, e.g. with read.table() or readr::read_table(). Options for the curl package are the general libcurl options from upstream, so the libcurl documentation can be used as a reference: https://curl.se/libcurl/c/easy_setopt_options.html
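
When reading the libcurl documentation, it helps to know how its constant names relate to the names accepted by handle_setopt(). Judging by the curl_options() output, the R names appear to be the libcurl names lowercased with the CURLOPT_ prefix removed. A minimal sketch under that assumption; `curlopt_to_r()` is a hypothetical helper, not part of the curl package:

```r
# Hypothetical helper: convert a libcurl constant name (as used in the
# libcurl docs) to the lowercase option name used by the R curl package.
curlopt_to_r <- function(x) tolower(sub("^CURLOPT_", "", x))

opt <- curlopt_to_r("CURLOPT_DIRLISTONLY")
print(opt)

# Optional sanity check against the installed curl package, if present:
# curl_options() returns a named vector of all options it knows about.
if (requireNamespace("curl", quietly = TRUE)) {
  print(opt %in% names(curl::curl_options()))
}
```

The check is wrapped in requireNamespace() so the snippet still runs where the curl package is not installed.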

Using Rebex demo server as an example:

library(curl)
#> Using libcurl 7.84.0 with Schannel
# https://test.rebex.net/
SFTP_DEMO <- "sftp://demo:password@test.rebex.net:22"
han <- new_handle()

# list all libcurl options that include "list"
curl_options("list")
#>            cookielist           dirlistonly proxy_ssl_cipher_list 
#>                 10135                    48                 10259 
#>       ssl_cipher_list 
#>                 10083
# set dirlistonly
handle_setopt(han, dirlistonly = TRUE)

# dirlistonly request: 
file_list <- curl_fetch_memory(url = SFTP_DEMO, handle = han)[["content"]] |> rawToChar()

cat(file_list)
#> .
#> ..
#> pub
#> readme.txt
read.table(text = file_list)
#>           V1
#> 1          .
#> 2         ..
#> 3        pub
#> 4 readme.txt
strsplit(file_list, "\n") |> unlist()
#> [1] "."          ".."         "pub"        "readme.txt"

# you can do the same with detailed file list:
handle_setopt(han, dirlistonly = FALSE)
curl_fetch_memory(url = SFTP_DEMO,
                  handle = han)[["content"]] |>
  rawToChar() |>
  read.table(text = _)
#>           V1 V2   V3    V4  V5  V6 V7    V8         V9
#> 1 drwx------  2 demo users   0 Mar 31 17:52          .
#> 2 drwx------  2 demo users   0 Mar 31 17:52         ..
#> 3 drwx------  2 demo users   0 Mar 31 17:52        pub
#> 4 -rw-------  1 demo users 405 Dec 17  2021 readme.txt

Created on 2023-05-12 with reprex v2.0.2
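
One caveat with read.table() on the detailed listing: a file name that contains spaces is split across several columns. A minimal base-R sketch that avoids this, assuming the server returns the usual `ls -l`-style layout with eight whitespace-separated fields before the name (as in the listings above). `parse_sftp_listing()` is a hypothetical helper, and the sample text is made up in the same layout, with one invented file name containing a space:

```r
# Hypothetical helper: parse an `ls -l`-style SFTP listing into a data frame.
# Assumption: eight whitespace-separated fields precede the file name
# (permissions, links, owner, group, size, month, day, year-or-time).
parse_sftp_listing <- function(listing, drop_dots = TRUE) {
  lines <- strsplit(listing, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]

  # The permission string is everything before the first whitespace
  perms <- sub("\\s.*$", "", lines, perl = TRUE)
  # Strip the first eight fields; whatever remains is the name, spaces included
  name <- sub("^(?:\\S+\\s+){8}", "", lines, perl = TRUE)

  out <- data.frame(
    name = name,
    is_dir = startsWith(perms, "d"),
    stringsAsFactors = FALSE
  )
  if (drop_dots) {
    out <- out[!out$name %in% c(".", ".."), , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

# Made-up sample in the same layout as a detailed SFTP listing
sample_listing <- paste(
  "drwxr-xr-x    1 0        0               0 Dec 31  1969 .",
  "drwxr-xr-x    1 0        0               0 Dec 31  1969 ..",
  "drwxr-xr-x    1 0        0               0 Nov 7  2020 curated",
  "-rw-r--r--    1 0        0             669 Oct 12  2019 my notes.txt",
  "",
  sep = "\n"
)

files <- parse_sftp_listing(sample_listing)
print(files)
```

In practice the input would be the rawToChar() result of a curl_fetch_memory() call made with dirlistonly = FALSE. Servers that format listings differently would need a different field count.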

margusl
  • Perfect! This does quite well. If I may, where can I learn about `curl_options()`? I am generally having difficulty working with `curl` (the package) because `curl_options()` lists all the options without actually describing what they are. `curl` works around handles, and handles work around `curl_options()`, but the documentation only lists the options without saying what each one does. – Sudoh May 12 '23 at 15:17
  • As options are not specific to the `curl` R package, you can use the libcurl docs as a reference: https://curl.se/libcurl/c/curl_easy_setopt.html & https://curl.se/libcurl/c/easy_setopt_options.html – margusl May 12 '23 at 15:43
  • Ahh! So, that's the context I am missing. Thank you! – Sudoh May 12 '23 at 15:46