2

I have this command which is working for me, which will find zero or one directories in the current directory which match a pattern:

find . -maxdepth 1 -type d -name 'suman-*'| head -n1

On macOS, this produces something like:

./suman-1479860474833

<!EDIT>

My goal is to find the most recent directory (with the most recent timestamp). The directory contents look like:

foo
bar 
baz
suman-1479860475524
suman-1479860471431
suman-1479860474233
...
etc.

</EDIT>

I have three questions,

  1. Using bash, how can I strip out the ./ characters if they exist in the result? They should always be there, but I think always removing the first two characters might be too kludgy.

  2. Using bash, instead of finding the first result with head what is the best way to find the last result? I am guessing it's tail but maybe there is a better way.

  3. Is there a way to only match against a number instead of just using 'suman-*'? Perhaps I should use -regex instead of -name?

Now, I could sort the directories by the timestamp in the directory name, or I could potentially sort them by their metadata (if that metadata is accurate and persists through version control updates, etc.). I am not sure that the directory metadata is persistent enough, so I would rather be more transparent and use the timestamp in the directory name. It also looks like Unix stores no creation time for directories: "In Unix creation time is not stored (only: access, modification and change)."

Alexander Mills
  • 2
    `find` has no guaranteed order. You should not use `head` or `tail` to try to find a specific directory. – that other guy Nov 23 '16 at 01:13
  • 1
    What's the actual goal, re "first" or "last"? – Charles Duffy Nov 23 '16 at 01:14
  • Up to some extent, the first/last issue can be solved by `sort` before `head`... – axiac Nov 23 '16 at 01:17
  • BTW -- can we safely assume GNU `find` and `sort` here? – Charles Duffy Nov 23 '16 at 01:19
  • That last edit was a not-so-subtle change in requirements that invalidated existing answers: please note this change as a comment directly in your question. – mklement0 Nov 23 '16 at 02:41
  • @CharlesDuffy sorry I should have made that more clear, I updated the question - basically I have many suman- directories in the current directory. I want to find the one that is most recent, ordering by timestamp. – Alexander Mills Nov 23 '16 at 02:42
  • @mklement0 I think Hmedia's answer has had the right idea from the beginning, although I could have been more clear – Alexander Mills Nov 23 '16 at 02:48
  • 1
    @AlexanderMills: Thanks for the edits; on macOS, you don't have access to _GNU_ `find` by default, and if only immediate subdirectories are of interest, a combination of globs and possibly `ls` is simplest. – mklement0 Nov 23 '16 at 02:50
  • @mklement0, boo hiss re: the suggestion of `ls` -- other than mangle your filenames when they contain non-printable characters, what does `ls` do here that `printf '%s\n' */` or `printf '%s\0' */` won't? – Charles Duffy Nov 23 '16 at 03:38
  • @CharlesDuffy: `-r` and `-t` for reverse / last-modified sorting, for convenience (I've added a general caveat re `ls` parsing to my answer). – mklement0 Nov 23 '16 at 03:40
  • @mklement0, ...that's fair, though it also introduces a failure case since you're counting on the full set of glob matches to fit on the command line to `ls`. (And isn't bash guaranteed to honor LC_COLLATE in ordering glob results? So if our numbers have a fixed set of digits, we should be fine off the bat). – Charles Duffy Nov 23 '16 at 03:42
  • @CharlesDuffy: Please see the update to my answer. – mklement0 Nov 23 '16 at 03:56
  • 1
    @AlexanderMills I've had a play with this on OS X and FreeBSD, as this is actually useful for an application of mine. See updated answer. Whether it's the method you employ or not, I'd be curious to know how it works. If there's anything I've overlooked. It's worked on an out-of-the-box Mac system, my development mac, and a FreeBSD 10 system. – hmedia1 Nov 23 '16 at 22:13

4 Answers

2

Getting the most recently modified folder (OS X Compatible):

stat -f "%HT %Sm %i" -t %Y%m%d%H%M%S * | grep "^Directory" | cut -f2- -d ' ' | sort -rn | head -1 | cut -f2- -d ' ' | while read inode ; do find . -inum "$inode" | basename "$(cat -)" ; done

Result: suman-1479860475524


Getting the most recently named suman- timestamp (OS X Compatible):

find ./ -mindepth 1 -maxdepth 1 -type d -name "suman-*" -print0 | sort -zn | while IFS= read -d '' file ; do basename "$file" ; done | tail -1

Result:

  • suman-1479860475524



Most recent folder (any name):

  • stat -f "%HT %Sm %i" -t %Y%m%d%H%M%S *: Lists every entry in the current folder, showing its file type, its modification time in a sortable numeric format (YYYYmmddHHMMSS, so cut and sort can be used safely), and its inode (see https://www.freebsd.org/cgi/man.cgi?stat(1) ):

    • Directory 20161124051357 17658795
      Directory 20161124051358 17658796
      Directory 20161124051356 17658793
      Directory 20161124051359 17658798
      Directory 20161124051400 17658800
      Directory 20161124051401 17658802
      
  • | grep "^Directory" | cut -f2- -d ' ': Selects folders, and trims off "Directory"

  • | sort -rn: Numerical sort, newest to oldest

  • | head -1 : Only most recent

  • | cut -f2- -d ' ': Show only inode component

  • | while read inode ; do find . -inum "$inode" -print0: Find the file based on the inode (some may argue that everything from this step on is unnecessary, but it returns the full folder name even for an oddly named folder with embedded special characters)

At this point (if we added ; done here) we would have :

  • ./suman-1479860475524

So finally

  • | basename "$(cat -)" ; done : Returns just the name of the folder:

    • suman-1479860475524




Most recent suman- folder timestamp:

  • find ./ -mindepth 1 -maxdepth 1 -type d -name "suman-*" -print0 : gets the folder names in current folder

  • | sort -zn: sort them numerically, based on timestamp in directory name, not the actual filesystem modified timestamp.

  • | while IFS= read -d '' file ; do basename "$file" ; done : Strips the directory location characters and slashes from each name, and outputs the list newline-separated

  • | tail -1: just list the most recent one.

In this example, the result happens to be the same:

  • suman-1479860475524

An example of an odd folder name where the ...while read inode ;... becomes useful:


1.

  • mkdir $'some \r\t strange \n folder  \n\n name \n  totally nuts'

2.

  • stat -f "%HT %Sm %i" -t %Y%m%d%H%M%S * | grep "^Directory" | cut -f2- -d ' ' | sort -rn | head -1 | cut -f2- -d ' ' | while read inode ; do find . -inum "$inode" -print0 | basename "$(cat -)" ; done
    

3. (Output)

  • some     strange 
     folder  
    
     name 
      totally nuts
    

The Specific Questions

Using bash, how can I strip out the ./ characters if they exist in the result? (hoping they should always be there, but I think always removing the first two characters might be too kludgy).

  • Use basename on the resultant file
  • Use find "$PWD" instead of find . (this will produce full paths)
  • Use -printf "%P" (this will only show the name part without the ./) (Note: GNU Find Required for -printf)
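As a rough sketch of the first option, basename can be run by find itself through -exec, which works with both GNU and BSD find. The scratch directory and the sample names below are made up for illustration and mirror the question's listing.

```shell
#!/usr/bin/env bash
# Illustrative scratch directory; the names mirror the question's examples.
workdir=$(mktemp -d)
mkdir "$workdir/foo" "$workdir/suman-1479860471431" "$workdir/suman-1479860475524"

# Run basename on every match so the leading "./" never reaches the output.
# find's own order is not guaranteed, hence the explicit sort.
names=$(cd "$workdir" &&
  find . -mindepth 1 -maxdepth 1 -type d -name 'suman-*' -exec basename {} \; | sort)
printf '%s\n' "$names"

rm -rf "$workdir"
```

Starting one basename process per match is slow for large result sets, so this is best suited to a handful of directories.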

Using bash, instead of finding the first result with head what is the best way to find the last result (I am guessing it's tail but maybe there is a better way)

One way is to use:

  • First Result: find ... | sort -n | tail -n1
  • Last Result: find ... | sort -rn | tail -n1

(Obviously "First" might actually be "Last", depending on its meaning to you. You could substitute head for tail as long as you stay consistent: the -r in sort reverses the order, so the two pipelines give you a "First" and a "Last" result respectively.)

Is there a way to only match against a number (it's a timestamp in millis) instead of just using 'suman-*'? Perhaps I should use -regex instead of -name?

You can just do -name '*1479860474833' instead of -name 'suman-*'


Alternative (Simpler) approach for GNU Find:

Three test folders for this example:

  • suman-1479860474833
  • suman-1489860474833
  • suman-1499860474833

Example 1:

Here's a strict example that mitigates some risks of crazy folder names with embedded special characters

  • find "$PWD" -mindepth 1 -maxdepth 1 -type d -print0 | sort -zn | tail -zn1
    

Gives:

  • /my/dir/suman-1499860474833
    

Example 2:

This strips the leading "./", while keeping the NULL separation, using printf and \0:

  • find "$PWD" -mindepth 1 -maxdepth 1 -type d -printf "%P\0" | sort -zn | tail -zn1
    

Gives:

  • suman-1499860474833
    

NOTE: -mindepth 1 avoids returning the parent folder.

hmedia1
  • BTW, consider `find ... -print0 | sort -z` to avoid breaking filenames with literal newlines. Granted, this means also needing to use NUL-aware tools on the other side, but the GNU toolchain is well-suited to that; `IFS= read -r -d ''` will do as a bash builtin. – Charles Duffy Nov 23 '16 at 01:16
  • @CharlesDuffy I thought of the `..| while IFS= read -rd '' file ; do ...` but given the specific question it's reasonable to assume OP has a convention for the naming structure. I've used the NULL flags in my example one, but it's not for every horse IMO. the -z still needs printing again if it's going to be piped into something that doesn't accept NULL separators, and using loops can be slow. – hmedia1 Nov 23 '16 at 01:23
  • 1
    *nod*. Loop performance is not as bad in ksh as with bash, but this is all very true. – Charles Duffy Nov 23 '16 at 01:26
  • @CharlesDuffy For creating a text file of a disks contents that's full of bundles, every nanosecond counts :) – hmedia1 Nov 23 '16 at 01:30
  • I don't follow this part: "You can just do -name '\*1479860474833' instead of -name 'suman-*'" The timestamp will vary, and is in fact the thing that I need to sort by – Alexander Mills Nov 23 '16 at 02:51
  • On MacOS, I get "find: -printf: unknown primary or operator" – Alexander Mills Nov 23 '16 at 02:54
  • @AlexanderMills: `printf` is _GNU_ `find`-specific, whereas macOS comes with _BSD_ `find`, which has many fewer primaries (actions). – mklement0 Nov 23 '16 at 02:57
  • 1
    "The -printf option is not in POSIX find. It is a feature of GNU find, e.g., on Linux." – Alexander Mills Nov 23 '16 at 02:57
  • 1
    @AlexanderMills: If you do need a solution that makes do with _stock_ macOS utilities, please tag your question `osx`. Unless you're specifically looking for a _portable_ solution, you should always add a platform-specific tag. (Conversely, it never hurts to state in the question body that you're looking for cross-platform compatibility.) – mklement0 Nov 23 '16 at 03:01
  • 1
    @mklement0 thanks, I tried to make it clear I am on MacOS, but definitely need a cross-platform (*nix) solution – Alexander Mills Nov 23 '16 at 03:16
1

With GNU find, if you don't want ./, you can simply avoid telling find to print it.

# with GNU find
find . -maxdepth 1 -type d -name 'suman-*' -printf '%P\n' | head -n1

The %P format string excludes the part of the filename derived from the argument to find -- in this case, the ./.

With BSD find, you don't have that option, but can postprocess once your result is in a shell variable:

# strip "./" prefix from filename variable, if and only if it exists
filename=${filename#./}
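
A small sketch of how that expansion behaves with and without the prefix; strip_dot_slash is a hypothetical helper name used only for this illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of any tool: wraps the expansion shown above.
strip_dot_slash() {
  local filename=$1
  # "#./" removes a leading "./" only when it is actually present.
  printf '%s\n' "${filename#./}"
}

with_prefix=$(strip_dot_slash './suman-1479860474833')
without_prefix=$(strip_dot_slash 'suman-1479860474833')
printf '%s\n%s\n' "$with_prefix" "$without_prefix"
```

Both calls print the bare directory name, so the expansion is safe to apply unconditionally.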

As for head or tail, ordering is not guaranteed, so you can't rely on such options to find a specific file. If you want newest, oldest, first, last, etc., then you'll need to do additional work to achieve this in a reliable way. For instance:

IFS= read -r -d '' filename \
  < <(find . -maxdepth 1 -type d -name 'suman-*' -printf '%P\0' | sort -z)

...will read the first item when the stream is sorted, and...

IFS= read -r -d '' filename \
  < <(find . -maxdepth 1 -type d -name 'suman-*' -printf '%P\0' | sort -rz)

...will sort in the opposite direction, and thus read what would otherwise be the last.


Portability Notes

Note that neither find -print0 nor sort -z is POSIX-specified, but both are available in GNU toolchains and on current macOS. find -printf, by contrast, requires GNU find; this can be installed on macOS via the macports findutils package (which installs it as gfind).
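
Putting these notes together, a sketch that avoids -printf and relies only on -print0, sort -z and parameter expansion might look like the following. It assumes the timestamps are fixed-width, so that plain lexical order matches numeric order; the scratch directory and names are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative scratch directory with the names from the question.
workdir=$(mktemp -d)
mkdir "$workdir"/suman-1479860475524 "$workdir"/suman-1479860471431 "$workdir"/suman-1479860474233

newest=
# sort -rz puts the lexically greatest (here: newest) name first;
# read -d '' then consumes exactly one NUL-terminated entry.
IFS= read -r -d '' newest \
  < <(cd "$workdir" && find . -maxdepth 1 -type d -name 'suman-*' -print0 | sort -rz)
newest=${newest#./}   # drop the "./" that find prepends
printf '%s\n' "$newest"

rm -rf "$workdir"
```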

Charles Duffy
1

tl;dr

printf '%s\n' suman-*/ | tail -n 1 | sed 's|/$||' # ... | cut -d/ -f1 works too

Note that this answer assumes that filenames don't have embedded newlines, which, fortunately, is rarely a real-world concern.
All commands in this answer are POSIX-compliant, except where noted.

If you're only looking for immediate subdirectories in your target directory, there's no need for find - a simple glob will do:

printf '%s\n' suman-*/ | head -n 1

Note, however, that:

  • this outputs subdirectory names with a trailing /.
  • symlinks to directories are included.
  • hidden subdirectories are not included (not a concern with suman-*/, which by definition never matches hidden dirs.) - to include hidden items in general, run shopt -s dotglob first (this is a Bash extension).
  • output sorting is case-sensitive, even though the macOS default filesystem is case-insensitive - to change that, pipe the printf output to sort -f or sort -rf before further processing.

Regarding your request to find the most recently modified directory matching the pattern, combining ls -dt with a glob is the simplest option:

ls -dt suman-*/ | head -n 1 # print most recently modified suman-* subdir.

If, by contrast, the timestamps embedded in the directory names should drive the sorting (e.g., 1479860475524), reverse lexical sorting will do:

ls -dr suman-*/ | head -n 1 

Without the trailing /:

ls -dr suman-*/ | head -n 1 | sed 's|/$||' # with no path prefix, | cut -d/ -f1 works too

A slightly more cumbersome, but more robust alternative avoids ls and, with it, the max. command-line length that applies when calling external utilities (as reported by getconf ARG_MAX), which could be a concern if a large number of files match the glob (tip of the hat to Charles Duffy):

printf '%s\n' suman-*/ | tail -n 1 | sed 's|/$||'

Note: This assumes that printf is implemented as a shell builtin (as opposed to having to rely on the printf utility), which, however, is true of all major POSIX-like shells (bash, zsh, ksh, dash).
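
A related sketch, along the lines suggested in the comments, keeps the glob matches in a bash array so that no external utility is involved at all. It assumes bash (arrays are not POSIX) and fixed-width timestamps; the scratch directory and names are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative scratch directory with a non-matching entry mixed in.
workdir=$(mktemp -d)
mkdir "$workdir"/foo "$workdir"/suman-1479860475524 "$workdir"/suman-1479860471431
cd "$workdir" || exit 1

shopt -s nullglob                 # an unmatched glob expands to nothing
dirs=( suman-*/ )                 # glob results arrive already sorted
latest=
if (( ${#dirs[@]} > 0 )); then
  latest=${dirs[${#dirs[@]}-1]}   # last element; this form also works in bash 3.2
  latest=${latest%/}              # trim the trailing slash
fi
printf '%s\n' "$latest"

cd / && rm -rf "$workdir"
```

The explicit index arithmetic is used instead of a negative subscript because the bash 3.2 that ships with macOS does not support `${dirs[-1]}`.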

Case-insensitive alternative with sort (makes no difference in this scenario):

printf '%s\n' suman-*/ | sort -rf | head -n 1 | sed 's|/$||'

Regarding your 3 original questions:

Re 1): use sed to trim the trailing /: printf '%s\n' suman-*/ | head -n 1 | sed 's|/$||' (if stripping path prefixes is also a concern, it's easiest to cd to the path prefix first and then use a filename-only glob).

Re 2): use printf ... | tail -1 | ... to get the lexically last entry (or printf ... | sort -rf | head -n 1 | ... to get the lexically last entry irrespective of case).

Re 3): (globbing) patterns allow per-character-position digit matching with character sets such as [0-9], but you cannot apply regex-style quantifiers (duplication symbols) such as ? and + to them.
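
One way to enforce "digits only", sketched here under the assumption that bash is available, is to let the glob preselect candidates and then filter them with bash's =~ regex operator. The directory suman-9abc is a made-up decoy that shows why the extra check matters.

```shell
#!/usr/bin/env bash
# Illustrative scratch directory; suman-9abc is a decoy that must be rejected.
workdir=$(mktemp -d)
mkdir "$workdir"/suman-1479860475524 "$workdir"/suman-1479860471431 "$workdir"/suman-9abc
cd "$workdir" || exit 1

pattern='^suman-[0-9]+$'        # "suman-" followed only by digits
latest=
for dir in suman-[0-9]*/; do    # the glob only checks the first character
  name=${dir%/}
  # Keep names that are digits all the way; later matches overwrite earlier ones.
  [[ $name =~ $pattern ]] && latest=$name
done
printf '%s\n' "$latest"

cd / && rm -rf "$workdir"
```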


Generally, there are many subtle differences between using find and globbing / ls - caveat emptor.
Generally, parsing ls output should be avoided, but - assuming one is aware of the edge cases and limitations - sometimes it is the most convenient solution.

mklement0
  • Not looking for most recently modified, really just looking for the newest directory, which will always be the one with the most recent timestamp – Alexander Mills Nov 23 '16 at 03:02
  • @AlexanderMills: Understood, so the `ls -dr` solution, which performs reverse _lexical_ sorting should do, given the fixed-width timestamps embedded in the filenames, preceded by fixed-width prefix `suman-`, right? – mklement0 Nov 23 '16 at 03:03
  • Yeah the timestamps should be fixed-width (until a couple thousand years, I guess). This worked for me: printf '%s\n' suman-*/ | sort -rf | head -1 , but I am not sure how the sort works there – Alexander Mills Nov 23 '16 at 03:04
  • @AlexanderMills: `-f` sorts case-insensitively (which shouldn't matter in your scenario), and `-r` sorts in _descending_ order; with your embedded timestamps, that means that the most recent timestamp comes _first_. – mklement0 Nov 23 '16 at 03:05
  • thanks, I am cool with the "ls" version - "ls -dr suman-\*/ | head -1", why not just "ls -dr suman-\* | head -1" (without the /) ? – Alexander Mills Nov 23 '16 at 03:08
  • Why use `ls` at all, if you're expanding the glob before it's ever started? `set -- suman-*/; first_result=$1`. Or, with bash arrays: `files=( suman-*/ ); first_file=${files[0]}` – Charles Duffy Nov 23 '16 at 03:35
  • ...the above also prevents a failure if you have more matches than will fit on argv. – Charles Duffy Nov 23 '16 at 03:41
  • @CharlesDuffy: Fair enough; note that it is the _last_ match that is needed, though. I've updated the answer accordingly. – mklement0 Nov 23 '16 at 03:47
0

Thinking that "simpler is better":

# 1: Using cut (the option "-c 3-" means "from the 3rd character on")
find... | cut -c 3-

# 2: You're right, tail is the command (-n1 == -1)
find... | tail -1

# 3: -name should do it:
TIME=1479860474833
find... -name "suman-$TIME"

EDIT:

As you said:

The directory contents look like: foo bar baz suman-1479860475524 suman-1479860471431 suman-1479860474233 ... etc.

Most recent

To get the most recent directory, based on name, you can do:

$ find . -maxdepth 1 -type d -name suman-\* | cut -c3- | sort -rnk1.7 | head -1
# Result is:
suman-1479860475524

And if only directories have names like 'suman-*' (never a file with that pattern), then I think this is easier:

$ ls -1d suman-* | sort -rnk1.7 | head -1
# Result is:
suman-1479860475524

In both cases, the sort option -r (reverse) brings the most recent first, and -n together with -k1.7 sorts numerically starting at the 7th character (where the millisecond timestamp in the name begins).
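
To see what the key offset refers to, here is a small sketch that needs no directories at all; it feeds the sample names from the question straight into cut and sort.

```shell
#!/usr/bin/env bash
# "suman-" is six characters long, so the timestamp starts at character 7.
key=$(printf '%s\n' suman-1479860475524 | cut -c7-)

# Sort the sample names numerically on that key, newest first.
newest=$(printf '%s\n' suman-1479860475524 suman-1479860471431 suman-1479860474233 \
  | sort -rnk1.7 | head -n 1)

printf 'key=%s newest=%s\n' "$key" "$newest"
```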

Oldest

To get the oldest directory, based on name, just omit the -r option:

$ find . -maxdepth 1 -type d -name suman-\* | cut -c3- | sort -nk1.7 | head -1
# Result is:
suman-1479860471431

$ ls -1d suman-* | sort -nk1.7 | head -1
# Result is:
suman-1479860471431

Wilfredo Pomier
  • What if the file path returned by `find` doesn't start with `./` or contains a full path? – axiac Nov 23 '16 at 01:14
  • 1
    The OP explicitly said that his command is `find . -maxdepth 1 -type d -name 'suman-*'| head -n1`, which means that the output will **always** begin with `./` – Wilfredo Pomier Nov 23 '16 at 01:32
  • I am searching for the most recent timestamp, I don't know which timestamp I am looking for, beforehand, I am just looking for the most recent one – Alexander Mills Nov 23 '16 at 03:10
  • 1
    @AlexanderMills, ...that being the case, you might find [BashFAQ #3](http://mywiki.wooledge.org/BashFAQ/003) to be of interest. – Charles Duffy Nov 23 '16 at 03:40
  • @AlexanderMills, I edited my answer, maybe that's what you're looking for. – Wilfredo Pomier Nov 23 '16 at 04:22