I am on Mac Terminal and want to "grep" a string (which is a UNIX timestamp) out of an email header, convert that into a format the OS can work with and make that the creation date of the file. I want to do that recursively for all mails inside a folder (with multiple possible subfolders).

The structure would probably look something like this:

#!/bin/bash

for i in `ls`
do
  # Find the date field (X-Delivery-Time) inside an email header and grep the UNIX timestamp
  # convert timestamp to a format the OS can work with
  # overwrite the existing creation date with the new one
done

The mail headers look like this:

X-Envelope-From: <some@mail.com>
X-Envelope-To: <my@mail.com>
X-Delivery-Time: 1535436541
...

Some background: Apple Mail uses the date a file was created as the date displayed within Apple Mail. That’s why after moving mails from one server to another all mails now display the same date which makes sorting impossible.

As I am new to Terminal/Bash any help is appreciated. Thanks

r00ky

2 Answers

On a Mac this should work, but since I don't have a Mac I cannot test it myself. I assume your email files have the .emlx extension.

For a single directory:

for i in ./*.emlx; do
    unixTime=$(grep -m1 '^X-Delivery-Time:' "$i" | grep -Eo '[0-9]+') &&
    humanTime=$(date -r "$unixTime" +%Y%m%d%H%M.%S) &&
    touch -t "$humanTime" "$i"
done

For a whole directory tree:

fixdate() {
  unixTime=$(grep -m1 '^X-Delivery-Time:' "$1" | grep -Eo '[0-9]+') &&
  humanTime=$(date -r "$unixTime" +%Y%m%d%H%M.%S) &&
  touch -t "$humanTime" "$1"
}
export -f fixdate
find . -name '*.emlx' -exec bash -c 'fixdate "$@"' . {} \;

Or, if you have bash 4 or higher installed (macOS still ships bash 3 by default):

shopt -s globstar
for i in ./**/*.emlx; do
    unixTime=$(grep -m1 '^X-Delivery-Time:' "$i" | grep -Eo '[0-9]+') &&
    humanTime=$(date -r "$unixTime" +%Y%m%d%H%M.%S) &&
    touch -t "$humanTime" "$i"
done
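
Because globstar only exists in bash 4 and later, a script can check the running version before relying on it. This is a minimal sketch, assuming the script is run with bash; the variable name use_globstar is only illustrative and not part of the answer above.

```bash
#!/bin/bash
# BASH_VERSINFO[0] holds the major version number of the running bash.
if (( BASH_VERSINFO[0] >= 4 )); then
  shopt -s globstar    # ** now matches across directory levels
  use_globstar=yes
else
  use_globstar=no      # use the find-based variant instead
fi
echo "bash ${BASH_VERSINFO[0]}: globstar usable: $use_globstar"
```

On a stock macOS bash 3 this should report "no", in which case the find variant is the one to use.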
Socowi
  • This does not work for me, the creation date still stays the same. I tried adding `echo "$unixTime"` but no output in the terminal whatsoever. Can anyone confirm if this works or not? – r00ky Jul 28 '21 at 12:59
  • @r00ky Thank you for the feedback. I think I found the problem: I never passed the file to `grep`. Can you try again? – Socowi Jul 28 '21 at 13:06
  • @glennjackman Thank you for improving the answer. However, this script is supposed to run on macOS, where the preinstalled `date` command has no documented `-j` option, see [man date](https://ss64.com/osx/date.html). Also, there is no indication in the manual that `date -r` would change the date: *`date [-nu] [-r seconds] [+format]` and `-r Print out the date and time that is seconds from the Epoch.`*. So for now I rolled the answer back. If you have a mac and can confirm that `-j` works with the pre-installed `date`, you can add it again. – Socowi Jul 28 '21 at 13:10
  • Yes! That does work on files inside one folder. But as soon as there are subfolders I get the message `grep: folderxyz: Is a directory`. Any suggestion on how to make this work recursively on a larger folder structure with multiple subfolders? – r00ky Jul 28 '21 at 13:22
  • @r00ky Oh, I overlooked that part of your question. To recurse, use `find`. – Socowi Jul 28 '21 at 13:43
  • I tried your version using the fixdate function but it produces an error: »find: illegal option -- n« – r00ky Jul 28 '21 at 14:15
  • Hm... Maybe macOS' `find` requires a path. Can you try again? – Socowi Jul 28 '21 at 14:16
  • Perfect, that did the trick! I changed the last line to `find ./ -name '*.emlx' -exec bash -c 'fixdate "$@"' . {} \;`. Thank you so much! – r00ky Jul 28 '21 at 14:24

What follows assumes you are using the default macOS utilities (touch, date, ...). As they are completely outdated, some adjustments will be needed if you use more recent versions (e.g. from MacPorts or Homebrew). It also assumes that you are using bash.

If you have sub-folders ls is not the right tool. And anyway, the output of ls is not for computers, it is for humans. So, the first thing to do is find all email files. Guess what? The utility that does this is named find:

$ find . -type f -name '*.emlx'
./foo/bar.emlx
./baz.emlx
...

This searches for regular files (-type f), starting from the current directory (.), whose names end in .emlx (-name '*.emlx'). Adapt it to your situation. If all files are email files you can skip the -name ... part.

Next we need to loop over all these files and process each of them. This is a bit more complex than for f in ... for several reasons (large number of files, file names with spaces, ...). A robust way to do this is to feed the output of a find command to a while loop:

while IFS= read -r -d '' f; do
  printf '%s\n' "$f"   # placeholder: replace with the real processing of "$f"
done < <(find . -type f -name '*.emlx' -print0)

The -print0 option of find separates the file names with a null character instead of the default newline character. The < <(find...) part is a process substitution: it redirects the output of find to the input of the while loop. The while IFS= read -r -d '' f; do part reads each file name produced by find and stores it in the shell variable f, preserving leading and trailing spaces if any (IFS=), keeping backslashes literal (-r), and using the null character as separator (-d '').
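
To see why the null-separated loop matters, here is a small sketch that runs it on a throwaway directory containing names with spaces. The directory layout and file names are made up for the demonstration.

```bash
#!/bin/bash
# Build a temporary tree with awkward file names (hypothetical test data).
dir=$(mktemp -d)
mkdir -p "$dir/sub folder"
touch "$dir/plain.emlx" "$dir/sub folder/with space.emlx" "$dir/ignored.txt"

count=0
while IFS= read -r -d '' f; do
  printf 'found: %s\n' "$f"
  count=$((count + 1))
done < <(find "$dir" -type f -name '*.emlx' -print0)

# The loop runs in the current shell, so count is still set here.
echo "total: $count"
rm -rf "$dir"
```

Because the loop reads from a process substitution rather than a pipe, variables changed inside it (count here) remain visible after done.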

Now we must code the processing of each file. Let's first retrieve the delivery time, assuming it is always the second word of the last line starting with X-Delivery-Time::

awk '/^X-Delivery-Time:/ {t = $2} END {print t}' "$f"

does that. If you don't know awk already, it's time to learn a bit of it. It's one of the very useful Swiss Army knives of text processing (sed is another). But let's improve it a bit so that it returns the first delivery time encountered instead of the last, stops as soon as it has found it, and also checks that the timestamp is a real timestamp (digits only):

awk '/^X-Delivery-Time:[[:space:]]+[[:digit:]]+$/ {print $2; exit}' "$f"

The [[:space:]]+ part of the regular expression matches 1 or more spaces, tabs,... and the [[:digit:]]+ matches 1 or more digits. ^ and $ match the beginning and the end of the line, respectively. The result can be assigned to a shell variable:

t="$(awk '/^X-Delivery-Time:[[:space:]]+[[:digit:]]+$/ {print $2; exit}' "$f")"

Note that if there was no match the t variable will store the empty string. We will use this later to skip such files.
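
A quick way to try the extraction without touching real mail is to run it on a temporary file holding the header from the question. This is only a sketch; the temporary file stands in for an .emlx file.

```bash
#!/bin/bash
# Sample header taken from the question, written to a temporary file.
mail=$(mktemp)
cat > "$mail" <<'EOF'
X-Envelope-From: <some@mail.com>
X-Envelope-To: <my@mail.com>
X-Delivery-Time: 1535436541
EOF

t="$(awk '/^X-Delivery-Time:[[:space:]]+[[:digit:]]+$/ {print $2; exit}' "$mail")"

# ${t:-<none>} prints a marker when no header line matched.
echo "timestamp: ${t:-<none>}"
rm -f "$mail"
```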

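Once we have this delivery time, which looks like a UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC) in your example, we must use it to change the last modification time of the email file. On macOS, setting a modification time that is older than the file's creation time should also move the creation time back, which is what the question is after. The command that does this is touch: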

$ man touch
...
touch [-A [-][[hh]mm]SS] [-acfhm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]] file ...
...

Unfortunately touch wants a time in the CCYYMMDDhhmm.SS format. No worry, the date utility can be used to convert a UNIX timestamp into any format we like. For instance, with your example timestamp (1535436541):

$ date -r 1535436541 +%Y%m%d%H%M.%S
201808280809.01
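
The output depends on the local time zone: the value shown above corresponds to a machine two hours ahead of UTC. The sketch below pins the zone to UTC so the result is reproducible, and it also picks the right date syntax for GNU date in case coreutils is first in the PATH. The helper name epoch_to_touch is hypothetical.

```bash
#!/bin/bash
# Convert epoch seconds to the CCYYMMDDhhmm.SS form that touch -t expects.
epoch_to_touch() {
  # GNU date accepts --version, the BSD/macOS one does not.
  if date --version >/dev/null 2>&1; then
    date -d "@$1" +%Y%m%d%H%M.%S    # GNU date (coreutils)
  else
    date -r "$1" +%Y%m%d%H%M.%S     # BSD/macOS date
  fi
}

# Pin the time zone inside the subshell only, so the result is the same everywhere.
stamp=$(export TZ=UTC; epoch_to_touch 1535436541)
echo "$stamp"   # 201808280609.01 (UTC)
```

In the real script the time zone should not be pinned: date and touch -t both use local time, so the two conversions cancel out.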

We are almost done:

while IFS= read -r -d '' f; do
  # uncomment for debugging
  # echo "processing $f"
  t="$(awk '/^X-Delivery-Time:[[:space:]]+[[:digit:]]+$/ {print $2; exit}' "$f")"
  if [ -z "$t" ]; then
    echo "no delivery time found in $f"
    continue
  fi
  # uncomment for debugging
  # echo touch -t "$(date -r "$t" +%Y%m%d%H%M.%S)" "$f"
  touch -t "$(date -r "$t" +%Y%m%d%H%M.%S)" "$f"
done < <(find . -type f -name '*.emlx' -print0)

Note how we test if t is the empty string (if [ -z "$t" ]). If it is, we print a message and jump to the next file (continue). Just put all this in a file with a shebang line and run...
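
Before running this on a whole mailbox, it may be worth checking the round trip on a single throwaway file. The following sketch sets the time from the header and reads it back with stat. It assumes either the BSD tools shipped with macOS or the GNU ones, and detects which is present.

```bash
#!/bin/bash
# Self-check on a temporary file: set the time from the header, then read it back.
f=$(mktemp)
printf 'X-Delivery-Time: 1535436541\n' > "$f"

t="$(awk '/^X-Delivery-Time:[[:space:]]+[[:digit:]]+$/ {print $2; exit}' "$f")"

# GNU date accepts --version, the BSD/macOS one does not.
if date --version >/dev/null 2>&1; then
  stamp=$(date -d "@$t" +%Y%m%d%H%M.%S)
else
  stamp=$(date -r "$t" +%Y%m%d%H%M.%S)
fi
touch -t "$stamp" "$f"

# Read the modification time back as epoch seconds.
if stat --version >/dev/null 2>&1; then
  mtime=$(stat -c %Y "$f")    # GNU stat
else
  mtime=$(stat -f %m "$f")    # BSD/macOS stat; %B would print the birth time
fi
echo "header: $t  mtime: $mtime"
rm -f "$f"
```

If both numbers match, the conversion and the touch call agree with each other.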

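If, instead of the X-Delivery-Time field, you must use a Date field with a more complex and variable format (e.g. Date: Mon, 11 Jun 2018 10:36:14 +0200), the simplest option is to install GNU touch with the coreutils package of MacPorts or Homebrew, because its -d option parses such date strings directly. Make sure the GNU version is the one that actually runs (Homebrew installs it as gtouch unless its gnubin directory is in your PATH). Then: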

while IFS= read -r -d '' f; do
  # sub() is available in every awk; gensub() exists only in GNU awk
  t="$(awk '/^Date:/ {sub(/^Date:[[:space:]]*/, ""); print; exit}' "$f")"
  if [ -z "$t" ]; then
    echo "no Date header found in $f"
    continue
  fi
  touch -d "$t" "$f"
done < <(find . -type f -name '*.emlx' -print0)

The awk command is slightly more complex: it prints the first matching line without the Date: prefix and then stops. The following sed command does the same in a more compact form, though it is not really more readable (this variant avoids GNU-only features, so it also runs with the default macOS sed):

t="$(sed -n '/^Date:/{s/^Date:[[:space:]]*//p;q;}' "$f")"
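
If installing coreutils is not an option, the date command itself can probably do the parsing. The sketch below is an assumption rather than something tested on the asker's machine: it relies on the BSD date options -j (do not set the clock) and -f (input format), and on every Date header having exactly the form shown in the example. Headers without a weekday or with a trailing comment would need extra handling. The helper name rfc_to_touch is hypothetical.

```bash
#!/bin/bash
# Convert an RFC 5322 style date string to the CCYYMMDDhhmm.SS form for touch -t.
rfc_to_touch() {
  if date --version >/dev/null 2>&1; then
    date -d "$1" +%Y%m%d%H%M.%S     # GNU date parses the string directly
  else
    # BSD/macOS date: parse with an explicit format, using English day/month names.
    LC_ALL=C date -j -f '%a, %d %b %Y %H:%M:%S %z' "$1" +%Y%m%d%H%M.%S
  fi
}

# Pin the time zone inside the subshell only, to get a reproducible result.
stamp=$(export TZ=UTC; rfc_to_touch 'Mon, 11 Jun 2018 10:36:14 +0200')
echo "$stamp"   # 201806110836.14 (UTC)
```

The result can then be passed to the stock touch -t exactly as in the X-Delivery-Time version, without pinning the time zone.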
Renaud Pacalet
  • I'd use `awk '/^X-Delivery-Time:/ {print $2; exit}' "$f"` so you don't need to keep processing the email file once you've seen the date. – glenn jackman Jul 28 '21 at 12:45
  • Wow, thank you so much for the in depth explanation! Would you be so kind and put it all together in one executable script I can try out before I wrap my head around all of the details? – r00ky Jul 28 '21 at 13:07
  • @glennjackman Yes, sure. You read a preliminary ongoing version. The final one does what you suggest, plus handles no-match situations a bit better. – Renaud Pacalet Jul 28 '21 at 13:08
  • @r00ky The final part is already an executable script. Put it in a file, add the shebang line and you're done. – Renaud Pacalet Jul 28 '21 at 13:20
  • As of now this does not work for me. I don’t see any changes regarding the creation date nor an output in the terminal from the echo command. – r00ky Jul 28 '21 at 13:28
  • @r00ky Time to debug. See my last edit, uncomment the `# echo` line and see what happens. By the way, did you adapt all the details to your own case (extension of the email files...)? – Renaud Pacalet Jul 28 '21 at 13:33
  • @RenaudPacalet You were right, I forgot to adapt the file extension accordingly (= .emlx). Now when I run the command (including debugging commands) it outputs all filenames like so `./somefolder/anotherone/16384.emlx`. But the creation date is still unchanged. – r00ky Jul 28 '21 at 14:00
  • If you uncomment the debugging `echo ...` commands you should, for each email file `FILE`, see a `processing FILE` message and either a `no delivery time found in FILE` message or a `touch -t ... FILE` message. Do you see that? – Renaud Pacalet Jul 28 '21 at 14:04
  • I uncommented the echo command but it does not get executed. The only output I get from the terminal is: ` ./16382.emlx ./somefolder/16539.emlx ./somefolder/anotherone/16384.emlx ./16370.emlx ` – r00ky Jul 28 '21 at 14:11
  • @r00ky You probably typed something wrong. There is no `echo` command that could print this. Did you copy-paste exactly? Can you check again the last line? It is a bit strange but every character of it is needed, at exactly the same place. Yes, there are 2 `<` signs, a space between them... – Renaud Pacalet Jul 28 '21 at 14:15
  • @RenaudPacalet My bad, it was the file extension again ... I was missing the `x` in `.emlx` in the last line. It now works! Thank you so much! – r00ky Jul 28 '21 at 14:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/235389/discussion-between-r00ky-and-renaud-pacalet). – r00ky Jul 28 '21 at 14:30
  • For some reason a few hundred mails didn’t have the `X-Delivery-Time` header part. So the better way would actually be to search for the `Date:` which all of them have (which I didn’t know before). The date is always RFC 5322 formatted (example: `Date: Mon, 11 Jun 2018 10:36:14 +0200`). What would be the bash regex for that, and how do I transform it into the correct time stamp again? Could you add an example to the existing explanation? – r00ky Jul 28 '21 at 16:24
  • @r00ky With RFC 5322 format, things are a bit more complex. A very simple solution would be to use a decently recent version of the `touch` utility. Can you install the coreutils package either with [Mac Ports](https://ports.macports.org/port/coreutils/details/) or [Homebrew](https://formulae.brew.sh/formula-linux/coreutils)? – Renaud Pacalet Jul 29 '21 at 05:01
  • @RenaudPacalet Yes, I just updated to coreutils v8.32 – r00ky Jul 29 '21 at 07:59
  • @r00ky Perfect. I updated my answer, see the last part. But make sure the new `touch` is the default one (`touch --version`), or use its full path in your script (`/opt/local/libexec/gnubin/touch` with Mac Ports). – Renaud Pacalet Jul 29 '21 at 08:02