Parse Date from Text String in a 3 column format

Question

I am given an array of lines from a text file. They look similar to this, and will always be structured like this:

            Full         Tue Aug 27 10:59:43 2019                 1
     Incremental         Tue Aug 27 11:16:41 2019                 1
     Incremental         Tue Aug 27 11:25:28 2019                 1
     Incremental         Tue Aug 27 13:37:29 2019                 1

Based on the above output, I do not believe these 3 columns qualify as fixed width. The length of the date string can, and probably will, change depending on the date. Also, the first column contains 4 characters in row one ("Full") but 11 characters in rows 2 through the end ("Incremental")...

How can I parse the date from these lines, so my list is this instead:

Tue Aug 27 10:59:43 2019
Tue Aug 27 11:16:41 2019
Tue Aug 27 11:25:28 2019
Tue Aug 27 13:37:29 2019

I am sure grep or sed is probably the answer I need, I just don't know much about either.
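Since grep comes up: a minimal sketch that matches the date itself with `grep -oE` instead of counting columns. It assumes every date follows the `Tue Aug 27 10:59:43 2019` layout; `extract_dates` is just a hypothetical helper name. `-o` (print only the matched part) is supported by GNU and BSD/macOS grep but is not in POSIX. The ` +` before the day allows for the extra padding space that ctime-style timestamps put before a single-digit day, and that padding is kept as-is in the output.

```shell
#!/bin/sh
# Print only the part of each input line that looks like
# "Dow Mon DD HH:MM:SS YYYY" (assumed format, taken from the sample lines).
# " +" accepts one or more spaces before the day, e.g. "Sep  3".
extract_dates() {
  grep -oE '[A-Z][a-z]{2} [A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}'
}

# Demo on two of the sample lines.
extract_dates <<'EOF'
            Full         Tue Aug 27 10:59:43 2019                 1
     Incremental         Tue Aug 27 11:16:41 2019                 1
EOF
```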

@uprego I don't think it will be fixed length: with the date format in this text, the width would vary depending on the month and day we're on. — Kevin, Aug 27 '19 at 18:19
I don't quite understand your comment :/ but if you want to explain further, your question is a perfect canvas. :) — 178024, Aug 27 '19 at 18:26
Question edited with this: Based on the above output, I do not believe these 3 columns qualify as fixed width. The length of the date string can, and probably will, change depending on the date. Also, the first column contains 4 characters in row one, while the same column contains 11 in rows 2 through the end... — Kevin, Aug 27 '19 at 18:33
I think I mighta got your point now. I hope someone who knows well the answers set can point to a relevant duplicate. If this gets tumbleweed just ping. — 178024, Aug 27 '19 at 18:39
I mean not _get tumbleweed_ literally as I've heard it's been killed, but if it _becomes tumbleweed_ figuratively. — 178024, Aug 27 '19 at 18:42
:) Appreciate it. I think if I split each line on a space delimiter, I may be able to piece together a "date string" from the positional index values in the resulting array... — Kevin, Aug 27 '19 at 18:53
For instance: `_d_string=${_temp_arr[1]}" "${_temp_arr[2]}" "${_temp_arr[3]}" "${_temp_arr[4]}" "${_temp_arr[5]}` gives me just that date string. However, it seems hackish... — Kevin, Aug 27 '19 at 18:56
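The split-on-whitespace idea from these comments can be written without array indices by reading each line straight into named variables. A minimal POSIX sh sketch; the function and variable names are hypothetical. With the default IFS, `read` drops the leading spaces and treats every run of blanks as a single separator, so the date is always words 2 to 6 however wide the first column is. One side effect: a padded day such as `Sep  3` comes out as `Sep 3`.

```shell
#!/bin/sh
# Split one line on whitespace into named fields instead of array slots.
date_from_line() {
  # Default IFS: leading blanks are stripped, runs of blanks collapse.
  read -r kind dow mon day clock year count <<EOF
$1
EOF
  # kind ("Full"/"Incremental") and count (the trailing 1) are read but unused.
  printf '%s %s %s %s %s\n' "$dow" "$mon" "$day" "$clock" "$year"
}

date_from_line '            Full         Tue Aug 27 10:59:43 2019                 1'
date_from_line '     Incremental         Tue Aug 27 11:16:41 2019                 1'
```

In bash, the lines from an existing array can be fed through it with a loop such as `for line in "${lines[@]}"; do date_from_line "$line"; done` (where `lines` is the hypothetical array name).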

score 1 · Answer 1 · answered Aug 27 '19 at 19:17

You can use sed and a regular expression to cut the date out of each line.

Assuming your data is stored in a file named input:

sed -e 's/^\s\+\S\+\s\+\(.*\S\)\s\+\S\+$/\1/g' input 
Tue Aug 27 10:59:43 2019
Tue Aug 27 11:16:41 2019
Tue Aug 27 11:25:28 2019
Tue Aug 27 13:37:29 2019

The first part s/^\s\+\S\+\s\+ matches lines that begin with one or more whitespace character(s), followed by one or more non-whitespace character(s), followed again by one or more whitespace character(s). E.g.:

'            Full         '
'     Incremental         '

Now the last part, \s\+\S\+$, matches one or more non-whitespace characters at the end of the line, preceded by one or more whitespace characters. E.g.:

'                 1'

The middle part \(.*\S\) is a capturing group, which can be referred to as \1 (a backreference). It matches everything after the first part, up to and including the last non-whitespace character before the last part.
In the replacement, \1 stands for that group, so each whole line is replaced by the date alone, and that is what gets printed.
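The `\s`, `\S` and `\+` escapes used above are GNU sed extensions, so the command may not behave the same with other seds. Assuming a sed that accepts `-E` for extended regular expressions (GNU sed and BSD/macOS sed both do), the same expression can be sketched with POSIX bracket classes; `strip_columns` is a hypothetical helper name.

```shell
#!/bin/sh
# Same three parts as the answer's regex:
#   leading blanks + first column + blanks
#   ( everything up to the last non-blank before the final column )
#   blanks + last column at end of line
# [[:space:]] replaces \s, [^[:space:]] replaces \S, and with -E the
# group and the "+" need no backslashes.
strip_columns() {
  sed -E 's/^[[:space:]]+[^[:space:]]+[[:space:]]+(.*[^[:space:]])[[:space:]]+[^[:space:]]+$/\1/'
}

strip_columns <<'EOF'
            Full         Tue Aug 27 10:59:43 2019                 1
     Incremental         Tue Aug 27 11:16:41 2019                 1
EOF
```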

Appreciate it. I will test it out when I'm back on it tomorrow. Thanks for your input :) — Kevin, Aug 27 '19 at 19:56

asktyagi · Accepted Answer · 2019-08-28T06:18:43.097

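Check if awk can help. By default awk splits each line on runs of whitespace and ignores leading blanks, so the date is always fields 2 through 6, no matter how wide the first column is.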

$ cat abc.txt
            Full         Tue Aug 27 10:59:43 2019                 1
     Incremental         Tue Aug 27 11:16:41 2019                 1
     Incremental         Tue Aug 27 11:25:28 2019                 1
     Incremental         Tue Aug 27 13:37:29 2019                 1
$ cat abc.txt  | awk '{print $2" "$3" "$4" "$5" "$6}'
Tue Aug 27 10:59:43 2019
Tue Aug 27 11:16:41 2019
Tue Aug 27 11:25:28 2019
Tue Aug 27 13:37:29 2019
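Hard-coding `$2` to `$6` relies on the date always being exactly five words. A sketch that instead mirrors the sed answer's logic (drop the first and the last column, keep whatever lies between) might look like this; `middle_columns` is a hypothetical name. Assigning to a field makes awk rebuild the line joined by the output field separator (a single space by default), so runs of spaces inside the date are collapsed, just as with the `$2" "$3...` version.

```shell
#!/bin/sh
# Blank out the first and last fields; awk rebuilds $0 as
# " Tue Aug 27 10:59:43 2019 " (single spaces between fields),
# so one leading and one trailing space are trimmed before printing.
middle_columns() {
  awk '{ $1 = ""; $NF = ""; sub(/^ /, ""); sub(/ $/, ""); print }'
}

middle_columns <<'EOF'
            Full         Tue Aug 27 10:59:43 2019                 1
     Incremental         Tue Aug 27 11:16:41 2019                 1
EOF
```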

edited Aug 28 '19 at 06:18

answered Aug 28 '19 at 02:44

Why print the 3rd field twice? Also, you don't need `cat` and you can separate the fields with comma (output field separator defaults to space), e.g. `awk '{print $2,$3,$4,$5,$6}' abc.txt`. – Freddy Aug 28 '19 at 05:48
This consistently did the trick. Thank you for the help – Kevin Sep 06 '19 at 17:56
