
I need to filter the rows of a CSV file by date.

The file is structured like this:

test121smith@example.com                                           active      01/24/11 10:04   07/23/23 16:56
test121johnson@example.com                                              active      04/07/14 15:56   04/23/21 04:02
test121doe@example.com                                               active      07/27/12 16:24   11/13/12 01:14
test121fritts@example.com                                             active      11/02/10 14:00   09/05/14 11:34
test121violet@example.com                                              active      05/19/11 18:11   03/25/15 12:22
test121brad@example.com                                              active      06/26/14 12:45   03/05/19 20:27

I was able to sort by date using:

awk '{print $1 "," $3 "," $5}' | sort -t "," -n -k 2.7,5 -k 2.8,5 -k 2.1,5 -k 2.2,5 -k 2.4,5 -k 2.5,5

This gives me the rows sorted by date.

Example:

test121smith@example.com,01/24/11,07/23/23
test121johnson@example.com,04/07/14,04/23/21
test121doe@example.com,07/27/12,11/13/12

Is there a way to filter this output by date? For example, printing only the rows after 12/11/22, or only the rows before 12/11/22, for a given field or column?

What I tried:

grep -e '[3-9].$' -e '2[3-9]$' -e '12/[1-3]./22$' myfile.csv

This command filters on the third column ($3), so the output for this sample is:

test121smith@example.com,01/24/11,07/23/23

This worked, but only for one date in the third column, and I didn't really grasp what it does or how to change it for whichever date range is requested.

Thanks!

dj423

2 Answers


You could use Miller, a nice CSV-aware CLI.

For example, you could run

mlr --nidx --repifs filter 'strptime($3,"%m/%d/%y")>strptime("11/13/12","%m/%d/%y")' input.csv

to keep only the records whose third field ($3) is later than 11/13/12, which gives

test121johnson@example.com active 04/07/14 15:56 04/23/21 04:02
test121brad@example.com active 06/26/14 12:45 03/05/19 20:27

Some notes:

  • --nidx --repifs set the data format: index-numbered fields (toolkit style), with a repeated field separator (here, the space) treated as a single one
  • filter is the verb that applies a filter expression to the records
  • strptime is the function that parses a date string in the given format.
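
If Miller is not installed, a similar comparison can be sketched with plain awk. This is only an illustration, not part of the Miller approach: the function name rows_after is hypothetical, and it assumes every date is in mm/dd/yy form and that every two-digit year belongs to the same century. It rewrites each date as yymmdd so that a simple comparison orders the dates correctly.

```shell
# Hypothetical helper: print the rows whose date in whitespace-separated
# field number $2 is later than the cutoff date $1 (both mm/dd/yy).
# Assumption: all two-digit years are in the same century.
rows_after() {
    awk -v cutoff="$1" -v col="$2" '
        # Rewrite mm/dd/yy as yymmdd so the values compare in date order.
        function key(d,   p) { split(d, p, "/"); return p[3] p[1] p[2] }
        BEGIN { limit = key(cutoff) }
        key($col) > limit
    '
}

# Usage (input.csv is the file name used in the Miller example):
# rows_after 11/13/12 3 < input.csv
```

Changing `>` to `<` in the last line of the awk program would select the rows before the cutoff instead.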
aborruso

As you already have a sorted list, try the commands below.

$ cat a.txt

test121smith@example.com,01/24/11,07/23/23
test121johnson@example.com,04/07/14,04/23/21
test121doe@example.com,07/27/12,11/13/12

Print all lines up to and including the first line that matches the string:

$ cat a.txt | sed '/04\/23\/21/q' # the slashes in the date must be escaped: 04\/23\/21

test121smith@example.com,01/24/11,07/23/23
test121johnson@example.com,04/07/14,04/23/21

Print all lines from the first matching line to the end of the file:

$ cat a.txt | sed -n '/04\/23\/21/,$p' # the slashes in the date must be escaped: 04\/23\/21

test121johnson@example.com,04/07/14,04/23/21
test121doe@example.com,07/27/12,11/13/12
Esa Jokinen
asktyagi