
I have a file with the following content:

2004-10-07     cva        create file ...
2003-11-11     cva        create version ...
2003-11-11     cva        create version ...
2003-11-11     cva        create branch ...

Now I want to count the number of lines that start with a date in this file. How can I do that?

If I use wc -l <file.txt>, it gives me the total number of lines (5 in my case, whereas the count I want is 4).

Shakiba Moshiri
noob_coder
    `grep "[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}" filename | wc -l` should give number of lines for this particular format. However, it can be better with awk. – acsrujan Feb 08 '17 at 16:30

2 Answers


A simple way with Perl.

Your file:

2004-10-07     cva 
2004-10-04             
anything
2004-10-07     cva 
anything
2004-10-07     cva 
2004-10-07     cva 

To print the running count after each line:
perl -lne ' ++$n if /^\d+-\d+-\d+/; print $n' your-file

Output:

1  
2  
2  
3  
3  
4  
5  

To count and print only the total:
perl -lne ' ++$n if /^\d+-\d+-\d+/ ;END{ print $n}' your-file

Output:
5


With grep -E -c (the current spelling of egrep -c), count the matching lines directly, without cat:
grep -Ec '^[0-9]+-[0-9]+-[0-9]+' your-file

Output:
5

Shakiba Moshiri

Given:

$ cat file
2004-10-07     cva        create file ...
no date
2003-11-11     cva        create version ...
no date
2003-11-11     cva        create version ...
no date
2003-11-11     cva        create branch ...

First, figure out how to run a regex on each line of the file. Suppose you use sed, since it is fairly standard and fast. You could also use awk, grep, bash, or perl.

Here is a sed solution:

$ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file
2004-10-07     cva        create file ...
2003-11-11     cva        create version ...
2003-11-11     cva        create version ...
2003-11-11     cva        create branch ...

Then pipe that to wc:

$ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file | wc -l
      4

Or, you can use the same pattern in awk and not need to use wc:

$ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{lc++} END{ print lc }' file
4

Or if you wanted the count of each date:

$ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{cnt[$1]++} END{ for (e in cnt) print e, cnt[e] }' file
2003-11-11 3
2004-10-07 1

Or, same pattern, with grep:

$ grep -cE '^[12][0-9]{3}-[0-9]{2}-[0-9]{2}' file
4
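
For the plain-shell option mentioned earlier, here is a minimal sketch. It is an illustration rather than part of the original answer: it swaps the regex for a glob pattern in a case statement so that it runs in any POSIX shell, not only bash. The temporary file and the variable names (file, count, line) are assumptions made for the example.

```shell
# Hypothetical sample file mirroring the data above.
file=$(mktemp)
printf '%s\n' \
  '2004-10-07     cva        create file ...' \
  'no date' \
  '2003-11-11     cva        create version ...' \
  'no date' \
  '2003-11-11     cva        create version ...' \
  'no date' \
  '2003-11-11     cva        create branch ...' > "$file"

count=0
# Read line by line; the "|| [ -n ... ]" part also handles a final line
# that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
  case $line in
    # Glob equivalent of ^[12][0-9]{3}-[0-9]{2}-[0-9]{2}
    [12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) count=$((count + 1)) ;;
  esac
done < "$file"

echo "$count"   # prints 4
rm -f "$file"
```

A shell loop like this is usually much slower than grep, sed, or awk on large files, so it is mainly useful when no external tools are wanted.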

(Note: it is unclear whether your date format is YYYY-MM-DD or YYYY-DD-MM. You can make the pattern more specific if this is known.)
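
If the format is known to be YYYY-MM-DD, the pattern could be tightened roughly as follows. This is only a sketch under that assumption: the sample lines, the temporary file, and the variable names (strict, loose) are made up for illustration, and the pattern limits the month to 01-12 and the day to 01-31 without checking the number of days in each month.

```shell
# Hypothetical sample: two plausible dates and one impossible one.
file=$(mktemp)
printf '%s\n' \
  '2004-10-07     cva        create file ...' \
  '2003-13-40     cva        not a valid YYYY-MM-DD date' \
  '2003-11-11     cva        create branch ...' > "$file"

# Original, looser pattern: any two digits for month and day.
loose='^[12][0-9]{3}-[0-9]{2}-[0-9]{2}'
# Stricter pattern: month 01-12, day 01-31.
strict='^[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])'

loose_count=$(grep -cE "$loose" "$file")
strict_count=$(grep -cE "$strict" "$file")

echo "loose=$loose_count strict=$strict_count"   # prints loose=3 strict=2
rm -f "$file"
```

For YYYY-DD-MM the two bracketed groups would simply swap places.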

dawg
  • How can I print the count for each date individually, rather than the total number of lines that start with a date? – Khaled Annajar Jul 12 '23 at 15:44
  • Do you mean that each date is counted separately? Like so: `2003-11-11 3, 2004-10-07 1`? See edit... – dawg Jul 12 '23 at 16:03
  • Yes. I found a command that creates a CSV file with the date and count: `sed -n 's/\(^[^ ]*\).*/\1/p' history.txt | sort | uniq -c | awk '{print $2","$1}' > output.csv` – Khaled Annajar Jul 16 '23 at 14:11