How to make wc ignore extra lines when a CSV column spans multiple lines

Question

I am trying to get the line count of a CSV file using the wc command:

wc -l test.csv

But this command gives me an incorrect count, because one of the columns spans multiple lines in the CSV file.

test.csv format:

 column1 column2 column3
 hi      hello   hi
                 hello
 
 I am    busy    right
                 now

For the lines above, wc gives me a count of 4, but the file actually has 2 rows. Can wc ignore the extra lines when a column spans multiple lines? I have searched a lot for this, but nothing I found gave me a clue.

Text files don't have rows: they have lines. In a CSV file, a newline is used as a record separator. Though a CSV parser may allow a newline to be escaped in a way to let it be used in a field value, `wc` is not a CSV parser, and doesn't know about any such convention. — chepner, Aug 04 '21 at 11:59
`wc` counts records based on record delimiter `\n`. Please add output of `cat test.csv`. — Digvijay S, Aug 04 '21 at 12:04
if you want to count only first column try `cut -d',' -f1 < test.csv | wc -l` — Digvijay S, Aug 04 '21 at 12:05
Please do not use pictures to show your textual data. The visual representation produced by your tool does not tell us what is in the file, and we need to know this to correctly understand the `wc` output. Copy-paste the content of the text file and indent it by 4 spaces. And take [the tour](https://stackoverflow.com/tour), maybe. — Renaud Pacalet, Aug 04 '21 at 12:18
@SanjayChintha Your CSV apparently separates its records with blank lines. So, you could try to count the blank lines and add one: `grep '^$' test.csv | wc -l`. — Renaud Pacalet, Aug 04 '21 at 13:39
No, it is not. A column has multiple lines, and the command has to treat them as one row. I appreciate your help. — Sanjay Chintha, Aug 04 '21 at 13:41
@SanjayChintha Why "_no its not_"? What is wrong with the result? Can you show an example where `grep '^$' test.csv | wc -l` does not give you the number of "_rows_" minus one? — Renaud Pacalet, Aug 04 '21 at 13:43
`grep '^$' test.csv` --> Will get only empty lines. Do you mean `grep -v '^$' test.csv` @RenaudPacalet — Digvijay S, Aug 04 '21 at 14:18
@DigvijayS No, I really mean count the empty lines. If there are `N+1` records separated by empty lines, then there are `N` empty lines. — Renaud Pacalet, Aug 04 '21 at 14:21
@SanjayChintha So, your empty lines are probably not empty. They are blank. Try `grep -E '^\s*$' test.csv | wc -l`. — Renaud Pacalet, Aug 04 '21 at 14:23
@SanjayChintha And if you really want to print the number of records, not minus one, try `echo $(( $(grep -E '^\s*$' test.csv | wc -l) + 1 ))`. — Renaud Pacalet, Aug 04 '21 at 14:31
It is printing 1 as output, which is invalid. Can you please try reading from the actual CSV? — Sanjay Chintha, Aug 04 '21 at 14:36
I tried `echo $(( $(grep -E '^\s*$' test.csv | wc -l) + 1 ))` with an exact copy-paste of your example data and got `2`. — Renaud Pacalet, Aug 04 '21 at 14:40
Assuming your `test.csv` file has just the 2 rows shown, please update the question with the output from `od -c test.csv`; this will show us exactly what is in the file (including non-printing characters). From there we should have a better idea of how to proceed. — markp-fuso, Aug 04 '21 at 16:17
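
To illustrate the points made in the comments, here is a minimal sketch. It assumes a hypothetical file (not the asker's data) holding a single CSV record whose quoted field contains a newline. `wc -l` reports the number of newline characters, and `od -c` shows exactly which bytes the file contains:

```shell
# Hypothetical sample: one CSV record whose quoted field spans two lines.
tmp=$(mktemp)
printf 'a,"line one\nline two"\n' > "$tmp"

# wc -l counts newline characters, so it reports 2 although there is 1 record.
# tr strips the padding that some wc implementations add to the number.
newline_count=$(wc -l < "$tmp" | tr -d ' ')
echo "$newline_count"

# od -c prints every byte, which makes newlines (\n) and stray spaces visible.
od -c "$tmp"

rm -f "$tmp"
```

The same `od -c` check on the real file would show whether the separator lines are truly empty or contain spaces or tabs.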

Renaud Pacalet · Answer 1 · 2021-08-04T14:44:30.850

Your CSV apparently separates its records with empty lines. So, you could try to count the empty lines and add one:

echo $(( $(grep '^$' test.csv | wc -l) + 1 ))

If your record separators are not really empty lines but blank lines (lines with only blank characters), you can use:

echo $(( $(grep -E '^\s*$' test.csv | wc -l) + 1 ))
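
A minimal sketch of the difference between the two commands, assuming a hypothetical file whose separator line holds a single space (which may be what the sample in the question contains). It uses `grep -c` instead of piping to `wc -l`, and the POSIX class `[[:space:]]` in place of `\s`, which is a GNU extension:

```shell
sample=$(mktemp)
# Hypothetical data: two records separated by a line holding one space.
printf 'hi hello hi\n         hello\n \nI am busy right\n          now\n' > "$sample"

# grep -c exits non-zero when nothing matches, hence the "|| true".
# Truly empty lines: none here, so this yields 0 + 1 = 1.
empty_based=$(( $(grep -c '^$' "$sample" || true) + 1 ))

# Blank lines (empty or whitespace only): one here, so this yields 1 + 1 = 2.
blank_based=$(( $(grep -c '^[[:space:]]*$' "$sample" || true) + 1 ))

echo "$empty_based $blank_based"
rm -f "$sample"
```

If the first variant prints 1 on a file that visibly has separator lines, those lines are probably blank rather than empty.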

But if the records can be separated by any number of blank lines, or if the file can also have leading and trailing blank lines, the best option is probably to use a special awk feature:

awk 'END {print NR}' RS="" test.csv

If the record separator (the RS awk variable) is the empty string, by a "special dispensation [it] indicates that records are separated by one or more blank lines". So what this awk command basically does is parse your file considering this record separator, and at the end (END pseudo-condition) print the last record number (NR).
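
How `RS=""` treats separator lines that contain spaces or tabs may differ between awk implementations; some may only accept truly empty lines. As an alternative sketch (a different technique, not the one above), the following counts runs of consecutive non-blank lines, using `NF` to detect lines without any field. The function name `count_blocks` and the sample data are hypothetical:

```shell
count_blocks() {
  awk '
    NF  { if (!in_block) { blocks++; in_block = 1 } }  # first non-blank line of a run
    !NF { in_block = 0 }                               # a blank line ends the run
    END { print blocks + 0 }
  ' "$1"
}

sample=$(mktemp)
# Leading empty line, separators made of a space and a tab, trailing empty line.
printf '\nhi hello hi\n   hello\n \n\t\nI am busy right\n   now\n\n' > "$sample"
block_count=$(count_blocks "$sample")
echo "$block_count"
rm -f "$sample"
```

Note that a header line sitting directly above the first record is counted as part of that record's block.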

Sanjay Chintha · Answer 2 · 2021-08-05T15:29:30.883


I got the correct count by using the command below:

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' test.csv | wc -l
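
The command above appears to assume that multi-line fields are enclosed in double quotes, which the sample in the question does not show. Under that assumption, a portable sketch (plain awk, without gawk's `RT`) can count a physical line as the end of a record only when the running number of double quotes is even. The function name `count_csv_records` and the sample data are hypothetical:

```shell
count_csv_records() {
  awk '
    {
      line = $0
      # gsub returns how many double quotes it removed from the copy.
      quotes += gsub(/"/, "", line)
      # Even running total: we are outside any quoted field, so this
      # physical line ends a record.
      if (quotes % 2 == 0) records++
    }
    END { print records + 0 }
  ' "$1"
}

sample=$(mktemp)
# Header plus two records; each quoted field contains a newline.
printf 'id,text\n1,"hi\nhello"\n2,"I am busy\nright now"\n' > "$sample"
record_count=$(count_csv_records "$sample")
echo "$record_count"
rm -f "$sample"
```

The count includes the header line. For anything beyond simple quoting, a real CSV parser is likely the safer choice, since awk and `wc` know nothing about CSV conventions.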

Answered Aug 04 '21 at 17:20; edited Aug 05 '21 at 15:29.

Which, on your example, gives... 6! Please edit your question such that it matches at least your own answer. Or delete it completely. As it is, there is very little chance that it is useful to others. – Renaud Pacalet Aug 05 '21 at 04:54
