gcov is a GNU toolchain utility that produces code coverage reports (see documentation) formatted as follows:

    -:    0:Source:../../../edg/attribute.c
    -:    0:Graph:tmp.gcno
    -:    0:Data:tmp.gcda
    -:    0:Runs:1
    -:    0:Programs:1
    -:    1:#include <stdio.h>
    -:    2:
    -:    3:int main (void)
    1:    4:{
    1:    5:  int i, total;
    -:    6:
    1:    7:  total = 0;
    -:    8:
   11:    9:  for (i = 0; i < 10; i++)
   10:   10:    total += i;
    -:   11:
    1:   12:  if (total != 45)
#####:   13:    printf ("Failure\n");
    -:   14:  else
    1:   15:    printf ("Success\n");
    1:   16:  return 0;
    -:   17:}

From a bash script, I need to extract the line numbers of the lines that were executed. Running `egrep --regexp='^\s+[1-9]' example_file.c.gcov` seems to return the relevant lines. An example of typical output would be:

    1:  978:  attr_name_map = alloc_hash_table(NO_MEMORY_REGION_NUMBER,
   79:  982:  for (k = 0; k<KNOWN_ATTR_TABLE_LENGTH; ++k) {
   78:  989:    attr_name_map_entries[k].descr = &known_attr_table[k];
   78:  990:    *ep = &attr_name_map_entries[k];
    1:  992:}  /* init_attr_name_map */
  519: 2085:      new_attr_seen = FALSE;
  519: 2103:      p_attributes = last_attribute_link(p_attributes);
  519: 2104:    } while (new_attr_seen);
  519: 2106:  return attributes;
   16: 3026:void transform_type_with_gnu_attributes(a_type_ptr        *p_type,
   16: 3041:  for (ap = attributes; ap != NULL; ap = ap->next) {
    1: 6979:void process_alias_fixup_list(void)
    1: 6984:  an_alias_fixup_ptr  entries = alias_fixup_list, entry;

I subsequently must extract the line number strings. The expected output from this example would be:

978
982
989
990
992
2085
2103
2104
2106
3026
3041
6979
6984

Could someone suggest a reliable, robust way to achieve this?


NOTE: My idea was to eliminate everything that is not between the first and the second occurrence of the character :, which I tried to do with sed, without much success so far.

J. Doe
  • If you're okay with bash-specific code, parameter expansion would do this faster than sed/awk/etc other external processes/programs. `example="123: 456: 789abc#\$&-+)({}\/"; example="${example#*\:}"; echo "${example%\:*}";` outputs: 456 (tested as a one-liner, needs a for loop for the list). – l3l_aze Jul 01 '20 at 19:30

1 Answer

This is fairly simple to do using awk:

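awk -F: '/^ *[0-9]/ {gsub(/ /, "", $2); print $2}' file.gcov

That is, use : as the field separator, and for lines that start with optional spaces followed by a digit (the execution count), remove the spaces from the 2nd field and print it. The ^ anchor matters: without it, the pattern would also match the spaces and digits of the line-number column, so lines marked - or ##### would be printed too.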
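As a quick check, here is a minimal sketch that wraps an awk command of this kind in a function and runs it on a few lines taken from the report in the question. It uses a pattern anchored at the start of the line so that only the execution-count column is tested. The function name `executed_lines` and the `sample` variable are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the source line numbers of executed lines.
# Reads the gcov report from the files given as arguments, or from stdin.
executed_lines() {
  awk -F: '/^ *[0-9]/ { gsub(/ /, "", $2); print $2 }' "$@"
}

# A few lines copied from the gcov report in the question.
sample='    1:   12:  if (total != 45)
#####:   13:    printf ("Failure\n");
    -:   14:  else
    1:   15:    printf ("Success\n");'

# Only the lines with a numeric execution count should survive.
printf '%s\n' "$sample" | executed_lines
```

On this sample the function prints 12 and 15; the ##### and - lines are skipped because their first column does not start with a digit.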

But if you really want to use sed, and you want something robust, you could do this:

sed -e '/^  *[0-9][0-9]*:  *[0-9][0-9]*:/!d' -e 's/[^:]*: *//' -e 's/:.*//' file.gcov

What's happening here?

  • The first command uses a pattern to match lines starting with 1 or more spaces, followed by 1 or more digits, a :, 1 or more spaces, 1 or more digits, and another :. Then comes the interesting part: we invert this selection with ! and delete the result with d. In effect, we delete every line except the ones we need.

  • The second command is a simple substitution: it removes a sequence of characters that are not :, followed by a :, followed by zero or more spaces. The match is found at the beginning of the line anyway, so there is no need for a leading ^, and no need to require 1 or more spaces, because the previous command already guarantees there is at least one.

  • The last command is even simpler: it removes the first remaining : and everything after it, leaving only the line number.
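To see what each of the three commands contributes, the sketch below applies them one at a time to a single line from the question's sample output. The variable names `line`, `step1`, `step2` and `step3` are only for illustration.

```shell
#!/usr/bin/env bash

# One executed line copied from the question's sample output.
line='   79:  982:  for (k = 0; k<KNOWN_ATTR_TABLE_LENGTH; ++k) {'

# Step 1: keep the line only if it has the "count: lineno:" shape.
step1=$(printf '%s\n' "$line" | sed -e '/^  *[0-9][0-9]*:  *[0-9][0-9]*:/!d')

# Step 2: strip the execution count, its colon, and the spaces after it.
step2=$(printf '%s\n' "$step1" | sed -e 's/[^:]*: *//')

# Step 3: strip the next colon and the source text that follows it.
step3=$(printf '%s\n' "$step2" | sed -e 's/:.*//')

# Brackets make leading whitespace visible.
printf '[%s]\n' "$step1" "$step2" "$step3"
```

After step 1 the line is unchanged, after step 2 it starts at 982:, and after step 3 only 982 is left.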

Some versions of sed will give you shortcuts for a more compact writing style, for example [0-9]+ instead of [0-9][0-9]*, but the example above will work with a wider variety of implementations (notably BSD).
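For comparison, here is a sketch of the more compact style, assuming your sed accepts the -E option for extended regular expressions (current GNU and BSD versions do, but check yours). It folds the three commands into one substitution with a capture group; the function name `executed_lines_ere` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: same extraction with extended regular expressions.
# -n suppresses default output; the p flag prints only lines that matched.
executed_lines_ere() {
  sed -n -E 's/^ +[0-9]+: +([0-9]+):.*/\1/p' "$@"
}

# Lines copied from the question, one of each kind.
sample='    1:  978:  attr_name_map = alloc_hash_table(NO_MEMORY_REGION_NUMBER,
#####:   13:    printf ("Failure\n");
    -:   14:  else
  519: 2085:      new_attr_seen = FALSE;'

printf '%s\n' "$sample" | executed_lines_ere
```

This prints 978 and 2085. The trade-off is the one described above: the command is shorter, but it is less portable than the basic-regex version.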

janos
  • Thanks for your reply Janos, I haven't tried it yet but at first glance it seems good. Which do you think would be faster, your solution or mine? – J. Doe Aug 16 '17 at 11:55
  • @J.Doe generally the fewer processes in the pipeline, the better. So one awk/sed should be better than egrep + sed. Also, although you say you like `[:]` instead of `:`, it might create extra work for the regex parser (or not, maybe it's smart enough to transparently convert to `:` either way) – janos Aug 16 '17 at 12:08