
Edited Overview and Scope

This problem boils down to the following: given a source file, automatically insert opening and closing braces around the optional (brace-less) bodies of control statements in C/C++. As far as I know, these statements are if, else, do, while, and for.

Overview

I am attempting to trace and analyze various loops, statements, and the like in a massive code repository that I did not write. My end goal is to collect timing statistics on every loop in a given source file (this will be expanded to other constructs in the future, but that is out of scope for this problem). The trace functions do various things, but they all share the same requirement: they must be placed immediately before and after the block of interest is executed.

In essence, I want to transform the code:

for (i = 0; i < some_condition; i++) {
  some_code = goes(here);
}

for (i = 0; i < some_condition; i++)
{
  some_code = goes(here);
}

for (i = 0; i < some_condition; i++) { some_code = goes(here); }

for (i = 0; i < some_condition; i++)
  some_code = goes(here);

for (i = 0; i < some_condition; i++)
  for (i = 0; i < some_condition; i++)
    some_code = goes(here);

to the following:

S_TRACE(); for (i = 0; i < some_condition; i++) {
  some_code = goes(here);
} E_TRACE();

S_TRACE(); for (i = 0; i < some_condition; i++)
{
  some_code = goes(here);
} E_TRACE();

S_TRACE(); for (i = 0; i < some_condition; i++) { some_code = goes(here); } E_TRACE();

S_TRACE(); for (i = 0; i < some_condition; i++) {
  some_code = goes(here); } E_TRACE();

S_TRACE(); for (i = 0; i < some_condition; i++) {
  S_TRACE(); for (i = 0; i < some_condition; i++) {
    some_code = goes(here); } E_TRACE(); } E_TRACE();

Basically, without adding new lines to the code, I want to insert a function call before the statement begins (easy) and after the statement ends (which can be hard). For example, the following code is actually in the repository:

for( int i = 0; names[i]; i++ )
    if( !STRCMP( arg, names[i] ) )
    {
        *dst = names[i];
        return 0;
    }
return -1;

Terrible readability aside, I'd like to add braces to this type of loop and insert my tracing functions. I have omitted the arguments to the trace functions (which account for nesting).

Current Implementation

My current implementation uses regex in Python, as I'm fairly comfortable and quick in this language. The relevant segments of the implementation are as follows:

import re
source = []
loops = [r"^\s*(for\s*\(.*\))\s*($|{\s*$|\s*)", r"^\s*(while\s*\(.*\))\s*($|{\s*$|\s*)", r"^\s*(do)\s*({?)$"]


def analyize_line(out_file):
    lnum, lstr = source.pop(0)

    for index, loop_type in enumerate(loops):
        match = re.findall(loop_type, lstr)
        if match:
            print(lnum + 1, ":", match[0][0])

            if '{' in match[0][1]:
                out_file.write(lstr.replace(match[0][0], "S_TRACE(); {}".format(match[0][0])))
                look_ahead_place()
                return
            else:
                last_chance = lstr + source[0][1]
                last_match = re.findall(loop_type, last_chance)
                if last_match and '{' in last_match[0][1]:
                    # same as above
                    out_file.write(lstr.replace(match[0][0], "S_TRACE(); {}".format(match[0][0])))
                    lcnum, lcstr = source.pop(0)
                    out_file.write(lcstr)
                    look_ahead_place()
                else:
                    # No matching bracket, make one
                    out_file.write(lstr.replace(match[0][0], "S_TRACE(); {} {{".format(match[0][0])))
                    look_ahead_and_place_bracket()
                return
    # if we did not match, just a normal line
    out_file.write(lstr)


def look_ahead_place():
    depth = 1
    for idx, nl in enumerate(source):
        substr = ""
        for c in nl[1]:
            substr += c
            if depth > 0:
                if c == '{':
                    depth += 1
                elif c == '}':
                    depth -= 1
                    if depth == 0:
                        substr += " E_TRACE(); "
        if depth == 0:
            source[idx][1] = substr
            return
    # No closing brace was found in the remaining source: abort with an error message.
    raise SystemExit("Error finding closing bracket here!")


def look_ahead_and_place_bracket():
    for idx, nl in enumerate(source):
        # Is the next line a scopable? how to handle multiline? ???
        # TODO
        return


def trace_loops():
    global source
    src_filename = "./example.c"
    # Context managers make sure both files are closed (and the output flushed).
    with open(src_filename) as src_file, open(src_filename + ".tr", 'w') as out_file:
        source = [[number, line] for number, line in enumerate(src_file)]
        while len(source) > 0:
            analyize_line(out_file)

trace_loops()

The example.c is the example provided above for demonstration purposes. I am struggling to come up with an algorithm that handles inline loops, loops with no braces, and loops that have no braces but whose bodies span multiple lines.

Any help in the development of my algorithm would be much appreciated. Let me know in the comments if there is something that needs to be addressed more.

EDIT :: Further Examples and Expected Results

Characters that are added are surrounded by < and > tokens for visibility.

Nested Brace-less:

for( int i = 0; i < h->fdec->i_plane; i++ )
    for( int y = 0; y < h->param.i_height >> !!i; y++ )
        fwrite( &h->fdec->plane[i][y*h->fdec->i_stride[i]], 1, h->param.i_width >> !!i, f );

<S_TRACE(); >for( int i = 0; i < h->fdec->i_plane; i++ )< {>
    <S_TRACE(); >for( int y = 0; y < h->param.i_height >> !!i; y++ )< {>
        fwrite( &h->fdec->plane[i][y*h->fdec->i_stride[i]], 1, h->param.i_width >> !!i, f );< } E_TRACE();>< } E_TRACE();>

Nested Mixed:

for( int i = 0; i < h->fdec->i_plane; i++ ) {
  for( int y = 0; y < h->param.i_height >> !!i; y++ )
    fwrite( &h->fdec->plane[i][y*h->fdec->i_stride[i]], 1, h->param.i_width >> !!i, ff );
}

<S_TRACE(); >for( int i = 0; i < h->fdec->i_plane; i++ ) {
  <S_TRACE(); >for( int y = 0; y < h->param.i_height >> !!i; y++ )< {>
    fwrite( &h->fdec->plane[i][y*h->fdec->i_stride[i]], 1, h->param.i_width >> !!i, ff );< } E_TRACE();>
}< E_TRACE();>

Large Multiline Nested Brace-less:

for( int i = 0; i < h->sh.i_mmco_command_count; i++ )
    for( int j = 0; h->frames.reference[j]; j++ )
        if( h->frames.reference[j]->i_poc == h->sh.mmco[i].i_poc )
            x264_frame_push_unused(
                h, 
                x264_frame_shift( &h->frames.reference[j] ) 
            );

<S_TRACE(); >for( int i = 0; i < h->sh.i_mmco_command_count; i++ )< {>
    <S_TRACE(); >for( int j = 0; h->frames.reference[j]; j++ )< {>
        if( h->frames.reference[j]->i_poc == h->sh.mmco[i].i_poc )
            x264_frame_push_unused(
                h, 
                x264_frame_shift( &h->frames.reference[j] ) 
            );< } E_TRACE();>< } E_TRACE();>

This Gross Multiliner:

for( int j = 0; 
  j < ((int) offsetof(x264_t,stat.frame.i_ssd) - (int) offsetof(x264_t,stat.frame.i_mv_bits)) / (int) sizeof(int); 
  j++ )
    ((int*)&h->stat.frame)[j] += ((int*)&t->stat.frame)[j];
for( int j = 0; j < 3; j++ )
    h->stat.frame.i_ssd[j] += t->stat.frame.i_ssd[j];
h->stat.frame.f_ssim += t->stat.frame.f_ssim;

<S_TRACE(); >for( int j = 0; 
  j < ((int) offsetof(x264_t,stat.frame.i_ssd) - (int) offsetof(x264_t,stat.frame.i_mv_bits)) / (int) sizeof(int); 
  j++ )< {>
    ((int*)&h->stat.frame)[j] += ((int*)&t->stat.frame)[j];< } E_TRACE();>
<S_TRACE(); >for( int j = 0; j < 3; j++ )< {>
    h->stat.frame.i_ssd[j] += t->stat.frame.i_ssd[j];< } E_TRACE();>
h->stat.frame.f_ssim += t->stat.frame.f_ssim;

If Statement Edge Case:

Perhaps my implementation requires an inclusion of if statements to account for this?

if( h->sh.i_type != SLICE_TYPE_I )
    for( int i_list = 0; i_list < 2; i_list++ )
        for( int i = 0; i < 32; i++ )
            h->stat.i_mb_count_ref[h->sh.i_type][i_list][i] += h->stat.frame.i_mb_count_ref[i_list][i];
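
A minimal sketch of why the if statement matters here, offered as an illustration rather than a finished solution: the end of a brace-less body can only be found by recursing through whatever control statements it contains, including if/else. The helper names (`skip_ws`, `match_close`, `statement_end`) are hypothetical, and the sketch assumes that comments, string/character literals, and preprocessor lines contain no brackets, semicolons, or keywords that would confuse the scan.

```python
import re

KEYWORD = re.compile(r"(if|for|while|do)\b")
ELSE = re.compile(r"else\b")
OPENERS, CLOSERS = "([{", ")]}"


def skip_ws(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def match_close(text, pos, open_ch, close_ch):
    """text[pos] must be open_ch; return the index just past its matching close_ch."""
    if text[pos:pos + 1] != open_ch:
        raise ValueError("expected %r at offset %d" % (open_ch, pos))
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced %r" % open_ch)


def statement_end(text, pos):
    """Return the index just past the statement that starts at or after pos."""
    pos = skip_ws(text, pos)
    if pos >= len(text):
        raise ValueError("no statement found")
    if text[pos] == "{":
        # Compound statement: ends at the matching closing brace.
        return match_close(text, pos, "{", "}")
    kw = KEYWORD.match(text, pos)
    if kw and kw.group(1) == "do":
        # do <statement> while ( ... ) ;
        tail = skip_ws(text, statement_end(text, kw.end()))
        if not text.startswith("while", tail):
            raise ValueError("expected 'while' after do body")
        header_end = match_close(text, skip_ws(text, tail + len("while")), "(", ")")
        return text.index(";", header_end) + 1
    if kw:
        # if / for / while: skip the parenthesised header, then recurse into the body.
        header_end = match_close(text, skip_ws(text, kw.end()), "(", ")")
        end = statement_end(text, header_end)
        if kw.group(1) == "if":
            after = skip_ws(text, end)
            if ELSE.match(text, after):
                # An else branch belongs to the same statement.
                end = statement_end(text, after + len("else"))
        return end
    # Simple statement: ends at the first ';' that is outside any brackets.
    depth = 0
    for i in range(pos, len(text)):
        if text[i] in OPENERS:
            depth += 1
        elif text[i] in CLOSERS:
            depth -= 1
        elif text[i] == ";" and depth == 0:
            return i + 1
    raise ValueError("unterminated statement")
```

With a helper like this, the opening brace would go right after a loop header and the closing brace plus E_TRACE(); at the index that `statement_end` returns for the loop body. Because the if statement is handled as just another statement, a loop that is the body of a brace-less if (or that contains one) needs no special casing.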
dovedevic
  • I expect the number of non-empty contiguous lines is in some cases quite large. Yes? 25+? I suggest you include in your examples one or two especially difficult blocks (together with the desired result). – Cary Swoveland Feb 10 '20 at 21:05
  • Sure thing. I'll go through the code now and select some examples with expected results. – dovedevic Feb 10 '20 at 21:06
  • Included some of the perhaps worst offenders. I'll include more if you would like. Since this post is getting quite long, I might make a file of all these and attach it into the post. – dovedevic Feb 10 '20 at 21:19
  • It appears that my algorithm should first perform a brace/bracket pass then my original implementation can go into effect... – dovedevic Feb 10 '20 at 21:30
  • The problem with links is that they have a tendency to break over time. It would be helpful to provide, at the beginning, a short explanation of how the results will be used, as that might generate useful suggestions. – Cary Swoveland Feb 10 '20 at 21:35
  • Understandable. I'll keep updating and adding details here as fit. Thank you for the insight. – dovedevic Feb 10 '20 at 21:47
  • @CarySwoveland Updated the problem description to make the problem concise and to the point. Hope that helps. – dovedevic Feb 11 '20 at 00:24

2 Answers


You are going down a rabbit hole. The more cases you handle, the more new cases you will run into, until you end up writing an actual parser for C++, which will require learning a whole technology toolchain.

Instead I would strongly recommend that you simplify your life by using a formatting tool like clang-format that already knows how to parse C++ to first rewrite with consistent formatting (so braces are now always there), and then you just need to worry about balanced braces.

(If this is part of a build process, you can copy code, reformat it, then analyze reformatted code.)

Note, if the code makes interesting use of templates, this might not be enough. But it will hopefully get you most of the way there.
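
As a rough sketch of the second step, assuming every loop body already has braces after reformatting: the trace calls from the question (S_TRACE and E_TRACE) can then be placed by matching parentheses and braces alone. The function names here (`skip_ws`, `match_close`, `instrument`) are hypothetical, do loops are left untouched for brevity, and the sketch assumes that comments and string literals contain no loop keywords or stray brackets.

```python
import re

LOOP = re.compile(r"\b(for|while)\b")


def skip_ws(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def match_close(text, pos, open_ch, close_ch):
    """text[pos] must be open_ch; return the index just past its matching close_ch."""
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced %r" % open_ch)


def instrument(text):
    """Wrap every braced for/while loop in S_TRACE(); ... E_TRACE();."""
    inserts = []  # (position, tie-break order, text to insert)
    for m in LOOP.finditer(text):
        header = skip_ws(text, m.end())
        if text[header:header + 1] != "(":
            continue
        body = skip_ws(text, match_close(text, header, "(", ")"))
        if text[body:body + 1] != "{":
            # Tail of a do-while, or a body without braces: leave it alone.
            continue
        body_end = match_close(text, body, "{", "}")
        inserts.append((m.start(), 1, "S_TRACE(); "))
        # Order 0 so that an E_TRACE sorts before an S_TRACE at the same offset.
        inserts.append((body_end, 0, " E_TRACE();"))
    pieces, last = [], 0
    for pos, _, snippet in sorted(inserts):
        pieces.append(text[last:pos])
        pieces.append(snippet)
        last = pos
    pieces.append(text[last:])
    return "".join(pieces)
```

Two caveats worth keeping in mind: an E_TRACE(); placed after the closing brace is skipped whenever the loop body leaves through return or goto, and putting S_TRACE(); in front of a loop is only safe because the enclosing if/else bodies have been braced as well.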

btilly
  • I've actually looked at clang-format, clang-tidy, and indent. All of them appear to deal only with whitespace or with simpler cases. As far as my research shows, they cannot insert balanced braces automatically, but if you can prove me wrong, I'll accept this as an answer. – dovedevic Feb 11 '20 at 02:25
  • [Relevant bounty](https://stackoverflow.com/questions/58999076/can-clang-format-force-bracing-on-all-control-statement-bodies) if you do get that working however! – dovedevic Feb 11 '20 at 02:38

After extensive research, numerous applications, and many implementations, I've gotten just what I needed.

There is an existing solution called Uncrustify. The documentation is a bit lacking, but after some probing today I found that the following config does what I requested above.

$ cat .uncrustify

  # Uncrustify-0.70.1
  nl_if_brace                     = remove
  nl_brace_else                   = force
  nl_elseif_brace                 = remove
  nl_else_brace                   = remove
  nl_else_if                      = remove
  nl_before_if_closing_paren      = remove
  nl_for_brace                    = remove
  nl_while_brace                  = remove
  nl_do_brace                     = remove
  nl_brace_while                  = remove
  nl_multi_line_sparen_open       = remove
  nl_multi_line_sparen_close      = remove
  nl_after_vbrace_close           = true
  mod_full_brace_do               = force
  mod_full_brace_for              = force
  mod_full_brace_function         = force
  mod_full_brace_if               = force
  mod_full_brace_while            = force

You can run this using the command:

$ uncrustify -c /path/to/.uncrustify --no-backup example.c
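
If this step is scripted, a small wrapper along these lines might help. It is only a sketch: the function names and the working-directory layout are assumptions, and the command line is exactly the one shown above. Since that command rewrites the file without keeping a backup, the wrapper works on a copy so the original source stays untouched.

```python
import shutil
import subprocess
from pathlib import Path


def uncrustify_command(config, source):
    """Build the command line shown above for a given config and source file."""
    return ["uncrustify", "-c", str(config), "--no-backup", str(source)]


def brace_copy(config, source, workdir):
    """Copy source into workdir and run uncrustify on the copy.

    Requires uncrustify to be installed and on PATH; raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / Path(source).name
    shutil.copyfile(source, copy)
    subprocess.run(uncrustify_command(config, copy), check=True)
    return copy
```

For example, `brace_copy("/path/to/.uncrustify", "example.c", "braced")` would leave the braced version in `braced/example.c`, ready for the tracing pass.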

For the future dwellers out there looking at similar issues:

  • clang-format is essentially a white-space only formatter.
  • clang-tidy can do, to a lesser extent, what uncrustify can do; however, it requires direct integration with your compilation database or a full list of compiler options, which can be cumbersome.
  • indent is similar to clang-format
  • C++ Resharper does not support bracket formatting as of 2019.3, though planned for 2020.1.
  • VS Code does not support auto/forced bracket insertion

All these claims are made as of today and will hopefully be out of date soon, so that there is a plethora of tools for us to use and abuse :P

dovedevic