
How can modern code analysis tools, such as SourceTrail, be used on old-ish embedded C/C++ source code that was originally written for compilers such as Hi-Tech C, PIC C, and IAR Workbench, targeting a number of microcontrollers including, but not limited to, the PIC, PIC16, and PIC18 series from Microchip?

To support the limited architectures of these tiny microcontrollers, the vendors of the embedded compilers have had to come up with extensions to the C/C++ language that were not (or are not yet) in the C language specifications.

This results in the microcontroller specific header files containing stuff like this:

// Register: ANSELA
extern volatile unsigned char           ANSELA              @ 0xF38;
#ifndef _LIB_BUILD
asm("ANSELA equ 0F38h");
#endif

typedef union {
    struct {
        unsigned ANSB0                  :1;
        unsigned ANSB1                  :1;
        unsigned ANSB2                  :1;
        unsigned ANSB3                  :1;
        unsigned ANSB4                  :1;
        unsigned ANSB5                  :1;
    };
} ANSELBbits_t;
extern volatile ANSELBbits_t ANSELBbits @ 0xF39;

extern volatile unsigned short long     TBLPTR              @ 0xFF6;

extern volatile __bit                   ABDEN1              @ (((unsigned) &BAUDCON1)*8) + 0;

and code files include things like this:

void interrupt high_priority InterruptVectorHigh(void) 
{
}

void interrupt low_priority InterruptVectorLow(void)
{
}

What is the easiest method to support this source with modern tools, while ensuring that the source can still be used with the original compilers?

Edit:

An answer is provided below.

fsteff

1 Answer


The fix below enables the C code to be understood by any compiler supporting the C18 or C2x specifications. I have not (yet) had the opportunity to test it with C++, so it may not fully comply with any of the C++ specifications.

Thank you to people such as @Antti Haapala, @Clifford, and @anastaciu who answered my related questions here and here and enabled this more complete answer.

The short long type

First, the 24-bit short long type was a problem, as no equivalent exists in the C specifications, and because a #define cannot match the two-word sequence short long. At first, I used Perl to simply replace the string short long with long in all the vendor-specific header files, like this:

perl -pi -e "s/(short long)/long/g" *.h

Note: for the Microchip MPLAB XC8 compiler on Windows, the header files are located in the following folder and its sub-folders: c:\Program Files (x86)\Microchip\xc8\v1.33\include

But then I realized that the short type is never used on its own in these headers, so I decided to simply remove the short part using a #define short. Do note that this will affect anything using short, so I left both methods in this answer.
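To make that trade-off concrete, here is a minimal sketch (not part of the original fix) of what an empty #define short does on a standard host compiler. The names TBLPTR_demo and widened_demo are made up for illustration, and a C11 compiler is assumed for static_assert. A short long declaration parses as long, while a plain unsigned short silently becomes unsigned int.

```c
#include <assert.h>   /* include standard headers BEFORE blanking a keyword */

#define short   /* the same empty define used in the macro header */

/* Vendor-style declaration: the compiler now sees "volatile unsigned long". */
volatile unsigned short long TBLPTR_demo;

/* Side effect: this now reads "unsigned", i.e. an unsigned int. */
unsigned short widened_demo;

#undef short    /* restore the keyword for the rest of this file */

/* Compile-time checks that the declarations were parsed as described. */
static_assert(sizeof(TBLPTR_demo) == sizeof(unsigned long),
              "short long was parsed as long");
static_assert(sizeof(widened_demo) == sizeof(unsigned int),
              "plain short was silently widened to int");
```

On a host compiler, long is 32 or 64 bits wide rather than 24, which should be fine for code analysis but means the sizes must not be relied on. A dedicated typedef for the 24-bit type, as suggested in the comments, would avoid touching short altogether.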

The register bit and byte addresses defined with @

The @ signs were a particular problem, as they could not be redefined using #define, so Perl to the rescue again, this time using two passes to address the two different syntaxes:

perl -pi -e "s/@\s*([0-9a-fA-FxX]+)/AT($1)/g" *.h
perl -pi -e "s/[@] ?+([^;]*)/AT($1)/g" *.h

These essentially wrap anything following a @ in AT(), allowing a normal define to operate on it.
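As a rough illustration of the result, the sketch below shows two of the declarations from the question after the first Perl pass, together with only the AT() part of the macro header. The storage definitions at the end are hypothetical host-side stand-ins, added only so the example links on a PC. They are not part of the vendor headers, and the anonymous struct assumes a C11 compiler.

```c
#include <assert.h>

/* Only the AT() part of the macro header: outside XC8 the address is dropped. */
#if defined __XC8
  #define AT(address) @ address
#else
  #define AT(address)
#endif

/* Was: extern volatile unsigned char ANSELA @ 0xF38; */
extern volatile unsigned char ANSELA AT(0xF38);

typedef union {
    struct {
        unsigned ANSB0 :1;
        unsigned ANSB1 :1;
        unsigned ANSB2 :1;
        unsigned ANSB3 :1;
        unsigned ANSB4 :1;
        unsigned ANSB5 :1;
    };
} ANSELBbits_t;

/* Was: extern volatile ANSELBbits_t ANSELBbits @ 0xF39; */
extern volatile ANSELBbits_t ANSELBbits AT(0xF39);

#if !defined __XC8
/* Hypothetical stand-in storage for host builds; on the target the
   registers live at the fixed addresses given above. */
volatile unsigned char ANSELA;
volatile ANSELBbits_t ANSELBbits;
#endif
```

With these stand-ins in place, host code can read and write ANSELA or ANSELBbits.ANSB3 like ordinary variables, which is all an analysis tool needs in order to resolve the symbols.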

The extra keywords

The final touch is to insert a macro header into each of the header files provided by the compiler vendor. I ended up with the following macro header:

// Hack to allow SourceTrail to be used on this source
#if defined __XC8
  #define AT(address) @ address
#else
  #define AT(address)
  #define __bit _Bool
  #define asm(assembly)
  #define interrupt
  #define short
  #define high_priority
  #define low_priority
#endif

As can be seen, anything non-standard is simply removed, except when the header files are used by the MPLAB XC8 compiler. The only exception is the __bit type, which is redefined as a _Bool type - it seems to work.
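The following sketch shows how the interrupt handlers and the __bit declaration from the question look to a standard compiler once the macro header is in place. The header is trimmed to the lines that matter here (the asm and short defines are left out), and the counters and the ABDEN1 storage are hypothetical additions that only exist to give the example something observable on a host.

```c
#include <assert.h>

/* Macro header from the answer, trimmed to the keyword-related lines. */
#if defined __XC8
  #define AT(address) @ address
#else
  #define AT(address)
  #define __bit _Bool
  #define interrupt
  #define high_priority
  #define low_priority
#endif

/* Hypothetical counters so that calling the handlers has a visible effect. */
unsigned high_hits;
unsigned low_hits;

/* Outside XC8 these read as plain "void InterruptVectorHigh(void)" etc. */
void interrupt high_priority InterruptVectorHigh(void)
{
    high_hits++;
}

void interrupt low_priority InterruptVectorLow(void)
{
    low_hits++;
}

/* After both Perl passes: the whole address expression vanishes with AT(),
   so BAUDCON1 does not even need to be declared here. */
extern volatile __bit ABDEN1 AT((((unsigned) &BAUDCON1)*8) + 0);

#if !defined __XC8
/* Hypothetical host-side storage, not part of the vendor headers. */
volatile _Bool ABDEN1;
#endif
```

Because the extra keywords expand to nothing, the handlers become ordinary functions that nothing calls, so an analysis tool will typically list them as unreferenced entry points rather than as interrupt vectors.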

The full fix as a batch script to run on Windows

As I'm running all of this on Windows, Perl one-liners don't work quite as they do on Linux, so to process each and every header file I had to wrap the Perl command in a batch for-loop, which is pretty slow. To make up for it, I combined everything in a single batch file called fix.cmd, which is placed in the include folder (see path above):

:: Fix to allow SourceTrail to analyze MPLAB XC8 source code.
@echo off
setlocal enabledelayedexpansion

:: Run in the folder where the script exists.
pushd "%~dp0"

echo:Fixing MPLAB global include files to be used by SourceTrail and other analysis tools.

:: Loop each directory recursively
set DirCounter=0
set FileCounter=0
for /r %%d in (.) do (
    set /A DirCounter=DirCounter+1
    pushd "%%d"
    echo | set /p=Processing:
    cd
    
    for %%f in (*.h) do (
        set /A FileCounter=FileCounter+1
        set /A ModValue=FileCounter%%25
        if !ModValue!==0 ( echo | set /p=* )
        call :ProcessFile %%f
    )
    
    popd
    echo *
)
echo:Processed %FileCounter% files in %DirCounter% folders.
echo Done   
exit /b 0


:ProcessFile
:: filename is in %1
    
:: Remove short from short long. (Done with a define instead)
::  perl -pi -e "s/(short long)/long/g" %1

:: Replace the simple @ lines with AT().
    perl -pi -e "s/@\s*([0-9a-fA-FxX]+)/AT($1)/g" %1

:: Exchange @ and wrap in parenthesis for any substring starting with @ and ending with ; in each header file.
    perl -pi -e "s/[@] ?+([^;]*)/AT($1)/g" %1

:: Insert defines before first line in each header files:
    perl -pi -e "print \"// Hack to allow SourceTrail to be used on this source\n#if defined __XC8\n  #define AT(address) @ address\n#else\n  #define AT(address)\n  #define __bit _Bool\n  #define asm(assembly)\n  #define interrupt\n  #define short\n  #define high_priority\n  #define low_priority\n#endif\n\n\" if $. == 1" %1

::Exit subroutine   
exit /b

To perform the modification, open an elevated command prompt, cd to the include folder, and run fix.cmd.

Prerequisites

Perl must be installed on the Windows computer. I use Strawberry Perl.

Edit: Mostly fixed typos. Clarified that there are two options for how to deal with the short long

fsteff
  • Turns out there's a reason why everyone goes on and on about sticking to standard C :) – Lundin Oct 09 '20 at 10:49
  • Absolutely. But in the case of embedded code for microcontrollers, standard C does not address solutions for the limited architectures, so hacks will have to be made. A tiny microcontroller has only a few kB of flash to store the hardcoded program, sometimes just a few hundred bytes of RAM, and hardcoded registers/vectors. It would be nice if the C standard evolved to also include tiny embedded systems. – fsteff Oct 09 '20 at 10:54
  • Except for inline assembly and too much freedom in implementing bit fields, everything could be translated to standard C. The MCU manufacturers are just too lazy to bring their C compiler up to standard. – Codo Oct 09 '20 at 11:02
  • @Lundin Thank you for that link and the exceptional writeup! Do you also have advice on how to access the 24-bit `short long` types (which fits 24-bit registers) in a C-standard conforming way? – fsteff Oct 09 '20 at 12:07
  • Using a `#define short /* */` is a bad idea, since this also changes definitions like `typedef signed short sint16;` or `signed short x = 0;` to a `signed int` type, because the preprocessor will replace short. Maybe you should define a specific 24bit type like `typedef short long uint24;` or `typedef short long uint24_least;` and on other architectures as `typedef long uint24;` / `typedef long uint24_least;`. – kesselhaus Oct 09 '20 at 13:23
  • @fsteff Most commonly this is done with another non-standard keyword `far`. That is: `far int x;` is an integer declared in the 24 bit address space and `int* far ptr` is a pointer to an integer in the 24 bit address space. – Lundin Oct 09 '20 at 14:08
  • @kesselhaus I agree that defining short as blank is in general a bad idea, but in this particular case, it didn't conflict with any other usage of short in any of the projects I had at hand. The alternative solution to only remove the `short` part of `short long` statements using Perl is still an option that is present in my answer. `typedef short long uint24_least;` would probably be cleaner, and I may test it sometime later. – fsteff Oct 09 '20 at 15:22