
I have a shared library built with the Android NDK for ARM; there is also a little bit of JNI in it. This .so file is linked against many other .a files (our own static libraries, which we build as dependencies) as well as a few third-party static libraries, such as Boost.

I am using GCC 4.8 and the STL with C++11 features.

I have done some research on my own. In particular, I came across this thread:
why my C++ output executable is so big?

That helped me find a few commands to run, such as `size`:

$ size libmine.so
    text     data      bss      dec      hex filename
13017993   201972    54120 13274085   ca8be5 libmine.so
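For reading the numbers above: in GNU `size`'s default (Berkeley) format, `dec` is just `text + data + bss`, here 13017993 + 201972 + 54120 = 13274085 bytes (`0xca8be5`). About 98% of that sits in the `text` column, which in this format counts read-only data as well as code. A tiny helper that computes that share, as a sketch assuming GNU binutils `size` output (the name `text_share` is made up for illustration):

```bash
# text_share: read GNU `size` output in the default Berkeley format
# (a header line, then: text data bss dec hex filename) and report
# how much of each file's total is in the "text" column.
# Usage (example file name): size libmine.so | text_share
text_share() {
    awk 'NR > 1 && NF >= 6 {
        total = $1 + $2 + $3            # same value as the dec column
        if (total > 0)
            printf "%s: text %d of %d bytes (%.1f%%)\n", $6, $1, total, 100 * $1 / total
    }'
}
```

For the output above this prints `libmine.so: text 13017993 of 13274085 bytes (98.1%)`, i.e. globals (`data`/`bss`) are not the problem.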

Unfortunately, apart from the commands themselves, the linked question didn't help me much with the diagnosis (or perhaps I am just not experienced enough with Linux-style development to use the information reliably). I am not sure how to analyze the results in a way that would help me pinpoint the areas of code, specific libraries, or template functions/classes/etc. that are causing the growth.

The shared library itself is 13 MB, which is pretty huge. I did verify that my .so file is "stripped", which I understand means it contains no debug symbols. At this point I'm not sure whether this is due to Boost or to some crazy template instantiation. How can I determine what is contributing to the massive size of my shared library?
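Since the Berkeley `text` column lumps code together with read-only data (`.rodata`, unwind tables such as `.eh_frame` or, on ARM, `.ARM.exidx`/`.ARM.extab`, and the dynamic symbol tables `.dynsym`/`.dynstr`, which stripping does not remove from a shared library), a per-section view is a natural next step. `size -A` (System V format) prints one line per section; the hypothetical helper below sorts those lines and shows each section's share of the `Total` line. It is a sketch that assumes GNU `size` output for a single file:

```bash
# section_breakdown: read `size -A` (System V format) output for ONE file
# and print its sections, largest first, with their share of the Total line.
# Usage (example file name): size -A libmine.so | section_breakdown
section_breakdown() {
    awk '
        $1 == "Total" { total = $2; next }
        # Section rows look like: ".text   123456   789" (name, size, addr)
        $1 ~ /^\./ && $2 ~ /^[0-9]+$/ { name[++n] = $1; sz[n] = $2 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%-24s %12d  %5.1f%%\n", name[i], sz[i], 100 * sz[i] / total
        }' | sort -k2,2nr
}
```

A large `.dynstr`/`.dynsym` would suggest many exported (mangled C++) names, while a large `.text` points back at code.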

void.pointer

  • "This SO file is linked against many other .a files (our own static libraries we build as dependencies) as well as a few other third party static libraries, such as boost.". I think I just found your problem. Don't link zillions of things statically if you can avoid it; it increases the size. – tux3 Mar 09 '15 at 14:52
  • @tux3 Shouldn't the linker (when it links the SO file) be "smart enough" to strip out symbols that aren't utilized? We have about 20 different libraries we build (our own code). Each of those produces a static library file which all are bundled up and linked into a final SO file at the end. – void.pointer Mar 09 '15 at 14:53
  • If the linker thinks that you're trying to export those symbols from your .so, they aren't going to be linked out. I suspect that this is what is happening. If this isn't the case, then I don't really have enough info to help; perhaps you're abusing templates in crazy ways and getting massive code duplication as a result? – tux3 Mar 09 '15 at 14:55
  • You should first find out which of the *.a files is the one (or are the ones) causing the huge increase in size. Try to remove each of them temporarily and see what difference it makes. – Christian Hackl Mar 09 '15 at 14:59
  • You should have your linker generate a "map" file. This file is a cross-reference of symbols and their sizes, and much more. Look up your tools' documentation for instructions on how to generate a map file. – Thomas Matthews Mar 09 '15 at 15:02
  • I'm interested in this. My own project has figures like these: http://paste.ubuntu.com/10568828/ (after stripping: http://paste.ubuntu.com/10568841/). I postponed addressing that, but someday I will have to. – sehe Mar 09 '15 at 15:18
  • Run this against your unstripped program binary: `nm --print-size --size-sort --radix=d ` – Andy Brown Mar 09 '15 at 15:38
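Christian Hackl's and Thomas Matthews's suggestions can be combined: let the linker write a map file (with GNU ld, by passing `-Wl,-Map=libmine.map` on the final link line; the file name is only an example) and total the input sections per static archive, instead of removing archives one at a time. The awk below is a rough sketch that assumes the GNU ld (BFD) map layout shown in its comments; gold's map format differs, and archive paths containing `(` would confuse it. `archive_sizes` is a hypothetical name.

```bash
# archive_sizes: sum input-section sizes per static archive from a GNU ld
# (BFD) map file. Assumed layout, after the "Linker script and memory map" line:
#    .text.foo   0x00001000   0x40 libfoo.a(bar.o)
# or, when the section name is long, the name alone on one line followed by
#                0x00001000   0x40 libfoo.a(bar.o)
# Usage (example map file name): archive_sizes libmine.map
archive_sizes() {
    awk '
        function hex(s,    i, n) {            # "0x1f" -> 31, portable (no gawk strtonum)
            s = tolower(substr(s, 3)); n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        /^Linker script and memory map/ { inmap = 1; next }
        !inmap { next }                           # skip the discarded-sections list etc.
        NF == 1 && $1 ~ /^\./ { pending = $1; next }   # wrapped section name
        {
            if (NF == 4 && $1 ~ /^\./ && $2 ~ /^0x/ && $3 ~ /^0x/) {
                sec = $1; size = $3; obj = $4
            } else if (NF == 3 && pending != "" && $1 ~ /^0x/ && $2 ~ /^0x/) {
                sec = pending; size = $2; obj = $3
            } else { pending = ""; next }
            pending = ""
            if (sec ~ /^\.(debug|comment)/) next  # not loaded at run time
            if (obj !~ /\.a\(/) next              # only count archive members
            sub(/\(.*/, "", obj)                  # libfoo.a(bar.o) -> libfoo.a
            total[obj] += hex(size)
        }
        END { for (a in total) printf "%12d %s\n", total[a], a }' "$@" | sort -rn
}
```

Object files linked directly (not from an `.a`) are ignored by this sketch; dropping the `obj !~ /\.a\(/` test would include them.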

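Related to tux3's point about exported symbols and Andy Brown's `nm` command: even a stripped `.so` keeps its dynamic symbol table, so `nm -D` still lists the exported symbols with their sizes. The sketch below (hypothetical function `template_totals`) groups `nm -S --radix=d -C` output by a crude "stem", the demangled name cut at its first `<` or `(`, so that many instantiations of one class template collapse into a single line. That grouping is a heuristic, not an nm feature, and it misgroups names such as `operator<`.

```bash
# template_totals: sum symbol sizes per "stem" (name cut at the first '<' or '(')
# from nm output with decimal sizes, e.g. (example file name):
#   nm -D -S --size-sort --radix=d -C --defined-only libmine.so | template_totals
# Output columns: total bytes, number of symbols, stem.
template_totals() {
    awk '
        NF >= 4 && $2 ~ /^[0-9]+$/ {
            size = $2 + 0
            name = $0
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", name)  # drop addr, size, type
            sub(/\t.*/, "", name)        # drop "file:line" added by nm -l
            sub(/[<(].*/, "", name)      # crude: cut template args / parameter list
            total[name] += size; count[name]++
        }
        END { for (n in total) printf "%12d %6d %s\n", total[n], count[n], n }' | sort -rn
}
```

Running the same helper on the unstripped binary (without `-D`) also covers non-exported symbols.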
1 Answer

I have no answers. Yet. I'm just sharing my quick & dirty one-liners so you don't have to write them. Disclaimer: the performance is abysmal, but GoodEnough™. Perl/Python/Haskell/... should have been used; pragmatism was the keyword.

You need bc installed for the summations.¹

Starting from the comment by Andy Brown, I've analyzed my own project's binaries with:

  • Define a helper function to list the names:

    function names() { nm -l -S --size-sort --radix=d -C "$@"; }
    

    Drop -l (line numbers) for better performance (I didn't need them).

  • The following one-liner shows the cumulative size of duplicate symbols:

    for a in bin/*; do echo -e "$a\t$(names "$a" | \
         cut -d\   -f1-2 | sort | uniq -cd | \
         perl -ne '@a=split and print "" . (($a[0]-1) * $a[2]) . "\n"' | \
         paste -sd+ | bc)"; done
    

    This tends to show only weak symbols. Because the duplicates here share the same value (address), I'm not sure this duplicate size actually counts toward the size of the stripped binary file.

  • Histogram of symbol types:

    for a in bin/*; do names "$a" | awk '{print $3}'; done | sort | uniq -c | sort -rn
    
  • Total symbol sizes by symbol type, in descending order of frequency as in the histogram from the previous step:

    for a in bin/*; do names "$a" | awk '{print $3}'; done | \
        sort | uniq -c | sort -rn | \
        while read -r count type
        do
            total=0
            for a in bin/*
            do
                size=$(names "$a" | awk "\$3==\"$type\" {print \$2}" | paste -sd+ | bc)
                size=${size:-0}   # this binary has no symbols of this type
                total=$((total + size))
                echo -e "$type\t$size\t$a"
            done
            echo -e "total:\t$total\ttotal bytes in $count symbols\n-------"
        done
    
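The totals loop above calls `names` once per symbol type per binary, which likely accounts for much of the slowness mentioned in the disclaimer. A single-pass alternative, sketched under the assumption of GNU `nm -S` output (address, size, type, name), runs `nm` once per file and lets awk accumulate the per-type sums. `sum_by_type` and `type_totals` are made-up names, and `-l`/`-C` are left out because they do not affect the size or type columns.

```bash
# sum_by_type FILE: read `nm -S --radix=d` output on stdin and print one
# "type<TAB>bytes<TAB>FILE" line per symbol type.
sum_by_type() {
    awk -v f="$1" 'NF >= 4 && $2 ~ /^[0-9]+$/ { s[$3] += $2 }
        END { for (t in s) printf "%s\t%d\t%s\n", t, s[t], f }' | sort
}

# type_totals FILE...: one nm run per file (instead of one per type per file),
# then a "total:<TAB>type<TAB>bytes" line per type across all files.
type_totals() {
    local f
    for f in "$@"; do
        nm -S --size-sort --radix=d "$f" | sum_by_type "$f"
    done | awk -F'\t' '{ print; g[$1] += $2 }
        END { for (t in g) printf "total:\t%s\t%d\n", t, g[t] }'
}
```

Usage would be `type_totals bin/*`, giving the same per-file, per-type numbers as the loop above in one pass.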

Sample output on my system (in order: duplicate sizes, the symbol-type histogram, and totals per symbol type):

bin/tool1          208148
bin/liba.so        204463
bin/libcryptopp.so 166771
bin/tool2          211916
bin/tool3          204733
bin/testrunner     208271

46935   W
16173   V
10442   T
 1724   u
  574   d
  184   R
  158   B
   94   t
   49   r
   33   b
   13   D


W   1053961 bin/tool1
W   1030888 bin/liba.so
W   784518  bin/libcryptopp.so
W   1097729 bin/tool2
W   1031444 bin/tool3
W   1072752 bin/testrunner
total:  6071292 total bytes in 46935 symbols
-------
V   317146  bin/tool1
V   243869  bin/liba.so
V   368815  bin/libcryptopp.so
V   321841  bin/tool2
V   316629  bin/tool3
V   316947  bin/testrunner
total:  1885247 total bytes in 16173 symbols
-------
T   459075  bin/tool1
T   449020  bin/liba.so
T   610503  bin/libcryptopp.so
T   455224  bin/tool2
T   450630  bin/tool3
T   449234  bin/testrunner
total:  2873686 total bytes in 10442 symbols
-------
u   4912    bin/tool1
u   4136    bin/liba.so
u   448 bin/libcryptopp.so
u   5381    bin/tool2
u   4136    bin/tool3
u   4136    bin/testrunner
total:  23149   total bytes in 1724 symbols
-------

¹ I know it can be done in pure bash, but it involves either expanding the whole stream in an evaluation expansion or writing more nested loops. I preferred bc here.
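For comparison, a pure-bash summation of the kind the footnote mentions might look like the sketch below (hypothetical `sum_lines`, a stand-in for `paste -sd+ | bc`). One pitfall: `nm --radix=d` pads numbers with leading zeros, which bash arithmetic would read as octal, so each value is forced to base 10 with `10#`.

```bash
# sum_lines: add up one non-negative integer per line from stdin using only
# bash arithmetic. Empty lines are skipped; surrounding whitespace is trimmed
# by `read`, and 10# stops zero-padded values from being parsed as octal.
sum_lines() {
    local total=0 n
    while read -r n; do
        [[ -n $n ]] && total=$(( total + 10#$n ))
    done
    echo "$total"
}
```

For example, `printf '0000000000000100\n50\n' | sum_lines` prints `150`.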

sehe