0

I have a project for a Linux-based embedded application. I have an ELF file, and I want OpenGrok to index only the symbols that are part of that ELF file, excluding all non-relevant/non-compiled portions of the project files. Is this possible with OpenGrok indexing? If so, what is the command to generate such an index? Currently I use the command below to generate the index for the entire source:

java
-Djava.util.logging.config.file=/opengrok/etc/logging.properties
-jar /opengrok/dist/lib/opengrok.jar
-c /usr/local/bin/ctags
-s /opengrok/src -d /opengrok/data -H -P -S -G
-W /opengrok/etc/configuration.xml -U http://localhost:8080/source

gsat
  • 11
  • 5
  • Could you clarify what the issue with the current index command is? – Marcelo Ávila de Oliveira Apr 18 '21 at 19:32
  • I want to cross-reference only those symbols that are linked into the final executable ELF. Basically, this will exclude all the lines of C code that are not compiled in because the relevant macros are not enabled by the kernel/platform configuration. – gsat Apr 19 '21 at 02:55
  • I am looking for a symbol search restricted to linked symbols, i.e. the compiled and linked ones. Right now it appears to be a full-text search, regardless of whether the symbol is linked or not. – gsat Apr 25 '21 at 12:54

2 Answers

0

If you are looking to include or ignore only specific files (in this case ELF), you can use the following options:

-I (--include) - Only files matching this pattern will be examined. Pattern supports wildcards (example: -I '*.java' -I '*.c'). Option may be repeated.
  

-i (--ignore) - Ignore matching files (prefixed with 'f:' or no prefix) or directories (prefixed with 'd:'). Pattern supports wildcards (example: -i '*.so' -i d:'test*'). Option may be repeated.
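To make the prefix rules quoted above concrete, here is a minimal Python sketch of how such ignore patterns could be interpreted. It is only an illustration of the wording in the option description, not OpenGrok's actual implementation: the function name `is_ignored` is hypothetical, and matching against the bare file or directory name with shell-style wildcards is an assumption.

```python
from fnmatch import fnmatch


def is_ignored(name, is_dir, patterns):
    """Return True if `name` matches one of the ignore patterns.

    Hypothetical helper: 'd:' patterns apply to directories only,
    'f:' or unprefixed patterns apply to files only.
    """
    for pattern in patterns:
        if pattern.startswith("d:"):
            # Directory pattern: strip the prefix and match directories only.
            if is_dir and fnmatch(name, pattern[2:]):
                return True
        else:
            # File pattern: the 'f:' prefix is optional.
            file_pattern = pattern[2:] if pattern.startswith("f:") else pattern
            if not is_dir and fnmatch(name, file_pattern):
                return True
    return False


patterns = ["*.so", "d:test*"]
print(is_ignored("libfoo.so", False, patterns))
print(is_ignored("tests", True, patterns))
print(is_ignored("main.c", False, patterns))
```

Note that these options select whole files or directories; they do not filter individual symbols inside a file.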
  • The intention is not to ignore the ELF file but to use the executable/ELF file to keep only the linked symbols and ignore all other non-linked/non-compiled symbols. Right now a symbol search in OpenGrok lists even symbols that were not compiled/linked. – gsat Apr 28 '21 at 04:01
0

It is not clear to me what exactly is meant by "non relevant/ non compiled portion of the project files" or "non linked/compiled symbols", so I will describe how ELF file analysis works in OpenGrok, and you can decide whether this works for your use case or whether filing a new issue is in order.

The ELF analyzer goes through the following ELF sections:

  • .debug_str
  • .comment
  • .data
  • .data1
  • .rodata
  • .rodata1

plus all sections with sh_type equal to SHT_STRTAB. The latter contain strings separated by null bytes. From the content of these sections, the analyzer extracts all printable strings (using non-printable characters as separators) and concatenates them with the space character. So all the printable strings from all these sections are accumulated into a single string, effectively tokenized by the inserted spaces. These tokens are then stored in the index and therefore become searchable.
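The extraction step described above can be sketched roughly as follows. This is a Python illustration of the idea, not the analyzer's actual code (OpenGrok itself is written in Java); the function name `extract_tokens` is hypothetical, and treating the ASCII range 0x20 to 0x7e as "printable" is an assumption.

```python
import re

# Assumption: "printable" means ASCII 0x20..0x7e; anything else is a separator.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]+")


def extract_tokens(section_bytes):
    """Collect printable runs from raw section bytes and join them with spaces."""
    runs = PRINTABLE_RUN.findall(section_bytes)
    return " ".join(run.decode("ascii") for run in runs)


# Example: a string-table-like blob with null-separated names and some binary noise.
blob = b"main\x00printf\x00\x01\x02GCC: 9.3\x00"
text = extract_tokens(blob)
print(text)
print(text.split())
```

Because everything ends up in one space-separated string, the origin of each token (symbol table, debug strings, read-only data) is no longer visible at search time.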

With this approach the index will contain not only symbols defined within the program but also names of external symbols referenced from the program (such as functions called from dynamic libraries), plus the contents of some of the global variables (if they contain printable strings).

Also, when the ELF binary is stripped, the .symtab section is removed and the names of symbols defined within the program are lost to the indexer.

Now, it would be possible to traverse the ELF sections in a more intelligent manner and exclude the external references (e.g. calls to dynamic library functions). However, that would thwart the original idea, which was to have a way to perform security vulnerability analysis: if it is known which function has a problem, one can search for all binaries that call that function and so get some idea of the security impact. Alternatively, the extracted tokens could be split into references and definitions.
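As a rough illustration of what splitting into definitions and references could look like, the Python sketch below classifies entries of a 64-bit little-endian ELF symbol table: in the ELF format, a symbol whose `st_shndx` is `SHN_UNDEF` (0) is undefined in this file, i.e. an external reference. This is not something OpenGrok does today; the function names are hypothetical, and the caller is assumed to have already read the raw bytes of the symbol table and its associated string table.

```python
import struct

SHN_UNDEF = 0  # section index that marks an undefined (external) symbol

# Elf64_Sym layout: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")


def read_name(strtab, offset):
    """Read the null-terminated name starting at `offset` in the string table."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii", errors="replace")


def split_symbols(symtab, strtab):
    """Return (defined, referenced) symbol names from raw table bytes."""
    defined, referenced = [], []
    for st_name, _info, _other, st_shndx, _value, _size in ELF64_SYM.iter_unpack(symtab):
        name = read_name(strtab, st_name)
        if not name:
            continue  # skip the mandatory null entry and unnamed symbols
        if st_shndx == SHN_UNDEF:
            referenced.append(name)
        else:
            defined.append(name)
    return defined, referenced


# Synthetic tables: a null entry, a defined "main", and an undefined "printf".
strtab = b"\0main\0printf\0"
symtab = (
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0)
    + ELF64_SYM.pack(1, 0x12, 0, 1, 0x1000, 10)
    + ELF64_SYM.pack(6, 0x12, 0, SHN_UNDEF, 0, 0)
)
print(split_symbols(symtab, strtab))
```

A split like this depends on the symbol table being present, so it would not help with stripped binaries.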

Vlad
  • 156
  • 12
  • Thanks for the info. Even if the ELF analyzer is used to filter on only the compiled functions/variables/objects, when I browse the source files in the project I still see non-compiled symbols/functions that are not in the ELF listed in my cross-reference searches. Is there any notation/mechanism by which these non-compiled symbols are separated out in the listing? – gsat May 15 '21 at 03:39
  • As explained above, currently all the strings from the relevant sections are lumped together into a common field in the index, so there is no way to tell them apart. The idea for **an enhancement** is also stated. However, that would work only for unstripped binaries, as far as I can tell. – Vlad May 15 '21 at 19:52