    const QHash<QString, float> idfs = {{"the", 0.0023450551861261},
    {"of", 0.00258603321106053},
    {"to", 0.00375511856396871},
    {"and", 0.0040408455383457}

(... 293060 more lines)

Compilation command: /usr/local/bin/mpic++ -DQT_CORE_LIB -DQT_NO_DEBUG --isystem /usr/include/x86_64-linux-gnu/qt5 -isystem /usr/include/x86_64-linux-gnu/qt5/QtCore -isystem /usr/lib/x86_64-linux-gnu/qt5/mkspecs/linux-g++-64 -Wall -Wextra -std=c++11 -O2 -fPIC -fPIC -o CMakeFiles/antiplagiarism.dir/src/idfs.cc.o -c /home/user/newanalyzer/common/src/idfs.cc

Compilation result: g++: internal compiler error: Segmentation fault (program cc1plus)

Is it OK to have a huge initialization list with gcc?

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 
Vishaal Shankar
Alex
  • I doubt that input fits into floats. – deW1 Jun 08 '18 at 10:40
  • Could that be the reason for the segmentation fault? Precision is not the issue here. – Alex Jun 08 '18 at 10:41
  • For optimization, "the" etc. should be QStringLiteral("the"). – deW1 Jun 08 '18 at 10:42
  • 293060 lines x 2? If this is going on the stack, that's going to be a... what's the name of this site? – UKMonkey Jun 08 '18 at 10:42
  • "Is it ok to have huge initialization list for gcc?" Apparently not. What sort of info are you looking for in an answer? – aschepler Jun 08 '18 at 10:49
  • Why is there a segmentation fault? Is it a gcc bug? Maybe there are good alternatives for doing this initialization. – Alex Jun 08 '18 at 10:54
  • Good alternatives are called databases. – deW1 Jun 08 '18 at 11:07
  • For const data? – Alex Jun 08 '18 at 11:08
  • [Related info](https://stackoverflow.com/questions/44023855/qhash-storing-large-amount-of-data). The OP there seems to have almost 10,000,000 entries in the QHash. Are you sure the segfault is caused by this initialization? When you reduce the quantity drastically, do you not see a crash, and does your program run properly then? – Vishaal Shankar Jun 08 '18 at 11:21
  • This bug is not about runtime at all; it occurs during the compilation phase. – Alex Jun 08 '18 at 11:24
  • For large generated source files, it is usually recommended to stick to -O1, as you can easily hit one of the corner cases of -O2. – Marc Glisse Jun 08 '18 at 15:53
  • -O1 does not help. – Alex Jun 08 '18 at 18:51
  • [`[implimits]`](http://eel.is/c++draft/implimits#2.35) suggests 16384 entries in a braced init list as a minimum for the implementation-defined upper bound. – Caleth Jun 12 '18 at 11:27

3 Answers


Is it ok to have huge initialization list for gcc?

Yes. It's just not OK to have it so big that the system you build on runs out of resources. There are no limits inherent in gcc's architecture.

But this is static data, and QHash is the wrong tool for the job. You should use something like gperf (a perfect hash function generator) with a user-supplied struct instead.

In your case, the input file to gperf would look like this:

 %language=C++
 %struct-type
 %define class-name WordHash
 %define slot-name text
 struct Word { const char *text; double frequency; };
 %%
 the, 0.0023450551861261
 of, 0.00258603321106053
 to, 0.00375511856396871
 and, 0.0040408455383457

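With the gperf-generated code, the lookup would look like this:

 #include <cstring>   // std::strlen, std::strcmp
 #include <QtGlobal>  // Q_ASSERT

 // WordHash and struct Word are defined in the gperf-generated source.
 double getFrequency(const char *text) {
   auto *word = WordHash::in_word_set(text, std::strlen(text));
   Q_ASSERT(!word || std::strcmp(word->text, text) == 0);
   return word ? word->frequency : -1;  // -1: word is not in the set
 }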
Kuba hasn't forgotten Monica
  • I've already tried gperf; with the '-f 0' option it has been running for more than a day, which is too long. – Alex Jun 12 '18 at 08:49

A bug report was submitted to gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86118. The problem was resolved by using a sorted array and binary search.
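The final code isn't shown, so the following is only a minimal sketch of the sorted-array approach, under a few assumptions: the generator emits the entries already sorted in std::strcmp order, keys are plain const char* rather than QString, and IdfEntry, kIdfs, entryLess, lookupIdf and assertIdfTableSorted are hypothetical names. A plain array of POD structs is constant-initialized, so the compiler only has to emit static data rather than code that constructs every QString and inserts it into a QHash, which is likely part of why this compiles where the initializer list did not.

```cpp
#include <algorithm>  // std::lower_bound, std::is_sorted
#include <cassert>    // assert
#include <cstring>    // std::strcmp
#include <iterator>   // std::begin, std::end

// Hypothetical entry type: a string literal key and its IDF value.
struct IdfEntry {
    const char *word;
    float idf;
};

// Assumption: the generator writes the entries sorted by std::strcmp,
// because binary search only works on sorted input.
const IdfEntry kIdfs[] = {
    {"and", 0.0040408455383457f},
    {"of",  0.00258603321106053f},
    {"the", 0.0023450551861261f},
    {"to",  0.00375511856396871f},
    // ... the remaining generated entries, in the same sorted order
};

// Ordering used both for sorting the table and for searching it.
inline bool entryLess(const IdfEntry &a, const IdfEntry &b) {
    return std::strcmp(a.word, b.word) < 0;
}

// Debug-only sanity check: an out-of-order table makes lookups silently wrong.
void assertIdfTableSorted() {
    assert(std::is_sorted(std::begin(kIdfs), std::end(kIdfs), entryLess));
}

// Returns the IDF for `word`, or -1.0f if the word is not in the table.
float lookupIdf(const char *word) {
    const IdfEntry *first = std::begin(kIdfs);
    const IdfEntry *last = std::end(kIdfs);
    // Find the first entry whose key is not less than `word`.
    const IdfEntry *it = std::lower_bound(
        first, last, word,
        [](const IdfEntry &e, const char *key) {
            return std::strcmp(e.word, key) < 0;
        });
    if (it != last && std::strcmp(it->word, word) == 0)
        return it->idf;
    return -1.0f;
}
```

Each lookup is O(log n): with roughly 293,000 entries, std::lower_bound needs at most about 19 string comparisons per query, and no hash table has to be built at startup.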

Alex

Is it ok to have huge initialization list for gcc?

No.

Instead, format the data as a JSON array of objects (or as an object with "key" : value pairs). You can probably do this quickly with a few simple regex commands (or by modifying whatever you are using to generate the init list). Qt has JSON support.

If you must have the data inside the program (instead of in a separate .json file for easy updating), then embed the file as a const char* definition.

James Poag