
SOLUTION: My strings can be huge, so memory has to be reserved for them. Instead of std::string, I now use char pointers as the hash table keys and allocate exactly the memory each key needs.
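The post does not show the changed code, so the following is only a minimal sketch of what char-pointer keys might look like. The names CStrHash, CStrEqual, Vocabulary, intern_word and free_vocabulary are hypothetical. A plain unordered_map<const char*, int> would hash and compare the pointer values, not the text, which is why the sketch supplies its own hasher and equality functor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical helpers: none of these names appear in the original post.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        assert(s != nullptr);
        // FNV-1a over the characters, so equal text gives equal hashes.
        std::uint64_t h = 1469598103934665603ULL;
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;  // compare text, not addresses
    }
};

typedef std::unordered_map<const char*, int, CStrHash, CStrEqual> Vocabulary;

// Returns the id of word, inserting a heap copy of the key if it is new.
int intern_word(Vocabulary& voc, const char* word) {
    Vocabulary::iterator found = voc.find(word);
    if (found != voc.end()) return found->second;
    std::size_t len = std::strlen(word);
    char* key = new char[len + 1];   // the characters plus the terminator
    std::memcpy(key, word, len + 1);
    int id = static_cast<int>(voc.size());
    voc[key] = id;
    return id;
}

// The map does not own its keys, so they have to be released by hand.
void free_vocabulary(Vocabulary& voc) {
    for (Vocabulary::iterator it = voc.begin(); it != voc.end(); ++it)
        delete[] it->first;
    voc.clear();
}
```

The cost of this design is manual ownership: every key that goes into the map must eventually pass through something like free_vocabulary, which std::string keys would handle automatically.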

PROBLEM:

I'm sorry if this question has already been asked, but I could not find an answer that helped me.

I have the following code:

EDIT (the main loop of the problematic function for Valgrind)

i = 0;
wordpos = 0;
for (; it != end; ++it, i++){

    // I want to ignore this element on purpose
    if (i == 1) continue;

    // Declared inside the loop and assigned only for tag nodes,
    // so it is read uninitialized in the test further down.
    bool isscript;
    string tag(it->tagName());

    convertToLower(tag);

    if (it->isTag()==1){
        if (tag=="script") isscript = true;
        else isscript = false;
    }

    if (it->isComment()==0 && it->isTag()==0 && isscript==0){

        wordlist.clear();

        tokenize(it->text(),wordlist);

        // Iterator left over from an earlier experiment; not used below
        vector<string>::iterator it_words = wordlist.begin();
        int ii = 0;
        while(ii<wordlist.size()){
            string word(wordlist[ii]);
            convertToLower(word);

            wordpos++;

            if (voc.find(word) == voc.end()){
                // New word: assign the next id and start its position list
                voc[word] = countwords;

                voc_inv[countwords] = word;
                term_pos[countwords] = new vector<int>();
                term_pos[countwords]->push_back(wordpos);
                countwords++;
            }else{
                // Known word: append the position to its list
                if (term_pos.find(voc[word]) == term_pos.end())
                    term_pos[voc[word]] = new vector<int>();
                term_pos[voc[word]]->push_back(wordpos);
            }
            ii++;
        }
    }
}

voc is an unordered_map, but when I run my code under Valgrind I get the following message:

EDIT: Now I'm pasting the complete error, obtained with the flag --track-origins=yes.

EDIT 2: Now I'm pasting the complete error, obtained with the flag --dsymutil=yes.
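Since the post does not give the declarations, here is a sketch of one set of types that would be consistent with the loop, together with a compact version of the inner bookkeeping. The Index struct and the add_words function are hypothetical, the words are assumed to be lowercased already, and term_pos is assumed to hold vectors by value instead of pointers created with new.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed declarations: the post only states that voc is an unordered_map.
struct Index {
    std::unordered_map<std::string, int> voc;              // word -> id
    std::unordered_map<int, std::string> voc_inv;          // id -> word
    std::unordered_map<int, std::vector<int> > term_pos;   // id -> positions
    int countwords;
    int wordpos;
    Index() : countwords(0), wordpos(0) {}
};

// Same bookkeeping as the inner while loop, with one lookup in voc per word.
void add_words(Index& idx, const std::vector<std::string>& wordlist) {
    for (std::size_t ii = 0; ii < wordlist.size(); ++ii) {
        const std::string& word = wordlist[ii];
        idx.wordpos++;
        std::unordered_map<std::string, int>::iterator found = idx.voc.find(word);
        int id;
        if (found == idx.voc.end()) {
            id = idx.countwords++;       // new word: hand out the next id
            idx.voc[word] = id;
            idx.voc_inv[id] = word;
        } else {
            id = found->second;          // known word: reuse its id
        }
        // operator[] default-constructs the vector the first time an id is seen.
        idx.term_pos[id].push_back(idx.wordpos);
    }
    assert(idx.voc.size() == idx.voc_inv.size());
}
```

Storing the vectors by value means nothing has to be deleted later, and the separate term_pos.find check from the original loop is no longer needed.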

==21036== Use of uninitialised value of size 8
==21036==    at 0x4201FF: _platform_memcmp (in /usr/lib/system/libsystem_platform.dylib)
==21036==    by 0x10001F10D:  std::__1::__hash_iterator<std::__1::__hash_node<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*> std::__1::__hash_table<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (string:642)
==21036==    by 0x10000358F: Colecao::ler_arvore_dom(tree<htmlcxx::HTML::Node, std::__1::allocator<tree_node_<htmlcxx::HTML::Node> > >, int, std::__1::unordered_map<int, std::__1::vector<int, std::__1::allocator<int> >, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<std::__1::pair<int const, std::__1::vector<int, std::__1::allocator<int> > > > >&) (colecao.cpp:135)
==21036==    by 0x100002A19: Colecao::ler(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (colecao.cpp:73)
==21036==    by 0x100001781: main (index.cpp:47)
==21036==  Uninitialised value was created by a heap allocation
==21036==    at 0x70AB: malloc (in /usr/local/Cellar/valgrind/HEAD/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==21036==    by 0x7528D: operator new(unsigned long) (in /usr/lib/libc++.1.dylib)
==21036==    by 0x77E12: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(char const*, unsigned long) (in /usr/lib/libc++.1.dylib)
==21036==    by 0x10001A0FF: std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::__construct_node(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (memory:1505)
==21036==    by 0x10000838D: std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::operator[](std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (unordered_map:1209)
==21036==    by 0x100003835: Colecao::ler_arvore_dom(tree<htmlcxx::HTML::Node, std::__1::allocator<tree_node_<htmlcxx::HTML::Node> > >, int, std::__1::unordered_map<int, std::__1::vector<int, std::__1::allocator<int> >, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<std::__1::pair<int const, std::__1::vector<int, std::__1::allocator<int> > > > >&) (colecao.cpp:139)
==21036==    by 0x100002A19: Colecao::ler(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (colecao.cpp:73)
==21036==    by 0x100001781: main (index.cpp:47)

When I run the code on a huge amount of data I get a segmentation fault, and I think this Valgrind error is the cause.

I don't think I need to reserve space for the strings in the unordered_map, so I figured the problem is somewhere in the constructor of the word variable. When I initialize word with a static string (for instance, word("test")), Valgrind stops complaining.
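One possible reason, offered as an assumption and not confirmed anywhere in this thread: libc++ keeps short strings such as "test" inside the string object itself, so building them involves no heap allocation, whereas the trace above reports bytes that came from a heap allocation in basic_string::__init. The sketch below uses a hypothetical helper, stored_inline, to check where a string keeps its characters.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper, not part of the original code.
// Returns true when the characters of s live inside the string object itself.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> before;  // std::less gives a total order over pointers
    return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}

// A 100-character string cannot fit inside the object, so it must use the heap.
void demo() {
    std::string long_word(100, 'x');
    assert(!stored_inline(long_word));
}
```

Calling stored_inline on a short word and on a long one from the real input would show whether the Valgrind report only appears for heap-backed strings.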

I don't know how to fix this string/unordered_map/memory issue.

EDIT: GDB didn't help me. The segmentation fault happens only when I use a huge amount of data and therefore a huge amount of memory. The only thing GDB gives me is "Segmentation fault" and a memory address, nothing more. Valgrind gave me a more complete message.

Sebastian Mach
Evelin Amorim
  • Why don't you elaborate on what is in //some more code here? The snippet you have shown at the moment is an endless loop. Why do you get the iterator to wordlist if you don't use it in the while? – o_weisman Mar 04 '14 at 07:53
  • Generally, compile your code with debug info (for g++: use -g), to get line numbers. Also, the Valgrind output below "Uninitialised value was created by a heap allocation" should show where the uninitialized value came from - could you please post that as well? – oliver Mar 04 '14 at 09:46
  • I get an iterator for wordlist that I don't use because I was testing whether it was the source of my error. Previously I declared word as word(*it_words) and iterated through wordlist using it_words, but the Valgrind error remained. – Evelin Amorim Mar 04 '14 at 12:48
  • Can you mark the lines 135 and 139 in your code? Also, what are the exact definitions of the various variables you use (voc, wordlist, voc_inv, term_pos, countwords)? While it is possible to guess that countwords is an int and voc is a std::unordered_map, for debugging it's really better to avoid such guessing if possible. – oliver Mar 06 '14 at 09:52
  • Oliver, thanks to your previous answer I don't get this message anymore (your mention of string issues in the Valgrind message gave me the idea that something is wrong with the string class that I use in every inner loop). I found out that the problem is the string capacity. I can have huge strings, and then the heap is not able to store them. I will edit the topic to say how I resolved this issue. Thank you very much. – Evelin Amorim Mar 06 '14 at 13:12
  • Stack-Overflow Etiquette: Instead of prefixing the title of your question with "SOLVED", you should see if an answer fits your solution, otherwise you should write an answer yourself. Finally, accept the best answer (and do not prefix with "SOLVED"). Consider that some of us spent time on your question for which you didn't pay. Posting the solution is therefore the minimum you can do to show thankfulness. – Sebastian Mach Mar 06 '14 at 13:24
  • Thanks for the tip, phresnel! This is the first time I have asked a question on Stack Overflow, so I am still learning some things. – Evelin Amorim Mar 06 '14 at 13:32

1 Answer

This might actually be a problem between Valgrind and the memcmp() implementation of your platform (Mac OS X I suppose?).

The uninitialized value in your application supposedly comes from a malloc() call in the std::string constructor, which is unlikely to "create" uninitialized memory on its own. So my guess would be that malloc() allocates a bit more memory than necessary (aligned to 8 bytes maybe), and _platform_memcmp() also takes these bytes into account. System libraries often have highly optimized implementations of such functions (memcpy, memcmp, strcpy...). As Valgrind often has trouble with these optimizations, it provides its own replacement functions (in mc_replace_strmem.c).
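To illustrate the guess above, here is a sketch of a comparison that loads whole 8-byte words and therefore touches padding bytes beyond the length it was asked to compare. It is not the real _platform_memcmp. The function name equal_wordwise is hypothetical, and the sketch assumes both buffers have a capacity rounded up to a multiple of 8 bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative only: not the real _platform_memcmp.
// Both buffers must have a capacity rounded up to a multiple of 8 bytes.
bool equal_wordwise(const unsigned char* a, const unsigned char* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    for (std::size_t off = 0; off < n; off += 8) {
        unsigned char wa[8], wb[8];
        std::memcpy(wa, a + off, 8);   // loads a whole word, padding included
        std::memcpy(wb, b + off, 8);
        std::size_t valid = std::min<std::size_t>(8, n - off);
        // Only the bytes inside the requested length decide the result.
        if (std::memcmp(wa, wb, valid) != 0) return false;
    }
    return true;
}
```

The result never depends on the padding, yet the padding is still read. If those bytes were never written, a tool that tracks initialization can flag the load unless it knows the routine and substitutes its own version.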

Maybe Valgrind lacks these replacements for OS X memcmp(), or your Valgrind version is too old? Also, there might be a setup problem with your system which prevents Valgrind from detecting the memcmp() function at runtime (I'm not familiar with OS X, but maybe you need some kind of debug info for your system libraries).

So, some questions:

  • are you running the latest Valgrind version? If not, upgrade it.
  • what OS X version are you using exactly?
  • does the problem disappear if you disable optimizations when compiling your application?

If this doesn't help, you might want to ask at the Valgrind users mailing list (http://valgrind.org/support/mailing_lists.html) for this specific problem.

Btw. it's pretty difficult to analyze the Valgrind backtraces without any line numbers. See Debugging Symbols Lost When Linking? for a suggestion to get line number info in the backtraces (in short: add "--dsymutil=yes" to Valgrind command line - but check out the notes for this option in http://valgrind.org/docs/manual/manual-core.html#manual-core.erropts first).

oliver
  • Thanks for the answer. My post really lacked some information about my system. My Valgrind version is already the latest: valgrind-3.10.0.SVN, although after posting I found the following Valgrind bug for Mac: https://bugs.kde.org/show_bug.cgi?id=326724. My OS X is 10.9.2 (Mavericks). I updated my post with the information produced by --dsymutil=yes (thanks for the tip!), and I still get the same output from Valgrind even when compiling without optimization. – Evelin Amorim Mar 05 '14 at 19:33