SOLUTION: My strings can be huge, so memory has to be reserved for them. Instead of string, I now use char pointers as the hash table keys, and I reserve the appropriate amount of memory for each key myself.
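A minimal sketch of what such a table might look like, assuming C++11 or later. The names `CStrHash`, `CStrEqual` and `Vocabulary` are hypothetical and do not come from the code below. The two points it tries to show are that the map must hash and compare the characters rather than the pointer values, and that something has to own the key buffers for as long as the map refers to them.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hash the characters of a NUL-terminated string (FNV-1a style),
// not the pointer value itself.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::size_t h = static_cast<std::size_t>(1469598103934665603ULL);
        for (; *s; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= static_cast<std::size_t>(1099511628211ULL);
        }
        return h;
    }
};

// Two keys are equal when their characters are equal.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Hypothetical vocabulary: owns the key buffers so that the char
// pointers stored in the map stay valid for the map's lifetime.
class Vocabulary {
public:
    // Returns the id of `word`, assigning the next free id on first sight.
    int idOf(const std::string& word) {
        auto it = ids_.find(word.c_str());
        if (it != ids_.end()) return it->second;

        // Reserve exactly the memory this key needs (plus the terminator).
        std::unique_ptr<char[]> key(new char[word.size() + 1]);
        std::memcpy(key.get(), word.c_str(), word.size() + 1);
        const char* raw = key.get();
        keys_.push_back(std::move(key));

        int id = static_cast<int>(ids_.size());
        ids_.emplace(raw, id);
        return id;
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::vector<std::unique_ptr<char[]>> keys_;  // owns the key memory
    std::unordered_map<const char*, int, CStrHash, CStrEqual> ids_;
};
```

Calling `idOf("hello")` twice on the same `Vocabulary` returns the same id both times, even if the second argument is a different string object with the same characters. Without the custom hasher and equality, `unordered_map<const char*, int>` would compare addresses and treat them as two distinct keys.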
PROBLEM:
I'm sorry if this question has already been asked, but I could not find an answer that helped me.
I have the following code:
EDIT (the main loop of the problematic function for Valgrind)
i = 0;
wordpos = 0;
for (; it != end; ++it, i++){
    // I want to ignore this element on purpose
    if (i == 1) continue;
    bool isscript;
    string tag(it->tagName());
    convertToLower(tag);
    if (it->isTag()==1){
        if (tag=="script") isscript = true;
        else isscript = false;
    }
    if (it->isComment()==0 && it->isTag()==0 && isscript==0){
        wordlist.clear();
        tokenize(it->text(),wordlist);
        vector<string>::iterator it_words = wordlist.begin();
        int ii = 0;
        while(ii<wordlist.size()){
            string word(wordlist[ii]);
            convertToLower(word);
            wordpos++;
            if (voc.find(word) == voc.end()){
                voc[word] = countwords;
                voc_inv[countwords] = word;
                term_pos[countwords] = new vector<int>();
                term_pos[countwords]->push_back(wordpos);
                countwords++;
            }else{
                if (term_pos.find(voc[word]) == term_pos.end())
                    term_pos[voc[word]] = new vector<int>();
                term_pos[voc[word]]->push_back(wordpos);
            }
            ii++;
        }
    }
}
voc is an unordered_map<string, int>, but when I run Valgrind on my code I get the following message:
EDIT Now I'm pasting the complete error with the flag --track-origins=yes.
EDIT 2 Now I'm pasting the complete error with the flag --dsymutil=yes.
==21036== Use of uninitialised value of size 8
==21036== at 0x4201FF: _platform_memcmp (in /usr/lib/system/libsystem_platform.dylib)
==21036== by 0x10001F10D: std::__1::__hash_iterator<std::__1::__hash_node<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*> std::__1::__hash_table<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__unordered_map_hasher<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::__unordered_map_equal<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::find<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (string:642)
==21036== by 0x10000358F: Colecao::ler_arvore_dom(tree<htmlcxx::HTML::Node, std::__1::allocator<tree_node_<htmlcxx::HTML::Node> > >, int, std::__1::unordered_map<int, std::__1::vector<int, std::__1::allocator<int> >, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<std::__1::pair<int const, std::__1::vector<int, std::__1::allocator<int> > > > >&) (colecao.cpp:135)
==21036== by 0x100002A19: Colecao::ler(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (colecao.cpp:73)
==21036== by 0x100001781: main (index.cpp:47)
==21036== Uninitialised value was created by a heap allocation
==21036== at 0x70AB: malloc (in /usr/local/Cellar/valgrind/HEAD/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==21036== by 0x7528D: operator new(unsigned long) (in /usr/lib/libc++.1.dylib)
==21036== by 0x77E12: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(char const*, unsigned long) (in /usr/lib/libc++.1.dylib)
==21036== by 0x10001A0FF: std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::__construct_node(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (memory:1505)
==21036== by 0x10000838D: std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::operator[](std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) (unordered_map:1209)
==21036== by 0x100003835: Colecao::ler_arvore_dom(tree<htmlcxx::HTML::Node, std::__1::allocator<tree_node_<htmlcxx::HTML::Node> > >, int, std::__1::unordered_map<int, std::__1::vector<int, std::__1::allocator<int> >, std::__1::hash<int>, std::__1::equal_to<int>, std::__1::allocator<std::__1::pair<int const, std::__1::vector<int, std::__1::allocator<int> > > > >&) (colecao.cpp:139)
==21036== by 0x100002A19: Colecao::ler(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) (colecao.cpp:73)
==21036== by 0x100001781: main (index.cpp:47)
When I run the code on a huge amount of data I get a segmentation fault, and I think it is caused by this Valgrind error.
I don't think I need to reserve space for the strings in an unordered_map, so I concluded that the problem is somewhere in the construction of the word variable. When I initialize word with a string literal (for instance, word("test")), Valgrind stops complaining.
I don't know how to fix this string/unordered_map/memory issue.
EDIT: GDB didn't help me. The segmentation fault only happens when I use a huge amount of data and therefore a huge amount of memory. The only thing GDB gives me is "Segmentation fault" and a memory address, nothing more. Valgrind gave me a more complete message.
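To check whether the map bookkeeping alone triggers the report, the loop above can be reduced to a version that compiles on its own and can be run under Valgrind or GDB without the HTML parsing. What follows is only a sketch under assumptions: the `Index` struct, `indexWords` and `toLower` are hypothetical names, `toLower` stands in for convertToLower (whose body is not shown here), the input is a plain vector of already tokenized words, and the positions are stored by value instead of through `new vector<int>()`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for the three maps used in the loop.
struct Index {
    std::unordered_map<std::string, int> voc;            // word -> id
    std::unordered_map<int, std::string> voc_inv;        // id -> word
    std::unordered_map<int, std::vector<int>> term_pos;  // id -> positions
    int countwords = 0;
    int wordpos = 0;
};

// Returns a lowercased copy of the word (assumed stand-in for convertToLower).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Assigns an id to every new word and records the position of every occurrence.
void indexWords(const std::vector<std::string>& wordlist, Index& idx) {
    for (const std::string& raw : wordlist) {
        const std::string word = toLower(raw);
        ++idx.wordpos;  // positions start at 1, as in the original loop

        int id;
        auto it = idx.voc.find(word);
        if (it == idx.voc.end()) {
            id = idx.countwords++;
            idx.voc[word] = id;
            idx.voc_inv[id] = word;
        } else {
            id = it->second;
        }
        // operator[] creates an empty vector on first use, so no new/delete.
        idx.term_pos[id].push_back(idx.wordpos);
    }
}
```

Feeding it a word list built at run time (rather than string literals) exercises the same find and operator[] calls that appear in the Valgrind trace. If the report disappears here, the uninitialised bytes are more likely to come from the tokenizer or the parsed text than from the map itself.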