OpenMP for LZW data compression using C++

Question

I am trying to parallelize LZW compression using OpenMP. Currently, all threads share the same dictionary, but an error occurs when a thread tries to add a new entry to it. Any suggestions on how to parallelize this with OpenMP?

The error is: "Exception thrown: read access violation. this was 0xFFFFFFFFFFFFFFDF." It occurs while executing the following lines inside the OpenMP critical section:

dict[next] = dict_size++;
current = string(1, c);

void lzw_parallel(const string& input, const string& output_file) {
    // Initialize dictionary with single characters
    unordered_map<string, int> dict;
    for (int i = 0; i < 256; i++) {
        dict[string(1, i)] = i;
    }

    vector<int> compressed;
    int dict_size = 256;
    string current;
    double start_time = omp_get_wtime();

#pragma omp parallel for shared(dict, compressed, dict_size, current)
    for (int i = 0; i < input.length(); i++) {
        char c = input[i];
        string next = current + c;
        if (dict.find(next) != dict.end()) {
            current = next;
        }
        else {
#pragma omp critical
            {
                compressed.push_back(dict[current]);
                dict[next] = dict_size++;
                current = string(1, c);
            }
        }
    }
    if (!current.empty()) {
#pragma omp critical
        {
            compressed.push_back(dict[current]);
        }
    }

    // Write compressed data to file
    ofstream outfile(output_file);
    for (int i : compressed) {
        outfile << i << " ";
    }

    // Print input and compressed lengths
    cout << "Input length: " << input.length() << endl;
    cout << "Compressed length: " << compressed.size() * 12 / 8 << endl;

    // Print elapsed time
    double end_time = omp_get_wtime();
    printf("Work took %f seconds \n", end_time - start_time);
}

*LZW cannot be parallelized* using multiple threads. Your loop is bogus because it does not take data dependencies into account: each iteration reads and updates `current` and the dictionary left by the previous iteration. There are algorithms that can be parallelized, but this is not one of them. The only way to parallelize LZW is to split the input into multiple blocks, each with its own dictionary. However, this does not produce the same output and is not compatible with the serial algorithm. If you care about performance, please consider not using this algorithm at all (it is known to be slow). For example, LZ4 or LZO can be used instead. — Jérôme Richard, Apr 23 '23 at 12:04
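The block-based approach mentioned in this comment could look roughly like the sketch below. It is only an illustration under assumptions: the helper names `lzw_compress_block` and `lzw_compress_blocks` and the `block_size` parameter are hypothetical, not from the question. Each block is compressed serially with its own private dictionary, so no state is shared between threads. As the comment warns, the resulting code stream differs from serial LZW over the whole input, and a decompressor would have to know the block boundaries.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Serial LZW over a single block, using a dictionary private to this call.
std::vector<int> lzw_compress_block(const std::string& block) {
    std::unordered_map<std::string, int> dict;
    for (int i = 0; i < 256; ++i) {
        dict[std::string(1, static_cast<char>(i))] = i;
    }
    int dict_size = 256;

    std::vector<int> codes;
    std::string current;
    for (char c : block) {
        std::string next = current + c;
        if (dict.count(next)) {
            current = next;                     // keep extending the match
        } else {
            codes.push_back(dict.at(current));  // emit code for longest match
            dict[next] = dict_size++;           // learn the new string
            current = std::string(1, c);
        }
    }
    if (!current.empty()) {
        codes.push_back(dict.at(current));
    }
    return codes;
}

// Hypothetical helper: split the input into fixed-size blocks and compress
// them independently. Iterations share nothing except the read-only input,
// and each one writes only to its own slot of `result`.
std::vector<std::vector<int>> lzw_compress_blocks(const std::string& input,
                                                  std::size_t block_size) {
    if (block_size == 0) block_size = 1;  // guard against division by zero
    const int n_blocks =
        static_cast<int>((input.size() + block_size - 1) / block_size);
    std::vector<std::vector<int>> result(n_blocks);

    #pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < n_blocks; ++b) {
        const std::size_t start = static_cast<std::size_t>(b) * block_size;
        result[b] = lzw_compress_block(input.substr(start, block_size));
    }
    return result;
}
```

For example, `lzw_compress_blocks(input, 1 << 20)` would compress the input in 1 MiB blocks. Smaller blocks expose more parallelism but give each dictionary less data to learn from, so the compression ratio is likely to drop.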
There have been many questions on SO about using maps/dictionaries in OpenMP parallel regions. You can do it fairly elegantly by overloading the `+` operator on a class containing a `map`, but you can also give each thread its own dictionary and then merge them at the end, using a critical section. I defer to @JérômeRichard that this does not give the correct sequential result. — Victor Eijkhout, Apr 23 '23 at 14:24
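The "one dictionary per thread, merged in a critical section" pattern from this comment can be sketched as follows. To keep it order-independent, the sketch uses a stand-in task (counting 2-character substrings), not LZW itself; the function name `count_pairs` is hypothetical. It works here only because the counts do not depend on the order in which the input is processed, which is exactly what LZW lacks.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in task (not LZW): count how often each 2-character substring occurs.
std::unordered_map<std::string, int> count_pairs(const std::string& input) {
    std::unordered_map<std::string, int> total;
    const int n = input.size() < 2 ? 0 : static_cast<int>(input.size() - 1);

    #pragma omp parallel
    {
        // Each thread fills its own map, so no locking is needed in the loop.
        std::unordered_map<std::string, int> local;

        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            ++local[input.substr(static_cast<std::size_t>(i), 2)];
        }

        // Merge the per-thread maps one thread at a time.
        #pragma omp critical
        {
            for (const auto& kv : local) {
                total[kv.first] += kv.second;
            }
        }
    }
    return total;
}
```

The critical section is entered once per thread rather than once per element, so contention stays low compared with locking around every dictionary update.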

