
I'm working on a CSV parsing web application, which collects data and then uses it to draw a plot graph. So far it works nicely, but unfortunately it takes some time to parse the CSV files with papaparse, even though they are only about 3MB.

So it would be nice to show some kind of progress while "papa" is working. I could go for the cheap hidden div that says "I'm working", but I would prefer to use <progress>.

Unfortunately the bar only gets updated AFTER papa has finished its work. So I tried to get into web workers: I use a worker file to calculate the progress and I also set worker: true in Papa Parse's configuration. Still to no avail.

The configuration I use (with step function) is as follows:

var papaConfig =
    {
        header: true,
        dynamicTyping: true,
        worker: true,
        step: function (row) {
            if (gotHeaders == false) {
                for (k in row.data[0]) {
                    if (k != "Time" && k != "Date" && k != " Time" && k != " ") {
                        header.push(k);
                        var obj = {};
                        obj.label = k;
                        obj.data = [];
                        flotData.push(obj);
                        gotHeaders = true;
                    }
                }
            }

            tempDate = row.data[0]["Date"];
            tempTime = row.data[0][" Time"];
            var tD = tempDate.split(".");
            var tT = tempTime.split(":");
            tT[0] = tT[0].replace(" ", "");
            dateTime = new Date(tD[2], tD[1] - 1, tD[0], tT[0], tT[1], tT[2]);

            var encoded = $.toJSON(row.data[0]);

            for (j = 0; j < header.length; j++) {
                var value = $.evalJSON(encoded)[header[j]]
                flotData[j].data.push([dateTime, value]);
            }

            w.postMessage({ state: row.meta.cursor, size: size });
        },
        complete: Done,
    }

Worker configuration on the main site:

var w = new Worker("js/workers.js");

w.onmessage = function (event) {
   $("#progBar").val(event.data);
};

and the called worker is:

onmessage = function(e) {
   var progress = e.data.state;
   var size = e.data.size;
   var newPercent = Math.round(progress / size * 100);

   postMessage(newPercent);
}

The progress bar is updated, but only after the CSV file is parsed and the site is set up with data. So the worker is called, but its answer is handled only after parsing. Papa Parse seems to run in a worker, too, at least judging by the calls shown in the browser's debugging tools, but the site is still unresponsive until all data shows up.

Can anyone point me to what I have done wrong, or where to adjust the code, to get a working progress bar? I guess this would also deepen my understanding of web workers.

Tomáš Zato
Blind Seer

2 Answers


You could use the FileReader API to read the file as text, split the string by "\n" and then count the length of the returned array. This is then your size variable for the calculation of percentage.

You can then pass the file string to Papa (you do not need to reread directly from the file) and pass the number of rows (the size variable) to your worker. (I am unfamiliar with workers and so am unsure how you do this.)

Obviously this is only accurate if there are no embedded line breaks inside the CSV file (e.g. where a quoted string is spread over several lines), as these count as extra rows, so the bar will never make it to 100%. Not a fatal error, but it may look strange to the user if parsing always seems to finish before 100%.

Here is some sample code to give you ideas.

var size = 0;

function loadFile(){
  var files = document.getElementById("file").files; //load file from file input
  var file = files[0];
  var reader = new FileReader();
  reader.onload = function(event){
    var csv = event.target.result; //the string version of your csv.
    var csvArray = csv.split("\n");
    size = csvArray.length;
    console.log(size); //logs the number of lines in your file.
    Papa.parse(csv, papaConfig); //Send the csv string to Papa for parsing.
  };
  reader.readAsText(file); //start reading once the handler is attached.
}
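
To soften the caveat about embedded line breaks, the row count could be made quote-aware. The following is only a minimal sketch: countCsvRows is a hypothetical helper (it is not part of Papa Parse), and it assumes fields are quoted with double quotes and that quotes inside a field are escaped by doubling them.

```javascript
// Hypothetical helper (not part of Papa Parse): counts CSV records and
// ignores line breaks that sit inside double-quoted fields.
// Assumption: quotes inside a field are escaped by doubling them ("").
function countCsvRows(csv) {
  var rows = 0;
  var inQuotes = false;
  for (var i = 0; i < csv.length; i++) {
    var ch = csv.charAt(i);
    if (ch === '"') {
      // An escaped quote ("") toggles twice, so the state ends up unchanged.
      inQuotes = !inQuotes;
    } else if (ch === "\n" && !inQuotes) {
      rows++;
    }
  }
  // Count a final record that is not terminated by a line break.
  if (csv.length > 0 && csv.charAt(csv.length - 1) !== "\n") {
    rows++;
  }
  return rows;
}
```

In loadFile above, size = countCsvRows(csv) could then replace the split-based count. Note that the header line is counted as a row here as well.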
apb21

I haven't used Papa Parse with workers before, but a few things pop up after playing with it for a bit:

  • It does not seem to expect you to interact directly with the worker
  • It expects you to either want the entire final result, or the individual items

Using a web worker makes providing a JS Fiddle infeasible, but here's some HTML that demonstrates the second point:

<html>
<head>
    <script src="papaparse.js"></script>
</head>

<body>
<div id="step">
</div>

<div id="result">
</div>

<script type="application/javascript">
    var papaConfig = {
        header: true,
        worker: true,
        step: function (row) {
            var stepDiv = document.getElementById('step');
            stepDiv.appendChild(document.createTextNode('Step received: ' + JSON.stringify(row)));
            stepDiv.appendChild(document.createElement('hr'));
        },
        complete: function (result) {
            var resultDiv = document.getElementById('result');
            resultDiv.appendChild(document.createElement('hr'));
            resultDiv.appendChild(document.createTextNode('Complete received: ' + JSON.stringify(result)));
            resultDiv.appendChild(document.createElement('hr'));
        }
    };

    var data = 'Column 1,Column 2,Column 3,Column 4 \n\
1-1,1-2,1-3,1-4 \n\
2-1,2-2,2-3,2-4 \n\
3-1,3-2,3-3,3-4 \n\
4,5,6,7';

    Papa.parse(data, papaConfig);
</script>
</body>

</html>

If you run this locally, you'll see you get a line for each of the four rows of the CSV data, but the call to the complete callback gets undefined. Something like:

Step received: {"data":[{"Column 1":"1-1",...
Step received: {"data":[{"Column 1":"2-1",...
Step received: {"data":[{"Column 1":"3-1",...
Step received: {"data":[{"Column 1":"4","...
Complete received: undefined

However if you remove or comment out the step function, you will get a single line for all four results:

Complete received: {"data":[{"Column 1":"1-1",...

Note also that Papa Parse uses a streaming concept to support the step callback regardless of using a worker or not. This means you won't know how many items you are parsing directly, so calculating the percent complete is not possible unless you can find the length of items separately.
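
If a total is known separately, the percentage itself is simple arithmetic. The sketch below assumes the step results expose a position as meta.cursor (as the question's code uses) and that the total is the length of the input string or the file's size. progressPercent is a hypothetical helper name, and because the cursor and the total may be measured in different units (characters versus bytes), the value should be treated as approximate.

```javascript
// Hypothetical helper: turns a parse position and a known total into a
// whole-number percentage, clamped to the range 0..100.
// Assumption: cursor and total are in (roughly) the same unit.
function progressPercent(cursor, total) {
  if (!(total > 0)) {
    return 0; // unknown or empty input: report no progress
  }
  var percent = Math.round((cursor / total) * 100);
  return Math.max(0, Math.min(100, percent));
}
```

Inside the step callback this could be used as $("#progBar").val(progressPercent(row.meta.cursor, size)), which makes the separate percentage worker from the question unnecessary. It does not by itself guarantee that the browser repaints the bar while the main thread is busy.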

Peter Wagener
  • Yes, that's obvious, but I've tried updating the DOM during the steps. My app froze before running Papa Parse, then resumed later on. Note that I had worker set to true. – Bill Somen Jul 20 '20 at 21:48