I am writing a function that reads an Excel file in JS and then scrapes a website to get all sorts of data to be displayed on a JS website. My issue is that I created a page that calls this function in getStaticProps, but for some reason the page is displayed before all of the data has been scraped from the site.

The issue seems to be in my scraping function, which is:

EDIT:

export async function convertExcelAndScrape() {
    const fs = require('fs').promises;
    var XLSX = require('xlsx')
    const path = require('path');
    try {
        let files= await fs.readdir('../Fichiers Extraction');
        for (let f of files) {
           let filePath = '../Fichiers Extraction' + "/" + f;
            var workbook = XLSX.readFile(filePath); // read the file
            var sheet_name_list = workbook.SheetNames;
            var xlData = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name_list[0]]);
            var departureIATA = ""; // IATA code of the departure airport
            var destinationIATA = ""; // IATA code of the arrival airport
            var departureICAO = ""; // ICAO code of the departure airport
            var destinationICAO = ""; // ICAO code of the arrival airport
            var type = ""
            let flightList = [] // all flights: a list of lists in which each element contains [departure, destination, type]
            let xlDataIATA_ICAO = getIATA_ICAO().xlData // all {IATA code; ICAO code} pairs
            let arrayIATA_ICAO = getIATA_ICAO().array // all ICAO codes
            for (let i = 0; i < xlData.length; i++) {
                type = xlData[i].TYPE // get the aircraft type
                if (xlData[i].STATUS === "departures") { // depending on whether the row is a departure or an arrival, the first airport named is the departure or the arrival airport
                    departureIATA = xlData[i]["AIRPORT 1"] !== undefined ? xlData[i]["AIRPORT 1"] : xlData[i]["AIRPORT"] // depending on the file, the column is named either "AIRPORT 1" or "AIRPORT"
                    destinationIATA = xlData[i]["AIRPORT 2"]
                }
                if (xlData[i].STATUS === "arrivals") {
                    departureIATA = xlData[i]["AIRPORT 2"]
                    destinationIATA = xlData[i]["AIRPORT 1"] !== undefined ? xlData[i]["AIRPORT 1"] : xlData[i]["AIRPORT"]
                }
                departureICAO = convert_IATA_ICAO(xlDataIATA_ICAO, arrayIATA_ICAO, departureIATA) // convert the IATA code to an ICAO code
                destinationICAO = convert_IATA_ICAO(xlDataIATA_ICAO, arrayIATA_ICAO, destinationIATA) // convert the IATA code to an ICAO code
                flightList.push([departureICAO, destinationICAO, realType(type)]) // add the flight to the list of flight plans to fetch

            }
            await scrapePlans("http://onlineflightplanner.org/", flightList) // fetch the flight plans from onlineflightplanner
        }
      } catch (e) {
        console.log(e);
      }
    //joining path of directory 
    //passsing directoryPath and callback function

    console.log("I'm done")

}

The error I'm getting is as follows:

SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at readFileCallback (C:\Users\loic-\Documents\3A\Mission JE\code\Test2\nextjs-blog\.next\server\pages\posts\data_scraped.js:766:22)
    at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:63:3)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! learn-starter@0.1.0 dev: `next dev`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the learn-starter@0.1.0 dev script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     C:\Users\loic-\AppData\Roaming\npm-cache\_logs\2020-10-25T19_13_34_974Z-debug.log

And the page in which I'm calling my function is:

export async function getStaticProps(){

  const fs = eval('require("fs")')
  const nothing =await convertExcelAndScrape()
 return { props: { nothing } }
  // By returning { props: posts }, the Blog component
  // will receive `posts` as a prop at build time
}

This code prints "I'm done" before the scrape function has actually finished. Any ideas on how I could finish reading my directory and scraping my data before returning the info?

Thanks a lot!

Loïc Dubois

1 Answer

You are mixing promises and callback-based functions. You should decide which one to use (nowadays probably promises) and stick to it. If a library does not support promises yet, you can always wrap a callback-based function in a promise yourself or use `util.promisify`.
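As a rough illustration of both options, here is a minimal sketch using Node's built-in `fs` callbacks. The names `readdirAsync` and `readFileAsync` are made up for this example and are not from the question's code.

```javascript
const fs = require('fs');
const util = require('util');

// Option 1: util.promisify turns a Node-style callback function
// (last argument is (err, result) => ...) into one that returns a promise.
const readdirAsync = util.promisify(fs.readdir);

// Option 2: wrap the callback API by hand in a new Promise.
// (readFileAsync is a hypothetical helper name.)
function readFileAsync(filePath, encoding) {
  return new Promise((resolve, reject) => {
    fs.readFile(filePath, encoding, (err, data) => {
      if (err) reject(err); // a callback error becomes a rejected promise
      else resolve(data);   // the callback result becomes the resolved value
    });
  });
}
```

Either wrapper can then be awaited inside an async function, so nothing after the `await` runs until the underlying operation has finished.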

As for your problem: your function convertExcelAndScrape is async, so it returns a promise. For it to work, it must not return until readdir and the processing of the file contents have finished. Since Node.js v10, the fs module also provides promise-returning versions of its methods, so you can simply await fs.readdir and do the processing afterwards.

const fs = require('fs').promises;

export async function convertExcelAndScrape() {
  try {
    let files= await fs.readdir('../Fichiers Extraction');
    for (let f of files) {
      // do the processing
    }
  } catch (e) {
    console.log(e);
  }
}
derpirscher
  • Thanks, but I still have the same issue, probably coming from `XLSX.readFile`, which causes my function to return before the scraping is done (despite using await). I also get an "Unexpected end of JSON input" error from JSON.parse – Loïc Dubois Oct 25 '20 at 19:14
  • @LoïcDubois please update your question accordingly with the exact code you are using right now. For instance, you are using a variable `xDatanew` which isn't declared anywhere. – derpirscher Oct 25 '20 at 19:16
  • Furthermore, the code you have posted won't run, because you have syntax errors and the opening and closing braces `{}` don't match – derpirscher Oct 25 '20 at 19:23
  • There you go, here is the complete code. Sorry for the French comments, but the app is being built for a French customer :) The idea is to read an Excel file containing flight plans with a departure/destination/type of aircraft, store them in a list, and then call a function that gets each flight plan from a website (the function worked fine by itself, but I have trouble integrating it into my app) – Loïc Dubois Oct 25 '20 at 19:23
  • It seems like some invalid JSON is being passed into your scraping function and it's throwing an exception. I don't know what the function does (maybe it downloads some data from the given URL, which may be incomplete too). The code you are showing here no longer has any issues with async execution (the `xlsx` package works synchronously), but of course it may also be a problem with the file contents ... You could add some debugging output to check whether the exception is thrown before or after the scraping starts. It might also be that `XLSX.readFile` throws an exception on a particular file – derpirscher Oct 25 '20 at 19:39
  • It seems pretty odd that my scraping function would be invalid since, once again, it works fine on its own without being triggered by the opening of my page, but I will try to look further in that direction. Thanks for taking the time anyway! – Loïc Dubois Oct 25 '20 at 19:43
  • Based on the shown exception, at some point in this process something like `fs.readFile('somefile', (err, data) => { JSON.parse(data)})` is running and throwing an error. Probably in the file `C:\Users\loic-\Documents\3A\Mission JE\code\Test2\nextjs-blog\.next\server\pages\posts\data_scraped.js:766` – derpirscher Oct 25 '20 at 19:52
  • Looks like the file had been deleted. It's working now, thanks a lot! – Loïc Dubois Oct 25 '20 at 23:03
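
For anyone hitting the same stack trace: `JSON.parse` throws on empty or truncated input, so a file that is missing, empty, or only half written can surface as a JSON error far from the real cause. Below is a minimal sketch of a more defensive read, assuming the data is kept in a JSON file. The helper name `readJsonSafe` and its `fallback` parameter are hypothetical and not part of the code in the question.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: reads a JSON file and returns `fallback` instead of
// throwing when the file is missing, empty, or contains invalid JSON.
async function readJsonSafe(filePath, fallback = null) {
  let raw;
  try {
    raw = await fs.readFile(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') {
      console.warn(`File not found: ${filePath}`);
      return fallback;
    }
    throw err; // other I/O errors are not handled here
  }

  if (raw.trim() === '') {
    console.warn(`File is empty: ${filePath}`);
    return fallback;
  }

  try {
    return JSON.parse(raw);
  } catch (err) {
    console.warn(`Invalid JSON in ${filePath}: ${err.message}`);
    return fallback;
  }
}
```

The warnings say which file failed and why, which is the kind of debugging output suggested in the comments above.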