I am trying to figure out how to run async functions in parallel (10 at a time in this case), driven by a stream of data parsed from a website with the lapwinglabs/x-ray web scraper.

let pauser = new Rx.Subject()
let count = 0
let max = 10

// function that parse a single url to retrieve data
// return Observable
let parsing_each_link = url => {
   return Rx.Observable.create(
      observer => {
         xray(url, selector)((err, data) => {
            if (err) return observer.onError(err) // stop here on error
            observer.onNext(data)
            observer.onCompleted()
         })
    })
}
 
// retrieve all the urls from a main page => node stream
let streamNode = xray(main_url, selector)
   .paginate(some_selector)
   .write()
   .pipe(JSONStream.parse('*'))

// convert node stream to RxJS
let streamRx = RxNode.fromStream(streamNode)
   .do(() => {
      if (count === max) {
         pauser.onNext(true)
         count = 0
      }
   })
   .do(() => count++)
   .buffer(pauser) // take only 10 url by 10 url
   
streamRx.subscribe(
   ten_urls => {
      Rx.Observable.forkJoin(
         ten_urls.map(url => parsing_each_link(url))
      )
      .subscribe(
         x => console.log("Next : ", JSON.stringify(x, null, 4))
      )
   }
)

The "Next" handler in the last console.log is never called. Why?

Koalabz

1 Answer

It is impossible to say for sure, but if you can confirm that ten_urls is emitted as expected, the next step is to make sure that the observable returned by parsing_each_link completes. forkJoin waits for every one of its source observables to complete and only then emits the last value of each. I could not see any call to observer.onCompleted in your code.
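To illustrate why a missing completion call keeps forkJoin silent, here is a minimal sketch in plain JavaScript. It is a toy model of the behaviour described above, not the RxJS implementation: `toyForkJoin`, `completes`, and `neverCompletes` are hypothetical names made up for this example, and each "source" is just a function that receives an observer with `onNext` and `onCompleted`.

```javascript
// Toy model of forkJoin semantics (hypothetical helper, not the RxJS source).
// Each source is a function that takes an observer { onNext, onCompleted }.
function toyForkJoin(sources, onResult) {
  const last = new Array(sources.length)
  const hasValue = new Array(sources.length).fill(false)
  let remaining = sources.length

  sources.forEach((subscribe, i) => {
    subscribe({
      // Remember only the most recent value of each source.
      onNext: value => { last[i] = value; hasValue[i] = true },
      onCompleted: () => {
        remaining -= 1
        // Emit once, and only when every source has completed with a value.
        if (remaining === 0 && hasValue.every(Boolean)) onResult(last)
      }
    })
  })
}

// Source that emits and then completes, like parsing_each_link with onCompleted().
const completes = value => observer => {
  observer.onNext(value)
  observer.onCompleted()
}

// Source that emits but never completes, like a version missing onCompleted().
const neverCompletes = value => observer => {
  observer.onNext(value)
}

const results = []
toyForkJoin([completes('a'), completes('b')], x => results.push(x))
toyForkJoin([completes('a'), neverCompletes('b')], x => results.push(x))

// Only the first call produced a result; the second is still "waiting".
console.log(JSON.stringify(results))
```

In this model the second call never reaches its callback because one source never signals completion, which mirrors the symptom in the question: the "Next" handler is never invoked.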

user3743222
Damn! You're right, I forgot the `observer.onCompleted()` after `observer.onNext(data)` in the **parsing_each_link** function. Thank you so much. – Koalabz Nov 10 '15 at 18:24