0

I am scraping data from multiple websites. To do that, I have written several web scrapers using Selenium and PhantomJS.

Those scrapers return values.

My question is: is there a way I can feed those values to a single Python program that will sort through the data in real time?

I do not want to save the data and analyze it later; I want to send it to a program that will analyze it in real time.

What I have tried: I have no idea where to even start.

solidsnake

3 Answers

0

Perhaps a named pipe would be suitable:

mkfifo whatever (you can also do this from within your Python script with os.mkfifo)

You can write to whatever like a normal file (opening it for writing blocks until another process opens it for reading) and read from whatever in a different process (reading blocks while no data is available).

Example:

# writer.py

with open('whatever', 'w') as h:  # open() blocks until reader.py opens the pipe
    h.write('some data')


# reader.py

with open('whatever', 'r') as h:  # open() blocks until writer.py opens the pipe
    print(h.read())  # returns once writer.py has closed the pipe
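
For a stream of scraped values rather than a single string, one option is to send one record per line, so the reader can handle each record as soon as it arrives. Here is a minimal sketch of that idea. The function names and the JSON-lines format are my own assumptions, not part of the question, and it only works on POSIX systems since it relies on os.mkfifo.

```python
import json
import os


def ensure_fifo(path):
    """Create the named pipe if it does not exist yet (POSIX only)."""
    if not os.path.exists(path):
        os.mkfifo(path)


def send_records(path, records):
    """Scraper side: write one JSON document per line into the pipe."""
    # open() blocks here until some process opens the pipe for reading.
    with open(path, "w") as pipe:
        for record in records:
            pipe.write(json.dumps(record) + "\n")
            pipe.flush()  # push each record out instead of buffering it


def receive_records(path):
    """Analyzer side: yield each record as soon as its line arrives."""
    # open() blocks here until some process opens the pipe for writing.
    with open(path, "r") as pipe:
        for line in pipe:  # the loop ends once every writer has closed the pipe
            yield json.loads(line)
```

Each scraper would call send_records('whatever', ...) and the analyzing program would loop over receive_records('whatever'). If several scrapers share one pipe, keep the lines short: POSIX only guarantees that writes up to PIPE_BUF bytes are not interleaved. Also note that the reader sees end-of-file when the last writer closes, so a long-running analyzer may need to reopen the pipe.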
OdinX
-1

You can try writing the data you want to share to a file and have the other script read and interpret it. Have the other script run in a loop to check if there is a new file or the file has been changed.
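
A rough sketch of such a polling loop, assuming the scrapers append one value per line to a shared text file. The names read_new_lines and watch, and the max_polls parameter, are made up for illustration. It remembers how far it has read, so each poll only returns lines added since the previous one.

```python
import os
import time


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset)."""
    if not os.path.exists(path):
        return [], offset
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written line
    # is left alone until the writer has finished it.
    end = chunk.rfind(b"\n") + 1
    lines = [raw.decode("utf-8") for raw in chunk[:end].splitlines()]
    return lines, offset + end


def watch(path, handle, interval=1.0, max_polls=None):
    """Poll the file and pass every new line to handle()."""
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            handle(line)
        polls += 1
        time.sleep(interval)
```

Calling watch('scraped.txt', print) would print values as they show up, with a delay of at most one polling interval. Leaving max_polls as None makes it run forever.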

Trashcan
  • Pretty good idea, I will try it with a while True: loop and a time.sleep(60). If it runs smoothly I will post the script so someone else can see it. Too bad there is no way to share data between Python programs. – solidsnake Aug 03 '16 at 18:34
-1

Simply use files for data exchange and a trivial locking mechanism. Each writer or reader (only one reader, it seems) gets a unique number. If a writer or reader wants to access the file, it renames it to its original name + the number, then writes or reads, and renames it back afterwards. The others wait until the file is available again under its original name and then access it by locking it in the same way.

Of course there are also shared memory, memory-mapped files, and semaphores. But this mechanism has worked flawlessly for me for over 30 years, on any OS, over any network, because it is trivially simple.

It is in fact a poor man's mutex semaphore. To find out whether a file has changed, check its modification timestamp. But the locking is necessary too; otherwise you will end up in a mess.
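
In Python the renaming scheme could look roughly like the sketch below. The name rename_lock and the timeout and poll parameters are my own choices for illustration. It assumes the shared file already exists and that renaming within one directory is atomic on the file system in use, which is worth checking for network shares.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def rename_lock(path, owner_id, timeout=10.0, poll=0.05):
    """Hold `path` exclusively by renaming it to `path.<owner_id>`."""
    private = f"{path}.{owner_id}"
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.rename(path, private)  # only one process can win this rename
            break
        except FileNotFoundError:
            # Someone else holds the file under their own name, so wait.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(poll)
    try:
        yield private  # read or write the file under its private name
    finally:
        os.rename(private, path)  # release: restore the shared name
```

A writer with number 1 would then do something like with rename_lock('data.txt', 1) as name: and open name for appending inside the block. The finally clause gives the file back even if the write fails, but a process that is killed while holding the lock leaves the file under its private name, so that case needs manual cleanup.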

Jacques de Hooge
  • Wow, I admit it is not the sexiest solution, but it has worked on many professional projects, with software running flawlessly for dozens of years. Glad that's still worth something. And glad I've developed an elephant's skin. Sometimes advanced things get replaced by simple ones, like CORBA by JSON. Not without reason. – Jacques de Hooge Aug 03 '16 at 19:16