I have a Scrapy project in Python and have already pulled all the data I want from a website. Now, on each update, I want to add only the new data from the website to the existing table instead of pulling everything from the beginning. For example, the table currently contains:

+---------------------------+
| ID  |  Name   |   Job     |
+---------------------------+
| 01  |  Maria  |   Doctor  |
+---------------------------+
| 02  |  Silvia |   Teacher |
+---------------------------+
| 03  |  Lora   |   Soldier |
+---------------------------+

After an update, new data has been added to the website:

+-------------------------+
| ID  | Name   | Job      |
+-------------------------+
| 04  | Blanca | Engineer |
+-------------------------+

So when I run my code, I want to pull only the new data from the website into the existing table, not everything all over again.

How can I do it?

Zhiltsoff Igor
cl0udy
  • This sounds like `INSERT`. – Gordon Linoff Aug 18 '20 at 11:38
  • Not exactly. Yes, I used INSERT to pull all the data, but I don't want to pull everything from the beginning every time I run the code. I just want to add the new data to my existing table when I run the code, so INSERT alone isn't enough. Do you have another suggestion? @GordonLinoff – cl0udy Aug 18 '20 at 23:59

1 Answer

One way to do this is with an item pipeline: compare each scraped item with what is already stored, and insert the record only if it is not in the database yet. Scrapy itself depends entirely on the selectors. If the old and the new data on the site match the same selectors, you cannot tell them apart while crawling. The pipeline lets you filter the records according to your requirements.

https://docs.scrapy.org/en/latest/topics/item-pipeline.html
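
As a rough illustration of the pipeline idea above, here is a minimal sketch of what `pipelines.py` could look like. It rests on several assumptions that are not in the question: the database is SQLite (via the standard `sqlite3` module), the table is called `people`, the items have the fields `id`, `name` and `job`, and `id` uniquely identifies a record. The class name `NewRowsOnlyPipeline` and the file name `people.db` are made up for the example. The primary key on `id` together with `INSERT OR IGNORE` makes the database skip rows that are already stored, so only new records are added on each run.

```python
import sqlite3


class NewRowsOnlyPipeline:
    """Hypothetical pipeline: stores an item only if its id is not in the table yet."""

    def __init__(self, db_path="people.db"):
        # "people.db" is an assumed file name, not something from the question.
        self.db_path = db_path

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        # The PRIMARY KEY on id is what lets the database detect duplicates.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS people ("
            "id TEXT PRIMARY KEY, name TEXT, job TEXT)"
        )

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Rows whose id already exists are silently skipped;
        # rows with a new id are inserted.
        self.conn.execute(
            "INSERT OR IGNORE INTO people (id, name, job) VALUES (?, ?, ?)",
            (item["id"], item["name"], item["job"]),
        )
        self.conn.commit()
        return item
```

To enable it, the class would be listed in the `ITEM_PIPELINES` setting in `settings.py`, for example `ITEM_PIPELINES = {"myproject.pipelines.NewRowsOnlyPipeline": 300}`, where `myproject` stands in for the real project name. With MySQL the same idea should work with a primary or unique key on `ID` and `INSERT IGNORE` in place of `INSERT OR IGNORE`. Note that the spider still visits every page on each run; the pipeline only prevents rows that are already stored from being written again.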

Ikram Khan Niazi
  • Thanks for your answer, I understand the idea. But how can I code this in pipelines.py? – cl0udy Aug 18 '20 at 17:57
  • This will help you. https://stackoverflow.com/questions/43656127/scrapy-pipeline-doesnt-insert-into-mysql – Ikram Khan Niazi Aug 19 '20 at 03:16
  • Unfortunately I didn't understand it. Can you tell me which function I should use? My problem is: when I run the project and there is new data on the website, only that new data should be added to the existing table, not all of the data. – cl0udy Aug 19 '20 at 14:26