
As part of an internship, I have to install Hadoop and Spark and test them on some Common Crawl data. I tried to follow the steps on this page https://github.com/commoncrawl/cc-pyspark#get-sample-data (I installed Spark 3.0.0 on my computer), but when I try it on my computer (I use Ubuntu) I get a lot of errors and it doesn't seem to work.
In particular, when I run the program "serveur_count.py" I get many lines saying something like: Failed to open /home/root/CommonCrawl/... and then the program suddenly stops with: MapOutputTrackerMasterEndpoint stopped. Do you have any idea how to fix this? (It's the first time I've used these tools.) Sorry for my English and thank you in advance for your response.
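In case it helps, this is a small sanity check (plain Python, not Spark) I can run to see whether the paths listed in my input file are actually readable by my user; the listing file name below is just a placeholder for my setup, not the exact file from the error message:

```python
# Check that every path in the input listing exists and is readable
# by the user running the Spark job. "input/test_warc.txt" is a
# placeholder for whatever listing file I pass to the script.
import os

listing = "input/test_warc.txt"

with open(listing) as f:
    for line in f:
        path = line.strip()
        if not path:
            continue
        # local paths in the listing may be given as file: URIs
        local = path[len("file:"):] if path.startswith("file:") else path
        if os.path.isfile(local) and os.access(local, os.R_OK):
            print("ok:", path)
        else:
            print("NOT READABLE:", path)
```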

Fitz
  • Please add directly to your question what you did and post the whole error message. Are you working with a Spark cluster or have you installed Spark on your computer? Which version of Spark are you using? – cronoik Jun 24 '20 at 15:17
  • > Failed to open /home/root/... Is /home/root/ readable for the user executing the Spark job? – Sebastian Nagel Jun 26 '20 at 14:22

0 Answers