
How can I put different data from multiple sources into HDFS using Python?

I already tried loading an SQL file using PySpark (in the PyCharm IDE), and it worked.

Now I need more functions that allow me to ingest other kinds of data into HDFS.

OneCricketeer
USER_DY

1 Answer


PySpark is very versatile - it can read multiple inputs via Streaming/SQL. You'll need to be more specific about what sources you are trying to load from.
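For the file types mentioned in the comments (.csv, .json, .txt), a minimal sketch might look like the following. It assumes a hypothetical NameNode URI `hdfs://namenode:8020`, hypothetical local input files under `/tmp/data`, and that writing everything to HDFS as Parquet is acceptable. The helper names (`spark_format_for`, `hdfs_target`, `ingest`, `main`) are illustrative, not part of any library. Only standard `DataFrameReader`/`DataFrameWriter` calls are used.

```python
import os
from typing import Dict

# File extension -> name of a built-in Spark data source
# (csv, json, text and parquet all ship with Spark).
SPARK_FORMATS: Dict[str, str] = {
    ".csv": "csv",
    ".json": "json",
    ".txt": "text",
    ".parquet": "parquet",
}


def spark_format_for(path: str) -> str:
    """Pick the Spark source format from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SPARK_FORMATS:
        raise ValueError(f"unsupported file type: {ext!r}")
    return SPARK_FORMATS[ext]


def hdfs_target(base_uri: str, path: str) -> str:
    """HDFS output directory named after the input file, without its extension."""
    name = os.path.splitext(os.path.basename(path))[0]
    return f"{base_uri.rstrip('/')}/{name}"


def ingest(spark, path: str, base_uri: str) -> str:
    """Read one file with Spark and write it to HDFS as Parquet; return the HDFS path."""
    fmt = spark_format_for(path)
    reader = spark.read.format(fmt)
    if fmt == "csv":
        # Use the first line as column names and let Spark infer column types.
        reader = reader.option("header", "true").option("inferSchema", "true")
    target = hdfs_target(base_uri, path)
    reader.load(path).write.mode("overwrite").parquet(target)
    return target


def main() -> None:
    # Imported here so the helpers above work even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-ingest").getOrCreate()
    # Hypothetical NameNode URI and input files. The file:// prefix makes Spark
    # read from the local disk even when the default filesystem is HDFS.
    base = "hdfs://namenode:8020/user/ingest"
    for f in ["file:///tmp/data/users.csv", "file:///tmp/data/events.json",
              "file:///tmp/data/notes.txt"]:
        print(f, "->", ingest(spark, f, base))
    spark.stop()
```

To run it as a script, add an `if __name__ == "__main__": main()` guard. On a multi-node cluster, a `file://` path has to exist on every worker. Otherwise, run Spark in local mode or copy the files into HDFS first. Writing Parquet is just one choice. Swapping `.parquet(target)` for `.csv(target)` or `.json(target)` keeps the original format instead.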

However, if you want a more accessible way to ingest lots of data, that is what was explicitly built for. If you prefer not having to write lots of code, then you may also look at , which integrates nicely within the Hadoop ecosystem.

OneCricketeer
  • No matter the source, I just have to put different types of data into HDFS. The tools I am going to use must work with Python code – USER_DY Aug 13 '21 at 20:06
  • Such as...? And what errors are you running into when you _try_ to load into HDFS? – OneCricketeer Aug 13 '21 at 20:07
  • (.csv, .json, .txt) files. I also collected tweets from Twitter using tweepy, but I can't find a function that lets me send them to HDFS – USER_DY Aug 13 '21 at 22:02
  • Spark can read all those and write to hdfs. There isn't one in tweepy. That's only a Twitter client, and has no relation to Hadoop... You'd need to parallelize that to a Spark dataframe, at the very least. Also see https://stackoverflow.com/questions/47926758/python-write-to-hdfs-file – OneCricketeer Aug 14 '21 at 02:03
  • Please, sir, I tried it with a CSV file and got this error: – USER_DY Aug 14 '21 at 12:18
  • Sounds like you're using a module, for example you have `import pyspark as spark`, and not a SparkSession object. I suggest you create a new post for new errors, though – OneCricketeer Aug 14 '21 at 12:23
  • OneCricketeer, hi sir, I have successfully ingested all of the files (CSV, JSON, TXT, SQL); thank you for your answer. But now I have to do it through a GUI (I'll develop an interface that performs the ingestion). Can I ingest those files at the same time, in parallel? – USER_DY Aug 17 '21 at 23:05
  • Hi sir, please, how can I read the files saved in Hadoop as Parquet directly from Hadoop? – USER_DY Sep 21 '21 at 20:36
  • I already told you Spark is capable of this. Python has no native Parquet support, as far as I know – OneCricketeer Sep 22 '21 at 13:02
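Following the suggestions in the comments (parallelize the tweets into a Spark DataFrame, and let Spark handle Parquet), here is a rough sketch. It assumes each tweet collected with tweepy has already been turned into a plain dict. The field names in `TWEET_FIELDS` are hypothetical, and so is the HDFS path. The helper names are illustrative only. The whole batch is held in driver memory before it is handed to Spark.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical subset of tweet fields to keep; adjust to what you collected with tweepy.
TWEET_FIELDS = ("id", "created_at", "text")


def tweets_to_rows(tweets: Iterable[Dict[str, Any]]) -> List[Tuple[str, ...]]:
    """Flatten tweet dicts into tuples in TWEET_FIELDS order; missing fields become ""."""
    return [tuple(str(t.get(field, "")) for field in TWEET_FIELDS) for t in tweets]


def save_tweets(spark, tweets: Iterable[Dict[str, Any]], target: str) -> None:
    """Parallelize in-memory tweets into a Spark DataFrame and append them to HDFS as Parquet."""
    rows = tweets_to_rows(tweets)
    if not rows:
        return  # Spark cannot infer column types from an empty list, so skip empty batches.
    df = spark.createDataFrame(rows, schema=list(TWEET_FIELDS))
    df.write.mode("append").parquet(target)


def read_tweets(spark, target: str):
    """Read the Parquet files back from HDFS into a DataFrame."""
    return spark.read.parquet(target)


def main() -> None:
    # Imported here so the helpers above work even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tweets-to-hdfs").getOrCreate()
    target = "hdfs://namenode:8020/user/ingest/tweets"  # hypothetical HDFS path
    # Stand-in for tweets you already collected with tweepy.
    collected = [{"id": 1, "created_at": "2021-08-13", "text": "hello hdfs"}]
    save_tweets(spark, collected, target)
    read_tweets(spark, target).show(truncate=False)
    spark.stop()
```

Appending (`mode("append")`) lets each new batch of tweets land in the same HDFS directory, and `spark.read.parquet` reads every batch written there back as one DataFrame.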