How can I put different data from multiple sources into HDFS using Python?
I already tried ingesting an SQL file using PySpark (in the PyCharm IDE), and it worked.
Now I need more functions that allow me to ingest other kinds of data into HDFS.
PySpark is very versatile: it can read many kinds of input through Spark SQL (batch) and Spark Streaming. You'll need to be more specific about which sources you are trying to load from.
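As a rough illustration of the PySpark route, here is a minimal sketch that reads a CSV file, a JSON file and a JDBC table with the generic `spark.read.format(...)` reader and writes each one to HDFS as Parquet. Everything in `SOURCES` (paths, JDBC URL, table, credentials) and the HDFS root are hypothetical placeholders. A JDBC source also needs the matching database driver JAR available to Spark, and `file://` paths must be readable from every executor.

```python
"""Sketch: ingest several source types into HDFS with PySpark.

Assumption: all names, paths and credentials below are hypothetical placeholders.
"""
from typing import Dict, List

# Hypothetical source descriptions: a Spark data source format plus the
# options DataFrameReader needs for it.
SOURCES: List[Dict] = [
    {"name": "customers", "format": "csv",
     "options": {"header": "true", "inferSchema": "true"},
     "path": "file:///data/customers.csv"},
    {"name": "events", "format": "json", "options": {},
     "path": "file:///data/events.json"},
    {"name": "orders", "format": "jdbc",
     "options": {"url": "jdbc:mysql://db-host:3306/shop",
                 "dbtable": "orders", "user": "reader", "password": "secret"},
     "path": None},
]


def target_path(hdfs_root: str, name: str) -> str:
    """Build the HDFS output directory for one source."""
    return f"{hdfs_root.rstrip('/')}/{name}"


def ingest(spark, sources, hdfs_root):
    """Read every source with the generic reader and write it to HDFS as Parquet."""
    written = []
    for src in sources:
        reader = spark.read.format(src["format"]).options(**src["options"])
        # JDBC sources are located by the 'url'/'dbtable' options, not by a path.
        df = reader.load(src["path"]) if src["path"] else reader.load()
        out = target_path(hdfs_root, src["name"])
        df.write.mode("overwrite").parquet(out)
        written.append(out)
    return written


def main():
    # Imported here so the helpers above can be reused without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-source-ingest").getOrCreate()
    try:
        ingest(spark, SOURCES, "hdfs://namenode:8020/raw")  # hypothetical HDFS root
    finally:
        spark.stop()
```

To run it, call `main()` from a script submitted with `spark-submit`. Adding another source type is then just another entry in `SOURCES` (for example `"parquet"` or `"orc"`), since `format(...)` accepts any data source Spark has on its classpath.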
However, if you want a more accessible way to ingest lots of data, that is what Apache Kafka was explicitly built for. If you prefer not to write lots of code, you may also look at Apache NiFi, which integrates nicely with the Hadoop ecosystem.
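For the Kafka route, one possible shape is sketched below: a producer pushes JSON records from any source into a topic, and a Spark Structured Streaming job continuously copies that topic into Parquet files on HDFS. It assumes the third-party `kafka-python` package for the producer and the `spark-sql-kafka-0-10` connector on the Spark classpath; the broker address, topic name and HDFS paths are hypothetical placeholders.

```python
"""Sketch: push records into Kafka, then land the topic in HDFS with Spark.

Assumptions: kafka-python for the producer, spark-sql-kafka-0-10 on the Spark
classpath; broker, topic and HDFS paths are hypothetical placeholders.
"""
import json
from typing import Iterable, Mapping


def to_record(row: Mapping) -> bytes:
    """Serialize one source record as UTF-8 JSON for the Kafka message value."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def publish(producer, topic: str, rows: Iterable[Mapping]) -> int:
    """Send every row to the topic; returns how many messages were queued."""
    count = 0
    for row in rows:
        producer.send(topic, to_record(row))
        count += 1
    producer.flush()  # block until all queued messages are delivered
    return count


def stream_to_hdfs(spark, bootstrap: str, topic: str, out_dir: str, checkpoint: str):
    """Continuously copy the topic's message values into Parquet files on HDFS."""
    df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", bootstrap)
          .option("subscribe", topic)
          .load()
          .selectExpr("CAST(value AS STRING) AS value"))
    return (df.writeStream.format("parquet")
            .option("path", out_dir)
            .option("checkpointLocation", checkpoint)
            .start())


def main():
    # Imported here so the helpers above can be reused without these packages.
    from kafka import KafkaProducer
    from pyspark.sql import SparkSession

    producer = KafkaProducer(bootstrap_servers="broker:9092")  # hypothetical broker
    publish(producer, "ingest-events", [{"id": 1, "source": "api"}])

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()
    query = stream_to_hdfs(spark, "broker:9092", "ingest-events",
                           "hdfs://namenode:8020/raw/events",
                           "hdfs://namenode:8020/checkpoints/events")
    query.awaitTermination()
```

In practice the producer and the streaming job would usually run as separate processes; they are combined in `main()` only to keep the sketch short. The checkpoint directory is what lets the streaming query resume where it left off after a restart.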