-1

I have streamed data through Apache Flume and the data has been stored in a temp file in my hdfs folder at: user/*****/tweets/FlumeData.1643626732852.tmp

Now I am trying to run a mapper only job which will be pre-processing the job by way of url removal, # tag removal, @ removal, stop word removal etc.

However, the mapper only job is stopped at Running job.

Mapper job code:

 hadoop jar mr-job-jars/SentimentAnalysisPreprocessingJob.jar com.hadoop.poc.sentimentAnalysis.phase1.SentimentAnalysisPreprocessingDriver /user/*****/tweets/ FlumeData.1643626732852.tmp /output

Execution output:

2022-01-31 06:16:18,151 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-01-31 06:16:18,611 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2022-01-31 06:16:18,666 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/aviparna/.staging/job_1643615018627_0004
2022-01-31 06:16:18,996 INFO input.FileInputFormat: Total input files to process : 1
2022-01-31 06:16:19,108 WARN hdfs.DataStreamer: Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1252)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:986)
    at org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:640)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:810)
2022-01-31 06:16:19,168 INFO mapreduce.JobSubmitter: number of splits:1
2022-01-31 06:16:19,449 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1643615018627_0004
2022-01-31 06:16:19,451 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-01-31 06:16:19,794 INFO conf.Configuration: resource-types.xml not found
2022-01-31 06:16:19,794 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-01-31 06:16:19,935 INFO impl.YarnClientImpl: Submitted application application_1643615018627_0004
2022-01-31 06:16:20,035 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1643615018627_0004/
2022-01-31 06:16:20,038 INFO mapreduce.Job: Running job: job_1643615018627_0004

What do I need to do to solve this problem? Please help. Also, for any added information required please inform me. I will try to provide them as soon as possible.

Adding screenshot of the YARN UI: Yarn UI

Daremitsu
  • 545
  • 2
  • 8
  • 24
  • Open the YARN UI and see why its stuck. Maybe its requesting / waiting on too many resources. Note: You may have better luck with the Twitter Kafka Connector, and using Spark streaming for sentiment analysis – OneCricketeer Jan 31 '22 at 18:49
  • Added screenshot of the Yarn UI. From the start it is waiting on the mapper job. – Daremitsu Feb 01 '22 at 06:22
  • `Unhealthy nodes = 1` is your problem. This results in `Total memory / vcores = 0`, so your apps are all waiting for resources, like I said... Look at the logs for the nodemanager process – OneCricketeer Feb 01 '22 at 13:50
  • 1
    Solved the immediate problem by just forcing my yarn to yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage 100.0 and yarn.nodemanager.disk-health-checker.min-healthy-disks 0 ... did the trick , for local proposes of course. It at least started the mapper job however seems to be stuck at it. – Daremitsu Feb 02 '22 at 09:37
  • If it's stuck at a shuffle or reducer stage, that should be shown in the UI as well. Otherwise, if it's literally stuck in a mapper, you'd ideally add more logging into the code and look at the application stdout from the YARN UI by clicking through the ID link – OneCricketeer Feb 02 '22 at 15:45

1 Answers1

0

Solved my problem by changing the mapreduce.framework.name from yarn to local in mapred-site.xml.

The problem seemed to be happening due to resource crunch in the machine.

Also after changing the properties, restart Hadoop services once again.

Daremitsu
  • 545
  • 2
  • 8
  • 24