1

I am trying to read a CSV file from my Hadoop cluster in KNIME and keep getting this exception: org.apache.hadoop.hdfs.BlockMissingException - Could not obtain block: BP-788889731-172.18.0.7-1689243803391:blk_1073741840_1016 file=/data/openbeer/breweries/breweries.csv.

My Hadoop cluster runs locally in Docker, and I have successfully connected it to KNIME. But whenever I try to read a simple CSV file that I stored in HDFS, KNIME cannot access it. This is strange, because KNIME can see my HDFS file structure.

[Screenshot: the HDFS file structure as seen from KNIME]

I have already gone through several posts about similar problems and tried several of the suggested solutions, including some that named corrupt files as the cause of this exception. However, I checked with the command-line tools, and my nodes seem to be healthy.


Meecrop67
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 19 '23 at 13:08
  • I can't access the CSV files in my HDFS structure through KNIME, even though they are easily accessible through PowerShell. My DataNodes and NameNodes seem healthy and have been able to reach each other in the shell. – Meecrop67 Jul 20 '23 at 07:41

1 Answer

-2

The BlockMissingException in KNIME is raised by the HDFS client when it cannot obtain one or more blocks of a file from any DataNode. This usually happens when the block replicas are missing or corrupt, or when the DataNodes that hold them cannot be reached from the client.

To troubleshoot and resolve the BlockMissingException in KNIME, you can try the following steps:

  1. Verify HDFS integrity: Check the health and integrity of your Hadoop cluster's HDFS by running the following command on your NameNode:

hdfs fsck / -files -blocks -locations

This command will perform a file system check and report any missing or corrupt blocks. If any missing blocks are detected, proceed to the next step.

  2. Check replication status: Ensure that the replication factor set for your HDFS files is appropriate and that the number of live DataNodes is sufficient. You can use the following command to check the status of live DataNodes:

hdfs dfsadmin -report

  3. Rebalance HDFS: If the report shows that blocks are unevenly distributed across the DataNodes, you can rebalance the cluster. Note that the balancer only moves existing blocks to even out disk usage; it does not restore missing blocks or change the replication factor. Run the following command to start the rebalance process:

hdfs balancer

  4. Restore missing files: If you have deleted any files that are required by your KNIME workflow, you will need to restore them from a backup or recreate them. Ensure that all necessary files are present in the HDFS.

  5. Update KNIME configuration: Check the configuration settings in KNIME to ensure that the HDFS cluster details are correctly specified. Verify that the NameNode address and port are accurate and accessible from the KNIME environment.

  6. Check network connectivity: Ensure that there are no network connectivity issues between the KNIME environment and the Hadoop cluster. Verify that all necessary ports are open and that there are no firewalls or network restrictions blocking communication.
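
For the connectivity check, keep in mind that an HDFS client gets only metadata (such as directory listings) from the NameNode and reads the file blocks directly from the DataNodes. Being able to browse the file tree therefore does not show that the DataNodes are reachable. The following is a minimal sketch of a port test to run on the machine where KNIME runs. The function name `check_port`, the host names, and the port numbers in the examples are assumptions; substitute the values from your own cluster configuration.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port is reachable from the KNIME machine.
# check_port is a hypothetical helper, not part of Hadoop or KNIME.

check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection;
  # timeout caps the wait at 3 seconds.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Example calls (host names are placeholders, ports are common defaults):
# check_port namenode 8020   # NameNode RPC, often 8020 or 9000
# check_port datanode 9866   # DataNode data transfer in Hadoop 3 (50010 in Hadoop 2)
```

If the NameNode port is reachable but the DataNode port is not, the client can list files yet fail when reading them, which would be worth ruling out in a Docker setup where only some ports are published to the host.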

By following these steps, you should be able to diagnose and resolve the BlockMissingException in KNIME when reading files from your Hadoop cluster.
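
The first two checks above can also be scripted. This is only a sketch: the helper names `fsck_status` and `live_datanodes` are made up for illustration, and the parsing assumes the usual wording of the reports ("is HEALTHY" in the fsck summary and "Live datanodes (N):" in the dfsadmin report), which may differ between Hadoop versions.

```shell
#!/usr/bin/env bash
# Sketch: summarize `hdfs fsck` and `hdfs dfsadmin -report` output.
# fsck_status and live_datanodes are hypothetical helpers.

fsck_status() {
  # Reads fsck output on stdin and looks for the summary line.
  if grep -q "is HEALTHY"; then
    echo "healthy"
  else
    echo "not healthy"
  fi
}

live_datanodes() {
  # Reads the dfsadmin report on stdin; prints N from "Live datanodes (N):".
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

if command -v hdfs >/dev/null 2>&1; then
  # Check only the file from the question rather than the whole file system.
  hdfs fsck /data/openbeer/breweries/breweries.csv -files -blocks -locations | fsck_status
  hdfs dfsadmin -report | live_datanodes
else
  echo "hdfs CLI not found on PATH; run this on a cluster node or inside the NameNode container"
fi
```

Running fsck on the single affected path keeps the output short and shows which DataNodes hold the blocks of that file.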

  • 1
    Welcome to Stack Overflow, Ashish Tomar! Your answer appears likely to have been written (entirely or partially) by AI (e.g., ChatGPT). A heads-up that [posting AI-generated content is not allowed here](//meta.stackoverflow.com/q/421831). If you used an AI tool to assist with any answer, I would encourage you to delete it. We do hope you'll be a part of our community and contribute with *your own*, quality posts in the future. Thanks! – NotTheDr01ds Jul 19 '23 at 15:20
  • **Readers should review this answer carefully and critically, as AI-generated information often contains fundamental errors and misinformation.** If you observe quality issues and/or have reason to believe that this answer was generated by AI, please leave feedback accordingly. – NotTheDr01ds Jul 19 '23 at 15:20