2

I am using Hadoop to store data that I scrape.

I have a Spring service that is called from multiple threads to write content to HDFS.

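import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.stereotype.Service;

@Service
public class WriteService
{
    // Assumption: default configuration; the original does not show where conf comes from.
    private final Configuration conf = new Configuration();

    public void write(String path, String content) throws IOException
    {
        // A FileSystem handle is looked up on every call.
        FileSystem fs = FileSystem.get(conf);
    }
}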

I am not sure whether the FileSystem object can be a member of WriteService, and I can't find out whether it is thread-safe. I am using the DistributedFileSystem implementation.

Do you know whether it is thread-safe, so that I can use it as a member of my service?

Thank you

user1002065

2 Answers

1

Hadoop DFS uses a so-called WORM (write once, read many) model. This makes it more robust against concurrency issues.

But to answer the question: it is not thread-safe in general. You still need to think about your concurrency control requirements.

Ben Win
  • Hi Ben, thank you. Currently, every time the write method is called I do FileSystem fileSystem = FileSystem.get(configuration); This seems like a fairly heavy operation. Can you please give me a hint on how to do it better? – user1002065 Apr 14 '15 at 14:24
  • @user1002065 With respect to the number of writes you perform, I'd keep your solution. One thing you might want to read about is the [try-with-resources statement](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) – Ben Win Apr 14 '15 at 14:36
  • @user1002065 By `With respect to the number of writes` I wanted to point out that you shouldn't worry about premature optimization :) – Ben Win Apr 14 '15 at 14:50
  • I am going to have many writes, but I care more about the data being written concurrently. I guess I will take your advice, and when the load grows I will look for a better-optimized solution. Thank you. – user1002065 Apr 14 '15 at 14:53
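
For readers unfamiliar with the try-with-resources statement mentioned in the comments, here is a minimal sketch. It is only an illustration: a local file opened through java.nio stands in for an HDFS output stream, and the class name TryWithResourcesSketch is hypothetical. The comment does not say which object should be wrapped, so this shows the statement on a stream only.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for the HDFS case: a local file replaces the HDFS output stream.
class TryWithResourcesSketch {
    // Opens a writer, writes the content, and closes the writer
    // even if write() throws an exception.
    static void write(Path path, String content) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            out.write(content);
        } // out.close() is called automatically here
    }
}
```

The point of the statement is that the resource is released on every exit path, so a failed write does not leak an open stream.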
0

If config.setBoolean("fs.hdfs.impl.disable.cache", true); is called first, FileSystem.get(config) can be used from multiple threads, because each call then returns a new FileSystem instance instead of the shared cached one.
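
To illustrate why the cache setting matters, here is a simplified model of a shared instance cache. It is not Hadoop code: Handle and HandleFactory are hypothetical stand-ins for FileSystem and FileSystem.get, and the disableCache parameter plays the role of the configuration flag. As far as I understand Hadoop's behaviour, a cached instance is shared by all callers, so one thread closing it closes it for the others too.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in model, NOT Hadoop code: Handle and HandleFactory are hypothetical.
class Handle implements AutoCloseable {
    private volatile boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class HandleFactory {
    private static final Map<String, Handle> CACHE = new ConcurrentHashMap<>();

    // With caching on, every caller asking for the same key shares one Handle.
    // With caching off, each call builds a fresh Handle owned by the caller.
    static Handle get(String key, boolean disableCache) {
        if (disableCache) {
            return new Handle();
        }
        return CACHE.computeIfAbsent(key, k -> new Handle());
    }
}
```

In this model, closing a cached Handle in one thread makes it unusable for every other thread holding the same reference, while uncached handles are independent. The trade-off is that with the cache disabled each caller has to close its own instance.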

kai.tian