
In my previous question I was working on a simple exercise that watched a directory for file changes. I took the code from these Oracle docs, and it worked with no problem, except for a small unchecked cast warning I was unsure about.

The next problem I have with this code is that it puts a hard loop inside a thread, which is blocking, at least in theory. I know that if the operating system uses time slicing, even a hard loop is split into small chunks that share the processor's time with the other threads the application is running. In fact, I can build all sorts of examples where hard loops running in different threads don't block each other (as long as they have the same priority), even on a virtual machine expressly created with only one core.

However, the Java language doesn't guarantee which kind of scheduling is used for thread management, whether time-slicing or round-robin; that depends on the actual VM implementation and operating system. So the advice I'm getting while studying the subject is to write code as if it had to run under round-robin thread scheduling, and thus to avoid putting hard loops in threads unless the code can continuously yield control back to the other threads with sleep(), wait(), yield(), etc. (I can think of a GUI where the main thread is the one with a hard loop watching for events, and sending control back to listeners to handle them).

In my case, however, I couldn't think of a way to put the thread to sleep after it handled the file change, or to yield control back to the main loop, since the core idea is basically to continuously ask whether there are changes to the filesystem. So what I came up with is a scheduled executor that regularly runs the watching task. Clearly, this is a tradeoff between having a "blocking" thread and being notified immediately when a filesystem change happens. Since the real case I'm going to use this exercise in probably won't need immediate notification, I'm happy with that. The code is very straightforward:

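import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
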
public class Main
{
    public static FileSystem fs;
    public static Path dir;
    public static WatchService watcher;
    public static WatchKey key;

    public static void main(String[] args)
    {
        fs = FileSystems.getDefault();
        dir = fs.getPath(".");

        try {
            watcher = fs.newWatchService();
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
        } catch (IOException e) {
            System.err.println(e.getMessage());
            return;
        }

        Executors.newScheduledThreadPool(1).scheduleAtFixedRate(new Runnable()
            {
                public void run()
                {
                    Main.key = Main.watcher.poll();
                    if (null == Main.key)
                        return;

                    for (WatchEvent<?> event : Main.key.pollEvents()) {
                        WatchEvent.Kind<?> kind = event.kind();
                        if (kind == StandardWatchEventKinds.OVERFLOW)
                            continue;

                        @SuppressWarnings("unchecked")
                        WatchEvent<Path> ev = (WatchEvent<Path>)event;
                        Path file = ev.context();
                        System.out.println(file);
                    }

                    // reset once per key, after all of its events have been handled
                    boolean valid = Main.key.reset();
                    if (!valid)
                        System.err.println("Invalid key!");
                }
            }, 0, 1, TimeUnit.SECONDS);
    }
}

So my questions are:

  1. Am I pushing this too far? I mean, is it actually good practice to care this much about blocking code in threads, or are real cases without time slicing so rare that I can safely put a hard loop inside a thread, and do this kind of thing only when I know that my code is going to run, say, on an embedded device with guaranteed round-robin?

  2. Is there any other way to avoid the hard loop in this particular case? Maybe some clever use of thread control methods (sleep(), wait(), etc.), that I can't think of?

Thank you very much, and sorry for the long post.

swahnee
  • Why do you use `poll()`? `take()` would avoid wasting CPU. – Roger Gustavsson May 17 '15 at 07:22
  • @RogerGustavsson If I understood the docs correctly, `take()` (which I've been using in the previous version of this code) blocks the execution waiting for the next watch key, while `poll()` pops the next watch key, or `null` if none is present, and in either case returns immediately. So it seems that `poll()` is the way to go to avoid waiting. – swahnee May 17 '15 at 09:21
  • @swahnee Why do you want to avoid waiting? I thought this is exactly what you want: Let your thread wait until a new watch key arrives. – isnot2bad May 17 '15 at 10:11
  • You are still waiting _somewhere_ - you are simply hiding that waiting in the Executor –  May 17 '15 at 10:11
  • Ok, sorry, my comment was badly worded. My issue is not with waiting, but with having a hard loop inside my thread. Let's say I run the code with `take()` and a hard loop in a round-robin implementation (no time-slicing): wouldn't the execution be stuck inside the hard loop if I never manually yield control back to the other threads? Would the scheduled executor automatically take care of this, avoiding blocking my threads? I hope I'm being clear. – swahnee May 17 '15 at 11:07
  • What is a 'hard loop'? – isnot2bad May 17 '15 at 11:13
  • When you call `take()`, if nothing is available to take, the thread that does the call is put to sleep, which means it's not using any CPU and all other threads can run unhindered. – Roger Gustavsson May 17 '15 at 11:18
  • @swahnee: To answer your question about looping and not yielding the control; Thread switching is handled by the scheduler, you don't have to call any "yield" method to give control of the CPU to other threads. Actually, calling `take()`, which usually will wait, is the closest thing to yielding that you can do. Your thread can be paused and thrown out of the CPU at any time, and I mean at any time. You can't control when this happens. – Roger Gustavsson May 17 '15 at 11:45
  • @RogerGustavsson Thank you very much! I didn't realize that `take()` implies `wait()`. In the documentation they say "waiting if none are yet present", and I didn't get that it meant an actual call to the `wait()` method. – swahnee May 17 '15 at 12:16
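The point made in the comments above (a thread that calls `take()` on an idle watcher is put to sleep rather than spinning) can be checked with a small sketch. The class name, the temporary directory and the two-second deadline are made up for illustration, and the exact thread state reported is an implementation detail of the JDK; on the OpenJDK builds I know of, it should be WAITING.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;

// Hypothetical demo class, not part of the original question or answers.
final class TakeParksDemo {

    /** Starts a thread that calls take() on an idle watcher and reports the state it settles in. */
    static Thread.State stateWhileWaiting() {
        try {
            Path dir = Files.createTempDirectory("take-demo");
            try (WatchService watcher = dir.getFileSystem().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

                Thread waiter = new Thread(() -> {
                    try {
                        watcher.take(); // nothing changes in dir, so this only ends by interruption
                    } catch (InterruptedException e) {
                        // expected: the demo interrupts the thread once it has been observed
                    }
                });
                waiter.setDaemon(true);
                waiter.start();

                // Give the thread up to two seconds to reach take() and go to sleep there.
                long deadline = System.nanoTime() + 2_000_000_000L;
                while (waiter.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                    Thread.sleep(10);
                }
                Thread.State observed = waiter.getState();

                waiter.interrupt();
                waiter.join(2000);
                return observed;
            } finally {
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("State while blocked in take(): " + stateWhileWaiting());
    }
}
```

A thread in a state other than RUNNABLE is not competing for the CPU, which is why the `for (;;)` loop around `take()` is not a hard loop in the sense of the question.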

2 Answers


Here is an example of how you can watch a directory in a background thread. It is a modified version of the Java Tutorials code sample WatchDir.java, referenced by Oracle's The Java Tutorials: Watching a Directory for Changes.

The important part is watcher.take(). Here the calling thread blocks until a key is signalled. Compared with your code snippet, this approach has the following benefits:

  1. The thread is 'parked' while waiting in watcher.take(). No CPU cycles/resources are wasted while waiting. The CPU can do other things in the meantime.
  2. watcher.take() returns immediately when the file system is modified. (Your code reacts after 1 second in the worst, and after 0.5 seconds in the average case.)

In the main method, the DirWatcher is instantiated and run by a single threaded ExecutorService. This example waits for 10 seconds before shutting down the watcher and the executor service.

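import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;

import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DirWatcher implements Runnable {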

    private final Path dir;
    private final WatchService watcher;
    private final WatchKey key;

    @SuppressWarnings("unchecked")
    static <T> WatchEvent<T> cast(WatchEvent<?> event) {
        return (WatchEvent<T>) event;
    }

    /**
     * Creates a WatchService and registers the given directory
     */
    public DirWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = FileSystems.getDefault().newWatchService();
        this.key = dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
    }

    public void run() {
        try {
            for (;;) {
                // wait for key to be signalled
                WatchKey key = watcher.take();

                if (this.key != key) {
                    System.err.println("WatchKey not recognized!");
                    continue;
                }

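                for (WatchEvent<?> event : key.pollEvents()) {
                    // OVERFLOW events carry no usable path (context() may be null), so skip them
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue;
                    }
                    WatchEvent<Path> ev = cast(event);
                    System.out.format("%s: %s\n", ev.kind(), dir.resolve(ev.context()));
                    // TODO: handle event. E.g. call listeners
                }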

                // reset key
                if (!key.reset()) {
                    break;
                }
            }
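        } catch (InterruptedException x) {
            // interrupted (e.g. by future.cancel(true)): restore the flag and stop watching
            Thread.currentThread().interrupt();
        }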
    }

    public static void main(String[] args) throws IOException, InterruptedException, ExecutionException,
            TimeoutException {

        Path dir = Paths.get("C:\\temp");
        DirWatcher watcher = new DirWatcher(dir);

        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(watcher);
        executor.shutdown();

        // Now, the watcher runs in parallel
        // Do other stuff here

        // Shutdown after 10 seconds
        executor.awaitTermination(10, TimeUnit.SECONDS);
        // abort watcher
        future.cancel(true);

        executor.awaitTermination(1, TimeUnit.SECONDS);
        executor.shutdownNow();
    }
}
isnot2bad
  • Thank you for your suggestion. My misunderstanding was related to the `take()` method, but now I see that it makes the thread `wait()`. – swahnee May 17 '15 at 12:25
  • It's not said that the thread really calls `wait()`. But in fact, it is doing something very _similar_ to `wait()`. – isnot2bad May 17 '15 at 13:25
  • Thanks for the suggestion, it helped. However I'd like to show that it is possible to stop the watcher _gently_ by cancelling the key and closing the watcher.: `private void stopWatchService() { if (key != null) { key.cancel(); } if (watcher != null) { watcher.close(); } }` – socona Jan 05 '18 at 14:58
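The last comment above suggests stopping the watcher by closing it instead of interrupting the thread. A minimal sketch of that idea follows; the class name, the temporary directory and the 200 ms pause are assumptions made for illustration. It relies on the documented behaviour of `WatchService.close()`: a thread blocked in `take()` receives a `ClosedWatchServiceException`, and the keys of that service are invalidated, so the explicit `key.cancel()` from the comment is left out here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.ClosedWatchServiceException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the original answer.
final class CloseToStopDemo {

    /** Returns true if closing the watcher ended a loop that was waiting in take(). */
    static boolean closeStopsWatchLoop() {
        try {
            Path dir = Files.createTempDirectory("close-demo");
            WatchService watcher = dir.getFileSystem().newWatchService();
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            CountDownLatch stoppedByClose = new CountDownLatch(1);

            Thread loop = new Thread(() -> {
                try {
                    for (;;) {
                        WatchKey signalled = watcher.take();
                        signalled.pollEvents(); // a real watcher would handle the events here
                        if (!signalled.reset()) {
                            return;
                        }
                    }
                } catch (ClosedWatchServiceException e) {
                    stoppedByClose.countDown(); // the watcher was closed, so the loop ends
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            loop.setDaemon(true);
            loop.start();

            Thread.sleep(200); // crude: give the loop time to block in take()
            watcher.close();   // wakes the blocked thread with ClosedWatchServiceException

            boolean stopped = stoppedByClose.await(5, TimeUnit.SECONDS);
            Files.deleteIfExists(dir);
            return stopped;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Loop stopped by close(): " + closeStopsWatchLoop());
    }
}
```

In the DirWatcher above, the same effect could be had by catching `ClosedWatchServiceException` in `run()` and exposing a stop method that closes the watcher.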

As pointed out in the comments, the take() method doesn't keep the thread busy until a new key is provided; it uses a mechanism similar to the wait() method to put the thread to sleep.

I also found this post where it's pointed out that the WatchService exploits the native file event notification mechanism, if available (with a notable exception being the OS X implementation).

So, to answer my own question, using take() inside a hard loop does not starve the other threads, because it automatically puts the calling thread to sleep, letting the other threads use the CPU, until a file change notification comes from the operating system.
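To see the wake-up side of this, here is a small sketch that creates a file in a watched temporary directory and waits for the watcher to report it. The class name, the file name hello.txt and the 30-second limit are assumptions for illustration. It uses `poll(long, TimeUnit)`, which waits like `take()` but gives up after the timeout, so the demo cannot hang on a platform where notifications are slow or polled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the original answer.
final class WakeOnChangeDemo {

    /** Creates a file in a watched directory and returns the name the watcher reports, or null on timeout. */
    static String firstReportedName() {
        try {
            Path dir = Files.createTempDirectory("wake-demo");
            try (WatchService watcher = dir.getFileSystem().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                Path file = Files.createFile(dir.resolve("hello.txt"));

                // Sleeps until a key is signalled, for at most 30 seconds.
                WatchKey key = watcher.poll(30, TimeUnit.SECONDS);
                String name = null;
                if (key != null) {
                    for (WatchEvent<?> event : key.pollEvents()) {
                        if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                            name = event.context().toString(); // path relative to the watched dir
                            break;
                        }
                    }
                }
                Files.deleteIfExists(file);
                return name;
            } finally {
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Watcher reported: " + firstReportedName());
    }
}
```

How quickly the call returns depends on the platform: with native notifications it should be close to immediate, while a polling implementation may take several seconds.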

swahnee