0

I have a String field that is initialized to null but then accessed by potentially more than one thread. The value will be lazily initialized to an idempotently calculated value upon first access.

Does this field need to be volatile to be thread safe?

Here's an example.

public class Foo {
    private final String source;
    private String BAR = null;

    public Foo(String source) {
        this.source = source;
    }

    private final String getBar() {
        String bar = this.BAR;
        if (bar == null) {
            bar = calculateHashDigest(source); // e.g. an sha256 hash
            this.BAR = bar;
        }
        return bar;
    }

    public static void main(String[] args) {
        Foo foo = new Foo("Hello World!");
        new Thread(() -> System.out.println(foo.getBar())).start();
        new Thread(() -> System.out.println(foo.getBar())).start();
    }
}

I used System.out.println() for the example but I'm not worried about what happens when its calls are interlocked. (Though I'm pretty sure that's thread-safe too.)

Does BAR need to be volatile?

I'm thinking the answer is No, volatile is not required, and Yes it's thread safe, primarily because of this excerpt from JLS 17.5:

final fields also allow programmers to implement thread-safe immutable objects without synchronization. A thread-safe immutable object is seen as immutable by all threads, even if a data race is used to pass references to the immutable object between threads.

And since the char value[] field of String is indeed final.

(int hash isn't final but it's lazy initialization looks sound as well.)

Edit: Edit to clarify the value intended for BAR is a fixed value. Its calculation is idempotent and has no side-effects. I don't mind if the calculation is repeated across threads, or if BAR becomes effectively a thread-local due to memory-caching / visibility. My concern is, if it's non-null then it's value is complete and not somehow partial.

antak
  • 19,481
  • 9
  • 72
  • 80
  • 2
    Uhm, the field `BAR` is not `final`, so the clause on final-s does not apply. (though getBar() will always set the field to the same value; no matter how many threads run it concurrently) – Daniele Apr 11 '19 at 13:52
  • If the fixed value is literally a string like this, you can just `return "Hello World!";` or something similar, you're wasting time checking a field for `null`. Or just assign `BAR` directly in stead of starting with a `null` value. If `BAR` is more complex then there might be an advantage to lazy loading, but in that case *idempotent* or being a fixed value doesn't matter, thread safety must still be considered. – markspace Apr 11 '19 at 14:52
  • 1
    What you have now is thread safe as long as `calculateHashDigest()` is thread safe and only depends on its input (and of course has no side effects), and as you said you don't care if `BAR` is immediately visible (it will be made visible eventually, but one can't say when that is). – markspace Apr 11 '19 at 21:01

2 Answers2

2

Your code is (technically) not thread-safe.

It is true that String is a correctly implemented immutable type, and the things that you say about its final fields are correct. But that's not where the thread-safety issues are.

The first issue is that there is a race condition in the lazy initialization of BAR. If two threads call getBar() simultaneously, they will both see BAR as null and both then try to initialize it.

The second issue is that there is a memory hazard. Since there are no happens-before relationships between one thread's write to BAR and another thread's subsequent read of BAR, there is no guarantee that the second thread will see the initialized value of BAR. Hence, it may repeat the initialization.

Note that in the example as written, these two issues are not a practical thread-safety problem. The initialization you are performing is idempotent. It makes no difference to the behavior of the code that you might initialize BAR multiple times, since you always initialize it to a reference to the same String object. (The cost of a single redundant initialization too small to worry about.)

However, if BAR was a reference to a mutable object, or if the initialization was expensive, then this is a real thread-safety problem.

As @Ravindra says, the simple solution is to declare getBar to be synchronized. This addresses both issues.

Your idea of declaring BAR addresses the memory hazard, but not the race condition.


You added the following to your Question:

Edit to clarify the value intended for BAR is a fixed value. Its calculation is idempotent and has no side-effects. I don't mind if the calculation is repeated across threads, or if BAR becomes effectively a thread-local due to memory-caching / visibility. My concern is, if it's non-null then it's value is complete and not somehow partial.

This changes nothing that I said above. If the value is a String, then it is a properly implemented immutable object, and you will always see a complete value irrespective of anything else. That's what the JLS quote says!

(Actually, I am glossing over the detail that String uses a non-final field to hold a lazily computed hashcode. However, the String::hashCode implementation takes care of that. There is no thread-safety issue with it. Check it for yourself if you like.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • You've made very good points that should be kept for reference but I've added an edit to clear up that `BAR` is indeed a fixed value, just like in the example. I wanted to make sure this is safe and you sort of mention it is. The cost of recalculation is insignificant because the number of threads for the most part is one. – antak Apr 11 '19 at 14:47
  • 1
    @antak If `BAR` is a constant string like this, then you're OK. If it's not, then "fixed value" doesn't matter. Java has memory visibility requirements, and even a constant might not be visible between threads if you don't use the language properly. – markspace Apr 11 '19 at 14:55
  • It's worth noting that variable assignment **except for long** is guaranteed to be atomic. If you change the type of "BAR" to long the code is not safe, even though the assignment is idempotent. – Thomas Bitonti Apr 11 '19 at 17:28
  • See also: https://stackoverflow.com/questions/11963972/in-java-can-i-depend-on-reference-assignment-being-atomic-to-implement-copy-on-w – Thomas Bitonti Apr 11 '19 at 17:28
  • *You still have a thread-safety issue unless the BAR value is...*: The question asks about Strings specifically. I think the points you're making, although valid for a more general case, are unrelated to the question. Also your use of the term "thread-safety issues" has me worried I might be missing something, because they don't appear to be issues for the design presented in the question. – antak Apr 12 '19 at 05:33
  • @antak - OK. I misunderstood your "clarification". – Stephen C Apr 12 '19 at 06:31
  • @ThomasBitonti except for long *and double*. – Holger Apr 12 '19 at 16:42
1

Your code is not thread safe. It appears you might be thinking of a double-checked locking pattern. The correct pattern goes something like this:

public class Foo {

    private static volatile String BAR = null;

    private static String getBar() {
        String bar = BAR;
        if (bar == null) {
          synchronized( Foo.class )
            if( bar == null ) {
              bar = "Hello World!";
              BAR = bar;
            }
        }
        return bar;
    }
    // ...

So two things here.

  1. If BAR is already initialized, the synchronized block isn't entered. volatile is necessary here because some synchronization is needed and the read of BAR will be synchronized with the write to the volatile BAR.

  2. If BAR is null then we enter the synchronized block, we have to check again that BAR is still null so we can do the check and the assignment atomically. If we don't check atomically then there's a chance that BAR will be initialized more than once.

You quoted the Java spec. about the final keyword. While String is immutable and uses the final keyword internally, that doesn't affect your field BAR. The string is fine, but your field is still a shared memory location and needs to be synchronized if you expect it to be thread safe.

Also another poster mentioned interning strings. They are correct to say that in this particular instance, there will be only one "Hello World!" object because the JVM spec guarantees that strings are interned. That's a weird form of thread safety that doesn't work for other objects, so only use it when you are sure it will work correctly. Most objects that you create yourself won't be able to use the code you have as it is now.

Lastly I thought I'd point out that because "Hello World!" is a string object already, there's not much point in trying to "lazy load" it. Strings are created by the JVM when your class is loaded, so they already exist by the time your method is run, or even by the time BAR is read for the first time. In this case with just a string there's not advantage to trying to "lazy load" the string.

public class Foo {

    //  probably better, simpler
    private static final String BAR = "Hello World!";
markspace
  • 10,621
  • 3
  • 25
  • 39
  • Thank you. I've added an edit to clarify I don't need an at-most-once guarantee on the initialization operation for `BAR`. – antak Apr 11 '19 at 14:57
  • If the object is anything besides a string literal like this, then at-most-once doesn't matter. You object might not be visible at all if you don't follow the thread safety specification of the language. I'm starting to feel that you need to create a better example, `String` in Java has special thread safe properties that are leading to people giving you various answers, and you keep mutating the question in weird ways that make it seem that your real question is very different. @antak – markspace Apr 11 '19 at 15:00
  • Please, I asked specifically about String and that's what I put in the title because that's what I'm concerned with. You have a valid point about the example being bad and I've tried to add clarification to a reply to your comment on the question. I've also updated the example so the string doesn't appear a literal. – antak Apr 11 '19 at 15:27
  • But that doesn't make any sense. Why would you deliberately create new strings like that if a string literal will do? I really think you need to give us more info. If you're returning a string literal, it's probably going to be safe no matter what you do, but almost anything else will need proper synchronization. And the examples you give don't need lazy initialization. The "lazy" parts do nothing useful, they're just dead code. So that parts weird too. – markspace Apr 11 '19 at 15:39
  • It wasn't a literal but I see how my original example fell short and made the question really hard to answer. Good point about string interning +1. That's something I hadn't intended to put in the example and as a result the example didn't even stand up to evaluating the thread-safe immutable object-ness of Strings. *The string is fine,*: This is really what I wanted to know. The (non-)visibility of the assignment to `BAR`'s is fine as long as I don't get any *partial visibility* of the String itself. – antak Apr 11 '19 at 17:25