Ok, here are some thoughts on the problem.
A "fake random" function is usually called a pseudo-random number generator (PRNG).
You might be interested in doubles in the [0, 1) range, but a PRNG usually generates a single 64-bit (good for double) or 32-bit (good for float) integer. Converting that integer to a double, while not quite trivial, is a rather simple operation.
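One common way to do that conversion, as a minimal sketch: keep the top 53 bits of the 64-bit output (a double has a 53-bit significand) and scale by 2^-53. The function name to_unit_double is made up for illustration, and this is only one of several reasonable conversions.

```c
#include <stdint.h>

/* Hypothetical helper: map 64 random bits to a double in [0, 1).
   The top 53 bits are kept because a double has a 53-bit significand;
   multiplying by 2^-53 then gives an evenly spaced value below 1.0. */
double to_unit_double(uint64_t x) {
    return (double)(x >> 11) * 0x1.0p-53;
}
```

The largest possible input maps to 1 - 2^-53, so the result never reaches 1.0.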
A typical PRNG has state, initialized from a seed, and output. For the simplest PRNGs (like an LCG), seed, state and output are the same thing, but that is not true in general. The state is usually characterized by its size in bits (say, from 64 bits for an LCG up to 19937 bits for the Mersenne Twister).
Making pure functions from any PRNG algorithm is rather simple: a PRNG is just a set of three functions of the form
state_type make_state(seed_type seed) {
// convert seeding to state, bits chopping
return new_state;
}
state_type advance_state(state_type old_state) {
// do bits chopping with old state
// and advance to the next state
return new_state;
}
uint64_t generate_output(state_type state) {
// extract 64bits of randomness from state
return new_random_number;
}
And that is it, there is nothing more in the PRNG beyond those functions.
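To make the template concrete, here is a sketch of those three functions for a 64-bit LCG. The multiplier and increment are the ones Knuth uses for MMIX; picking them, and returning the raw state as output, are my assumptions for illustration (the low bits of such an LCG are weak, so a real generator would scramble the output).

```c
#include <stdint.h>

typedef uint64_t state_type;
typedef uint64_t seed_type;

/* For an LCG the seed is used directly as the initial state. */
state_type make_state(seed_type seed) {
    return seed;
}

/* One LCG step: state * a + c, with arithmetic wrapping modulo 2^64. */
state_type advance_state(state_type old_state) {
    return old_state * 6364136223846793005ULL + 1442695040888963407ULL;
}

/* Simplest possible output function: the state itself. */
uint64_t generate_output(state_type state) {
    return state;
}
```

All three are pure: the same argument always gives the same result, and the caller threads the state through explicitly, e.g. s = advance_state(s); x = generate_output(s);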
And now, to the question at hand:
1. You could use a non-crypto hash with good avalanche properties, basically meaning that a single-bit change in the input value (input increased by 1) causes a big change in the output. Fast and reasonable, but might not be very random. Murmur is ok, as well as Mum hash.
2. A crypto cipher running in counter mode. Slower than option 1, but high-quality numbers. Relatively large state (say, 512 bits or so). I prefer ChaCha20: it is well known and reasonably fast, take a look at code here. Both options 1 and 2 assume you have just a linearly increasing counter as input.
3. Another option is using a PRNG which has a logarithmic-complexity jump-ahead function. Thus you could start with a global seed, and if you have 2^10 CUDA cores, your first core will use the seed, the second will jump ahead by 2^64/2^10 = 2^54 steps, which with O(log2(N)) complexity is only 54 operations, the third will jump ahead of the second by another 2^54 steps (again 54 operations), and so on and so forth. Out of the known PRNGs, logarithmic jump-ahead works for LCG as well as PCG. I would recommend looking at PCG.
It means there is a non-trivial function of the form
state_type advance_state(state_type old_state, int64_t distance) {
// non-trivial advance by distance
// non-trivial means it is not just a loop, it is better than linear algorithm
return new_state;
}
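For an LCG, such a function can be sketched as below. It relies on the fact that composing two steps of the form s*a + c gives another step of the same form, so the multiplier and increment can be repeatedly squared, much like fast exponentiation. The names lcg_advance and lcg_advance_linear and the MMIX constants are my assumptions for illustration, and the distance is taken as unsigned here rather than int64_t.

```c
#include <stdint.h>

#define LCG_A 6364136223846793005ULL  /* multiplier (Knuth's MMIX) */
#define LCG_C 1442695040888963407ULL  /* increment  (Knuth's MMIX) */

/* Hypothetical jump-ahead: advance the LCG by `distance` steps using
   O(log2(distance)) loop iterations instead of `distance` of them. */
uint64_t lcg_advance(uint64_t state, uint64_t distance) {
    uint64_t acc_mult = 1, acc_plus = 0;          /* identity map s -> s */
    uint64_t cur_mult = LCG_A, cur_plus = LCG_C;  /* map for 2^k steps */
    while (distance > 0) {
        if (distance & 1) {
            /* fold the current 2^k-step map into the accumulated map */
            acc_mult *= cur_mult;
            acc_plus = acc_plus * cur_mult + cur_plus;
        }
        /* square the map: 2^k steps become 2^(k+1) steps */
        cur_plus = (cur_mult + 1) * cur_plus;
        cur_mult *= cur_mult;
        distance >>= 1;
    }
    return acc_mult * state + acc_plus;
}

/* Naive linear-time version, only useful for checking the fast one. */
uint64_t lcg_advance_linear(uint64_t state, uint64_t distance) {
    while (distance-- > 0) {
        state = state * LCG_A + LCG_C;
    }
    return state;
}
```

With this, core number i could compute its own starting state as lcg_advance(seed, i << 54) straight from the global seed, without waiting for the other cores.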