As explained in the blog post you shared, it is quite simply as given below
The Julia package ecosystem already contains quite a few GPU-related
packages, targeting different levels of abstraction as Figure 1 shows.
At the highest abstraction level, domain-specific packages like
MXNet.jl and TensorFlow.jl can transparently use the GPUs in your
system. More generic development is possible with ArrayFire.jl, and if
you need a specialized CUDA implementation of a linear algebra or deep
neural network algorithm you can use vendor-specific packages like
cuBLAS.jl or cuDNN.jl. All these packages are essentially wrappers
around native libraries, making use of Julia’s foreign function
interfaces (FFI) to call into the library’s API with minimal overhead.
CUDAdrv and CUDAnative packages are meant for directly using CUDA runtime API and writing kernels from Julia itself. I believe that is where CuArray come in handy - wrapping native Julia objects into CUDA accessible format, roughly speaking.
ArrayFire on the other hand is a generic library that wraps around all(cuBLAS, cuSparse, cuSolve, cuFFT) CUDA provided domain specific libraries into nice interface(functions). Apart from the interface to CUDA's domain specific libraries, ArrayFire by itself provides lot of other functions in the areas of statistics, image processing, computer vision etc. It has nice JIT feature where user's code is compiled to a runtime kernel - simply put. ArrayFire.jl is an language binding with some extra Julia specific improvements at wrapper level.
That's the general difference. From a developers perspective, using a library(like ArrayFire) basically takes out the burden of keeping up with CUDA API and maintaining/tweaking the kernels for optimum performance which I think takes lot of time.
PS. I am a member of ArrayFire development team.