If s has length s_L, a very crude way of doing this can be implemented in Thrust:
http://thrust.github.com.
First, create a vector val of length s_L x n that repeats s n times.
Then create a vector val_keys of the same length that associates one of n unique keys with each element of val, each key repeated s_L times, e.g. (with s_L = 7),
val = {1,2,...,7,1,2,...,7,....,1,2,...7}
val_keys = {0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,...,n-1,n-1,n-1}
Now the fun part: create a vector U of length s_L x n filled with uniformly distributed random numbers, e.g.,
U = {0.24, 0.1, .... , 0.83}
Then you can use a zip iterator over val and val_keys and sort both of them together according to U:
http://codeyarns.com/2011/04/04/thrust-zip_iterator/
After this sort, both val and val_keys will be all over the place, so you have to put them back together again using thrust::stable_sort_by_key(), with val_keys as the keys. The stable sort makes sure that if val[i] and val[j] both belong to the same key k, and val[i] precedes val[j] after the random sort, then val[i] still precedes val[j] in the final version. If all goes according to plan, val_keys should look just as before, but val should reflect the shuffling: each block of s_L elements is an independently shuffled copy of s.
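
To make the two sorting passes concrete, here is a rough host-only sketch in plain C++. It is not Thrust code: std::sort and std::stable_sort stand in for thrust::sort_by_key and thrust::stable_sort_by_key, and an index vector stands in for the zip iterator. The function name shuffle_copies, the int element type and the seed parameter are all made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns n independently shuffled copies of s,
// stored back to back in one vector of length s_L * n.
std::vector<int> shuffle_copies(const std::vector<int>& s, std::size_t n, unsigned seed) {
    const std::size_t s_L = s.size();
    const std::size_t total = s_L * n;

    std::vector<int> val(total);               // s repeated n times
    std::vector<std::size_t> val_keys(total);  // 0,...,0,1,...,1,...,n-1
    std::vector<double> U(total);              // uniform random sort keys

    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (std::size_t i = 0; i < total; ++i) {
        val[i] = s[i % s_L];
        val_keys[i] = i / s_L;
        U[i] = dist(gen);
    }

    // order[i] is the original position of the element that ends up at
    // position i; it plays the role of the zipped (val, val_keys) pair.
    std::vector<std::size_t> order(total);
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Pass 1: sort everything by U (in Thrust: sort_by_key with U as keys
    // and a zip iterator over val and val_keys as values).
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return U[a] < U[b]; });

    // Pass 2: stable sort by val_keys, which regroups the copies while
    // keeping the random relative order inside each copy.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return val_keys[a] < val_keys[b]; });

    std::vector<int> out(total);
    for (std::size_t i = 0; i < total; ++i) {
        out[i] = val[order[i]];
    }
    assert(out.size() == total);
    return out;
}
```

Calling shuffle_copies(s, n, seed) with s = {1,2,...,7} gives a vector in which elements 0..6 are one permutation of s, elements 7..13 another, and so on. On the GPU the same structure should carry over, with the random numbers generated on the device instead of with std::mt19937.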