On a different tack, here’s a deterministic scheme whose additive error is at most $\varepsilon$. As in my other answer, we sort $n_1 \ge \dots \ge n_k$, define an approximation $f(i)$ of the mean log-sum-exp over the nonempty subsets whose minimum index is $i$, and evaluate $\sum_i \frac{2^{k-i}}{2^k - 1} f(i)$. If each $f(i)$ is within $\varepsilon$ of its true value, then so is the result.
Fix $i$ and, for $1 \le j \le k-i$, define $d_j = n_{i+j} - n_i$. Note that $d_j \le 0$. Since every subset with minimum index $i$ contains $n_i$, factoring $e^{n_i}$ out of the sum shows that its log-sum-exp equals $n_i + \log\bigl(1 + \sum_{j \in S} e^{d_j}\bigr)$, where $S \subseteq \{1, \dots, k-i\}$ indexes the other elements of the subset. We therefore define $f(i) = n_i + g(i)$, where $g(i)$ will approximate the mean of $\log\bigl(1 + \sum_{j \in S} e^{d_j}\bigr)$ over all (possibly empty) $S \subseteq \{1, \dots, k-i\}$.
I don’t want to write the next part formally, since I believe that would be harder to understand. The idea is that, because log-sum-exp is commutative and associative, we can speed up the brute-force algorithm that initializes the list of results as [0] and then, for each $j$, doubles the size of the list by appending the log-sum-exp of each of its current elements with $d_j$. I claim that, if we round each intermediate result down to the nearest multiple of $\delta = \varepsilon/k$, then the result under-approximates by at most $\varepsilon$ (each value passes through fewer than $k$ roundings, and a log-sum-exp step never amplifies an existing error, being $1$-Lipschitz in each argument). By switching from the list to its histogram, we can do each step in time proportional to the number of distinct entries. Finally, we set $g(i)$ to the histogram’s average.
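For concreteness, here is a rough Python sketch of the whole scheme, under the assumption that the input is a list of $k \ge 1$ real numbers; the function name is arbitrary, and the histogram is keyed by integer bucket indices (bucket $m$ stands for the value $m\delta$) rather than by rounded floats.

```python
import math
from collections import Counter

def mean_lse_nonempty(ns, eps):
    """Deterministic approximation, from below and within additive eps,
    of the mean log-sum-exp over all nonempty subsets of ns (len(ns) >= 1)."""
    k = len(ns)
    ns = sorted(ns, reverse=True)            # n_1 >= ... >= n_k
    delta = eps / k                          # rounding grid

    result = 0.0
    for i in range(k):                       # 0-based; ns[i] is the largest element of the subset
        ds = [ns[j] - ns[i] for j in range(i + 1, k)]   # each d_j <= 0

        # Histogram over bucket indices; it starts with the empty subset, whose value is log(1) = 0.
        hist = Counter({0: 1})
        for d in ds:
            new_hist = Counter(hist)         # subsets that skip this d_j
            for m, c in hist.items():        # subsets that include it
                x = m * delta
                lse = x + math.log1p(math.exp(d - x))   # log(e^x + e^d), stable since d <= 0 <= x
                new_hist[int(lse // delta)] += c        # round down onto the grid
            hist = new_hist

        count = 2 ** (k - 1 - i)             # number of subsets whose largest element is ns[i]
        g = sum(m * delta * c for m, c in hist.items()) / count
        result += count / (2 ** k - 1) * (ns[i] + g)    # weighted f(i) = n_i + g(i)
    return result
```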
To analyze the running time: in the worst case we have $d_j = 0$ for all $j$, making the largest possible result $\log k$. This means the list can have at most $(\log k)/\delta + 1 = \varepsilon^{-1} k \log k + 1$ distinct entries, making the total running time, over $k$ values of $i$ with at most $k$ doubling steps each, $O(\varepsilon^{-1} k^3 \log k)$. (This can undoubtedly be improved slightly with a faster convolution algorithm and/or by rounding the $d_j$ and taking advantage.)
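As a quick sanity check of the sketch above, here is a comparison against the exact brute force on a small, hypothetical random instance ($k = 8$), confirming that the approximation never overshoots and lands within $\varepsilon$ below the exact mean:

```python
import itertools, math, random

def exact_mean_lse(ns):
    # Exact mean of log-sum-exp over all nonempty subsets (exponential time; for testing only).
    subs = [s for r in range(1, len(ns) + 1) for s in itertools.combinations(ns, r)]
    return sum(math.log(sum(math.exp(v) for v in s)) for s in subs) / len(subs)

random.seed(0)
ns = [random.uniform(-5.0, 5.0) for _ in range(8)]
eps = 0.01
approx = mean_lse_nonempty(ns, eps)
exact = exact_mean_lse(ns)
assert exact - eps <= approx <= exact + 1e-9   # rounds down, so it should not overshoot
```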