I don't think you can avoid comparator trees if you want to find the two smallest elements combinationally. However, if low latency isn't your goal, then a (possibly pipelined) sequential circuit could also be an option.
One approach that comes to mind is to break the operation down into a kind of incomplete bubble sort in hardware, using small sorting networks. Depending on the area you are willing to spend, you can use a smaller or larger p-sorting network that combinationally sorts p elements at a time, where p >= 3. You then apply this network to your input set of size N, sorting p elements at a time. The two smallest elements from each application of the sorter are stored in some sort of memory (e.g. an SRAM, if you want to process a larger number of elements).
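To make the p-sorter concrete, here is a minimal behavioral sketch in Python of a 3-input sorting network built from three compare-exchange elements. The names `compare_exchange` and `sort3` are made up for illustration, and the comparator arrangement is just one common choice, not necessarily the one you would pick for your design.

```python
def compare_exchange(values, i, j):
    """Model one comparator: the smaller value ends up at index i, the larger at j."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def sort3(a, b, c):
    """Hypothetical 3-input sorting network using three comparators."""
    v = [a, b, c]
    compare_exchange(v, 0, 1)
    compare_exchange(v, 1, 2)
    compare_exchange(v, 0, 1)
    return tuple(v)


print(sort3(4, 2, 1))  # (1, 2, 4)
```

In hardware each `compare_exchange` would be one comparator plus two multiplexers, so the comparator count grows with p.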
Here is an example for p=3 (the brackets indicate the grouping of elements the p-sorter is applied to):
(4 0 9) (8 6 7) (4 2 1) --> (0 4 9) (6 7 8) (1 2 4) --> 0 4 6 7 1 2
Now you start the next round:
You apply the p-sorter on the results of the first round.
Again, you store the two smallest outputs of your p-sorter in the same memory, overwriting values from the previous round.
Here is the continuation of the example:
(0 4 6) (7 1 2) --> (0 4 6) (1 2 7) --> 0 4 1 2
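The rounds above can be reproduced with a small software model. This is only a behavioral sketch, not a hardware description: `two_smallest` is a made-up name, Python's `sorted()` stands in for the p-sorter, and a leftover group with fewer than p elements is simply passed through the same sorter.

```python
def two_smallest(values, p=3):
    """Behavioral model: returns (two smallest values, memory contents after each round)."""
    if p < 3:
        raise ValueError("p must be at least 3, otherwise a round discards nothing")
    if len(values) < 2:
        raise ValueError("need at least two input values")
    memory = list(values)
    trace = []
    while len(memory) > 2:
        survivors = []
        for start in range(0, len(memory), p):
            group = sorted(memory[start:start + p])  # sorted() stands in for the p-sorter
            survivors.extend(group[:2])              # keep the two smallest of each group
        memory = survivors                           # overwrite the previous round
        trace.append(list(memory))
    return tuple(sorted(memory)), trace


result, trace = two_smallest([4, 0, 9, 8, 6, 7, 4, 2, 1], p=3)
print(trace)   # [[0, 4, 6, 7, 1, 2], [0, 4, 1, 2], [0, 1, 2], [0, 1]]
print(result)  # (0, 1)
```

The first two entries of the trace match the two rounds shown above; two more rounds finish the example and leave 0 and 1.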
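In each round the number of elements left to look at shrinks to 2/p of what it was. E.g. with p==4 you discard half the elements in each round. You repeat this until only the two smallest elements remain, stored at the first two memory locations. That takes about log_{p/2}(N) rounds. Since each round processes geometrically fewer elements, the total number of sorter passes is roughly N/(p-2), so assuming a constant number of cycles per pass the cycle count grows as O(N), which is well within an O(N log(N)) bound. For an actual hardware implementation, you probably want to stick to powers of two for the size p of the sorting network.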
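If you want a feel for the cycle budget, the following sketch counts sorter passes and rounds for a given N and p. It is only a counting model under simplifying assumptions: `sorter_passes` is a made-up name, every group costs one pass (including a leftover group smaller than p), and memory access time is ignored.

```python
def sorter_passes(n, p):
    """Count (sorter passes, rounds) needed to reduce n elements to the two smallest."""
    if p < 3:
        raise ValueError("p must be at least 3")
    passes = 0
    rounds = 0
    while n > 2:
        full, rest = divmod(n, p)
        passes += full + (1 if rest else 0)  # one pass per (possibly partial) group
        n = 2 * full + min(rest, 2)          # each group keeps at most two elements
        rounds += 1
    return passes, rounds


print(sorter_passes(9, 3))     # (8, 4)
print(sorter_passes(1024, 4))  # (511, 9)
```

In this model the pass count stays close to N/(p-2), because the later rounds work on geometrically fewer elements.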
Although the control logic of such a circuit is not trivial to implement, the area should be dominated by the sorting network and by the memory you need to hold the first 2/p*N intermediate results (assuming your input signals are not already stored in a memory that you can reuse for that purpose). If you want to tune your circuit for throughput, you can increase p and pipeline the sorting network at the expense of additional area. A further speedup could be gained by replacing the single memory with up to p two-port memories (1 read and 1 write port each). That would let you fetch and write back the data for the sorting network in a single cycle, increasing the utilization of the comparators in the sorting network.