I too have been thinking about testing in functional code. I don't have all the answers, but I'll write a little bit here.
Functional programs are put together differently, and that demands different testing approaches.
If you take even the most cursory look at Haskell testing, you will inevitably come across QuickCheck and SmallCheck, two very well-known Haskell testing libraries. Both do "property-based testing".
In an OO language you would laboriously write individual tests to set up half a dozen mock objects, call a method or two, and verify that the expected external methods were called with the right data and/or that the method ultimately returned the right answer. That's quite a bit of work, so you probably only do this for one or two test cases.
QuickCheck is something else. You might write a property that says something like "if I sort this list, the output should have the same number of elements as the input". This is a one-liner. The QuickCheck library will then automatically build randomly generated lists (100 by default, more if you ask for them) and check that the specified condition holds for every single one of them. And if it doesn't, it'll spit out the input on which the test failed, usually shrunk down to a smaller counterexample.
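To make that concrete, here is roughly what the sort property looks like. This is a minimal sketch assuming the QuickCheck package is installed; the property name is just my own choice.

```haskell
import Data.List (sort)
import Test.QuickCheck (quickCheck)

-- Property: sorting never changes the number of elements.
-- The name prop_sortPreservesLength is illustrative, not a library convention you must follow.
prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs

main :: IO ()
main = quickCheck prop_sortPreservesLength
```

The element type is pinned to Int so QuickCheck knows what kind of lists to generate.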
(Both QuickCheck and SmallCheck do roughly the same thing. QuickCheck generates random test inputs, whereas SmallCheck systematically tries every input up to a specified depth limit.)
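To show the difference in spirit, here is a hand-rolled sketch of the exhaustive approach using only base. This is not SmallCheck's API; listsUpTo and firstCounterexample are hypothetical helpers I made up to illustrate "try everything up to a bound".

```haskell
import Control.Monad (replicateM)
import Data.List (find, sort)

-- Every list of length 0..maxLen whose elements come from the given alphabet.
-- (Hypothetical helper, standing in for what an enumeration library would do.)
listsUpTo :: Int -> [a] -> [[a]]
listsUpTo maxLen alphabet = concatMap (\n -> replicateM n alphabet) [0 .. maxLen]

-- The same property as in the text: sorting preserves length.
prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs

-- The first input that falsifies the property, if there is one.
firstCounterexample :: ([a] -> Bool) -> [[a]] -> Maybe [a]
firstCounterexample prop = find (not . prop)

main :: IO ()
main = print (firstCounterexample prop_sortPreservesLength (listsUpTo 3 [0, 1, 2]))
-- prints Nothing, since no counterexample exists among the 40 lists tried
```

The trade-off is the usual one: exhaustive enumeration guarantees the small cases are covered, but the number of inputs grows very quickly with the bound.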
You say you're worried about the combinatorial explosion of possible flow control paths to test, but with tools like this generating the test cases for you dynamically, manually writing enough tests isn't a problem. The remaining problem is making sure the generated data actually exercises every flow path.
Haskell can help with that too. I read a paper about a library (I don't know if it ever got released) which uses Haskell's lazy evaluation to detect what the code under test is doing with the input data. As in, it can detect whether the function you're testing looks at the contents of a list, or only the size of that list. It can detect which fields of a Customer record are being touched. And so forth. In this way, it automatically generates data, but doesn't waste hours generating different random variations of parts of the data that aren't even relevant to this particular code. (E.g., if you're sorting Customers by ID, it doesn't matter what's in the Name field.)
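I can't reproduce that library here, but the underlying trick is easy to sketch with base alone: build a list whose spine is fine but whose elements blow up when touched, and see whether the function survives. ignoresElements is a hypothetical helper of mine, not something from the paper, and it only handles functions from [Int] to Int.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Feed the function a list whose elements are traps. If the result can be
-- evaluated without an exception, the function never inspected any element.
ignoresElements :: ([Int] -> Int) -> IO Bool
ignoresElements f = do
  let probe = replicate 3 (error "element inspected")  -- defined spine, trapped elements
  result <- try (evaluate (f probe)) :: IO (Either SomeException Int)
  pure (either (const False) (const True) result)

main :: IO ()
main = do
  ignoresElements length >>= print  -- True: length only walks the spine
  ignoresElements sum >>= print     -- False: sum has to look at the elements
```

A real tool would presumably refine the traps step by step rather than just answering yes or no, but this is the mechanism that laziness makes possible.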
As for testing functions that take or produce functions... yeah, I don't have an answer to that.