Provably correct permutation in less than O(n^2)

Question

Written in Haskell, here is the data type that proves that one list is a permutation of another:

data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
  BelongsHere :: Belongs x xs (x ': xs)
  BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
  PermutationEmpty :: Permutation '[] '[]
  PermutationCons :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys

With a Permutation, we can now permute a record:

data Rec :: (u -> *) -> [u] -> * where
  RNil :: Rec f '[]
  (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)

insertRecord :: Belongs x ys zs -> f x -> Rec f ys -> Rec f zs
insertRecord BelongsHere v rs = v :& rs
insertRecord (BelongsThere b) v (r :& rs) = r :& insertRecord b v rs

permute :: Permutation xs ys -> Rec f xs -> Rec f ys
permute PermutationEmpty RNil = RNil
permute (PermutationCons b pnext) (r :& rs) = insertRecord b r (permute pnext rs)

This works fine. However, permute is O(n^2), where n is the length of the record. Is there a way to make it faster by using a different data type to represent a permutation?
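
To make the cost concrete, here is a minimal sketch that repeats the definitions above and applies them to a two-field record. The names `swapProof` and `swapped`, the `infixr 5` fixity declaration, and the use of `Identity` as the functor are assumptions added for illustration; `Type` from `Data.Kind` stands in for `*`. Each `BelongsThere` is one step that `insertRecord` has to walk, which is where the quadratic behaviour comes from.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeOperators #-}

import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)

data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
  BelongsHere  :: Belongs x xs (x ': xs)
  BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
  PermutationEmpty :: Permutation '[] '[]
  PermutationCons  :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys

data Rec :: (u -> Type) -> [u] -> Type where
  RNil :: Rec f '[]
  (:&) :: !(f r) -> !(Rec f rs) -> Rec f (r ': rs)

-- Assumed fixity so that a :& b :& RNil nests to the right.
infixr 5 :&

insertRecord :: Belongs x ys zs -> f x -> Rec f ys -> Rec f zs
insertRecord BelongsHere v rs = v :& rs
insertRecord (BelongsThere b) v (r :& rs) = r :& insertRecord b v rs

permute :: Permutation xs ys -> Rec f xs -> Rec f ys
permute PermutationEmpty RNil = RNil
permute (PermutationCons b pnext) (r :& rs) = insertRecord b r (permute pnext rs)

-- Hypothetical example proof: '[Bool, Int] is a permutation of '[Int, Bool].
-- The Int is inserted one step into '[Bool]; the Bool is inserted at the head of '[].
swapProof :: Permutation '[Int, Bool] '[Bool, Int]
swapProof =
  PermutationCons (BelongsThere BelongsHere)
    (PermutationCons BelongsHere PermutationEmpty)

-- Applying the proof reorders the fields of the record.
swapped :: Rec Identity '[Bool, Int]
swapped = permute swapProof (Identity 1 :& Identity True :& RNil)
```

For a record of length n, a proof can contain up to n(n-1)/2 `BelongsThere` constructors in total, so any function that consumes the whole proof does quadratic work in the worst case.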

For comparison, in a mutable and untyped setting (which I know is a very different setting indeed), we could apply a permutation to a heterogeneous record like this in O(n) time. You represent the record as an array of values and the permutation as an array of new positions (no duplicates are allowed and every index must be between 0 and n-1). Applying the permutation is just iterating over that array and indexing into the record's array with those positions.
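
A rough sketch of that untyped approach, assuming the `array` package that ships with GHC. The names `applyPermutation` and `example` are made up for illustration, the element type is a single polymorphic `a` rather than a truly heterogeneous value, and the result is built in one pass as an immutable array instead of being filled in by mutation. Nothing here checks that the index array really is a permutation.

```haskell
import Data.Array (Array, bounds, elems, listArray, (!))

-- | Hypothetical helper: slot i of the result receives the element stored
-- at index (perm ! i) of the input. Each lookup is O(1), so the whole
-- pass is O(n). Validity of perm (no duplicates, all indices in range)
-- is assumed, not verified.
applyPermutation :: Array Int Int -> Array Int a -> Array Int a
applyPermutation perm record =
  listArray (bounds perm) [record ! j | j <- elems perm]

-- Example: positions [2, 0, 1] applied to ["a", "b", "c"].
example :: [String]
example =
  elems (applyPermutation (listArray (0, 2) [2, 0, 1])
                          (listArray (0, 2) ["a", "b", "c"]))
-- example == ["c", "a", "b"]
```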

I don't expect that an O(n) permutation is possible in a more rigorously typed setting. But it seems like O(n*log(n)) might be possible. I appreciate any feedback, and let me know if I need to clarify anything. Also, answers to this can use Haskell, Agda, or Idris, whichever feels easier to communicate in.

I don't work with lifted code like this very often, so I'm having trouble reasoning about the above, but I wonder whether there are cases where GHC will be able to typecheck this but won't fully evaluate everything at compile time. — jberryman, Mar 05 '17 at 03:40
Maybe we need an O(1) dependent array `Array [k]` to achieve this. Anyway, in a fully dependent system, you would be able to keep around the permutation (as a position array) and a proof that it relates the two lists. To apply the permutation you can then use the simple O(n) algorithm you mention. Perhaps one can do the same with GADTs, type-level nats, and singletons. — chi, Mar 05 '17 at 08:32
@jberryman Maybe if the lists are small enough and the permutation is known at compile time, inlining might erase the whole computation. But, in the situations I'm thinking of, we don't know the permutation until runtime, so all of the work is definitely going to be done then. — Andrew Thaddeus Martin, Mar 05 '17 at 13:02
@chi I agree that some kind of dependent array would be needed to recover `O(n)` performance. It's actually a little more problematic. You need a mutable dependent array (for the list that's getting filled in) that somehow ends up with a proof that something was assigned to every index. However, dependent types and arrays just don't seem to go well together. The standard `Vec` type in every DT language is inductively defined and has `O(n)` lookups. The structure is needed for the type system to be able to do anything with it. — Andrew Thaddeus Martin, Mar 05 '17 at 13:07
Your `Belongs x ys zs` datatype says "`zs` is `ys` with `x` inserted somewhere", and its (`Nat`-like) representation gives you _`x`'s position in `zs`_. So `Permutation` is a list of indexes; applying a permutation amounts to _sorting that list of indexes_. IOW it's your choice of sorting algo that's to blame, not your data structures. You're using insertion sort; switching to (e.g.) merge sort would give you O(n log n). Of course the challenge now is to write a typed merge sort! See [_How to Keep Your Neighbours in Order_](https://personal.cis.strath.ac.uk/conor.mcbride/Pivotal.pdf) — Benjamin Hodgson, Mar 05 '17 at 13:32
@BenjaminHodgson Thanks for the link to the McBride paper. I'll give that a read. It looks like it might help. I completely agree that the problem is insertion sort. However, I would be really impressed if I could somehow switch to merge sort without changing the `Permutation` data structure. Currently, `Permutation` is `n^2` in its size, so any algorithm that touches all of its contents must be at least `O(n^2)`. — Andrew Thaddeus Martin, Mar 05 '17 at 15:30
You are currently effectively 'counting' up to the desired position in the original list in unary. If you switch to a way to encode the position in binary or skew binary then you can encode the same information in O(n log n) and your offsets will take log n space rather than n space to encode. Implementing this so that you can get an O(n log n) implementation will require some form of tree based encoding of the original input as well, lest you spend too long walking to the appropriate element to perform the permutation. — Edward Kmett, Mar 07 '17 at 00:38
I'm not sure I'm interpreting the problem correctly, but if your records are anything like lists, and if you don't need to know the details of the permutation, then you could O(n log n) sort both records, then O(n) compare them for equality, allowing an overall O(n log n) permutation check. — Zoey Hewll, Mar 07 '17 at 02:31
@EdwardKMETT Ah, changing the original list to a tree-like representation was the crucial insight that I had overlooked entirely. I think that I can start working toward a solution, although if anyone has any links to code where a similar data algorithm has been implemented in some dependently typed ML, that would be appreciated. — Andrew Thaddeus Martin, Mar 07 '17 at 02:36
It might even be possible to use @EdwardKMETT's solution without losing the original record implementation. I suspect that `TypeInType` should allow you to state useful claims about `toTree` and `fromTree`. None of this is going to be easy though. — dfeuer, Mar 07 '17 at 18:43
@dfeuer Yeah, I was hoping that would be possible since I do still need to ultimately end up working with a record that's parameterized by the list. It seems like it should be possible to write some data type ListTreeRefl that lets me go back and forth between the two (in subquadratic time), but as you say, it's not going to be easy. — Andrew Thaddeus Martin, Mar 07 '17 at 21:04
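
The point raised in the comments, that `Belongs` is a unary (Nat-like) position and `Permutation` is therefore a list of such positions, can be illustrated by erasing the types. This is only a sketch; `belongsIndex`, `insertionIndices`, and `swapProof` are hypothetical names that do not appear in the question.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeOperators #-}

data Belongs (x :: k) (ys :: [k]) (zs :: [k]) where
  BelongsHere  :: Belongs x xs (x ': xs)
  BelongsThere :: Belongs x xs xys -> Belongs x (y ': xs) (y ': xys)

data Permutation (xs :: [k]) (ys :: [k]) where
  PermutationEmpty :: Permutation '[] '[]
  PermutationCons  :: Belongs x ys xys -> Permutation xs ys -> Permutation (x ': xs) xys

-- | Count the BelongsThere steps: the 0-based position of x in zs.
-- The proof is a unary number, so reading it costs time proportional
-- to the position itself.
belongsIndex :: Belongs x ys zs -> Int
belongsIndex BelongsHere = 0
belongsIndex (BelongsThere b) = 1 + belongsIndex b

-- | Forget the type indices and keep one insertion position per element.
insertionIndices :: Permutation xs ys -> [Int]
insertionIndices PermutationEmpty = []
insertionIndices (PermutationCons b p) = belongsIndex b : insertionIndices p

-- Hypothetical example proof that '[Bool, Int] permutes '[Int, Bool].
swapProof :: Permutation '[Int, Bool] '[Bool, Int]
swapProof =
  PermutationCons (BelongsThere BelongsHere)
    (PermutationCons BelongsHere PermutationEmpty)
-- insertionIndices swapProof == [1, 0]
```

Storing each position in unary is what makes the proof itself quadratic in size in the worst case. A binary encoding of the same positions would need O(n log n) bits in total, which is the motivation for the tree-based suggestion above.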

bebbo · Answer 1 · 2017-05-01T13:19:05.760

A simpler and faster solution is to compare the sorted forms of the two lists.

Given two lists A and B, their sorted forms exist:

As = sort(A)
Bs = sort(B)

As is a permutation of A and Bs is a permutation of B.
If As == Bs then A is a permutation of B.

Thus this algorithm runs in O(n log(n)), which is less than O(n²).
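
At the value level, the check described above might look like the following sketch. The name `isPermutationOf` is an assumption, and the function only returns a `Bool`: it does not build a `Permutation` proof or reorder a record, and it needs an `Ord` instance on the elements.

```haskell
import Data.List (sort)

-- | Hypothetical helper: two lists are permutations of each other
-- exactly when their sorted forms are equal. Sorting dominates the
-- cost at O(n log n); the final comparison is O(n).
isPermutationOf :: Ord a => [a] -> [a] -> Bool
isPermutationOf xs ys = sort xs == sort ys
```

For example, `isPermutationOf [3, 1, 2] [2, 3, 1]` is `True`, while `isPermutationOf [1, 1, 2] [1, 2, 2]` is `False` because the multiplicities differ.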

This leads to the optimal solution.

Using a different storage of permutation yields O(n)

Using the statements from above, we are changing the storage format of each permutation into

the sorted data
the original unsorted data

To determine whether one list is a permutation of another, only a comparison of the sorted data is necessary, which is O(n).

This answers the question correctly, but the effort is hidden in creating the doubled data storage ^^ So it will depend on the use if this is a real advantage or not.
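
A minimal sketch of that doubled storage, with hypothetical names (`Stored`, `mkStored`, `samePermutationClass`). The O(n log n) sort is paid once when the value is built, so only the later comparison is O(n).

```haskell
import Data.List (sort)

-- | Hypothetical container that keeps the data in its original order
-- together with a sorted copy.
data Stored a = Stored
  { original :: [a]  -- data in its original order
  , sorted   :: [a]  -- the same data, sorted once at construction time
  }

-- | Construction pays the O(n log n) sorting cost up front.
mkStored :: Ord a => [a] -> Stored a
mkStored xs = Stored xs (sort xs)

-- | O(n) check: two stored lists are permutations of each other
-- when their sorted copies are equal.
samePermutationClass :: Eq a => Stored a -> Stored a -> Bool
samePermutationClass a b = sorted a == sorted b
```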
