The first data structures that come to my mind are either Maps from `Data.Map` or Sequences from `Data.Sequence`.
## Update
Sequences are persistent data structures that support most operations efficiently, while allowing only finite sequences. Their implementation is based on finger trees, if you are interested. But which qualities do they have?
- O(1) calculation of the length
- O(1) insert at front/back with the operators `<|` and `|>` respectively
- O(n) creation from a list with `fromList`
- O(log(min(n1,n2))) concatenation for sequences of length n1 and n2
- O(log(min(i,n-i))) indexing for an element at position i in a sequence of length n
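These operations can be seen in action in a small sketch (the `containers` package ships with GHC; `><` is `Data.Sequence`'s concatenation operator):

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence ((<|), (|>), (><))

main :: IO ()
main = do
  let s  = Seq.fromList [2, 3, 4 :: Int]  -- O(n) creation from a list
      s' = 1 <| (s |> 5)                  -- O(1) insert at front and back
  print (Seq.length s')                   -- O(1) length: 5
  print (Seq.index s' 0)                  -- O(log(min(i,n-i))) indexing: 1
  print (Seq.length (s' >< s'))           -- O(log(min(n1,n2))) concatenation: 10
```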
Furthermore, this structure supports a lot of the known and handy functions you'd expect from a list-like structure: `replicate`, `zip`, `null`, `scanl`/`scanr`, `sort`, `take`, `drop`, `splitAt` and many more. Due to these similarities you have to either import the module qualified or hide the functions from `Prelude` that have the same name.
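The usual pattern is a qualified import, as in this small sketch:

```haskell
import qualified Data.Sequence as Seq

main :: IO ()
main = do
  -- Qualified names avoid clashes with Prelude's replicate, zip, take, ...
  let s = Seq.zip (Seq.replicate 3 'x') (Seq.fromList [1, 2, 3 :: Int])
  print (Seq.take 2 s)
  print (Seq.null (Seq.drop 3 s))  -- True: nothing left after dropping all 3
```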
## Data.Map
`Map`s are the standard workhorse for realizing a correspondence between "things"; what you might call a hashmap or associative array in other programming languages is called a `Map` in Haskell. Other than in, say, Python, `Map`s are pure - an update gives you back a new `Map` and does not modify the original instance.
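A minimal sketch of that purity: `insert` returns a new `Map` and leaves the original untouched.

```haskell
import qualified Data.Map as Map

main :: IO ()
main = do
  let m  = Map.fromList [(1 :: Int, "one"), (2, "two")]
      m' = Map.insert 3 "three" m  -- yields a new Map
  print (Map.size m)   -- the original still has 2 entries
  print (Map.size m')  -- the updated copy has 3
```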
`Map`s come in two flavors - strict (`Data.Map.Strict`) and lazy (`Data.Map.Lazy`).
Quoting from the documentation:

> **Strict**: "API of this module is strict in both the keys and the values."
>
> **Lazy**: "API of this module is strict in the keys, but lazy in the values."
So you need to choose what fits your application best. You can try both versions and benchmark with `criterion`.
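The difference is observable: under the lazy API a stored value is only evaluated when you demand it, so the sketch below runs fine even though one value diverges (with `Data.Map.Strict` the `insert` itself would force the value and crash).

```haskell
import qualified Data.Map.Lazy as ML

main :: IO ()
main = do
  -- The lazy API does not force values on insert.
  let m = ML.insert "boom" (error "value forced!") (ML.fromList [("a", 1 :: Int)])
  print (ML.size m)        -- counts keys only: 2
  print (ML.lookup "a" m)  -- Just 1; the "boom" value is never touched
```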
Instead of listing the features of `Data.Map`, I want to pass on to `Data.IntMap`, which can leverage the fact that the keys are integers to squeeze out better performance.
Quoting from the documentation, we first note:

> Many operations have a worst-case complexity of O(min(n,W)). This means that the operation can become linear in the number of elements with a maximum of W -- the number of bits in an `Int` (32 or 64).
So what are the characteristics of `IntMap`s?
- O(min(n,W)) for (unsafe) indexing with `(!)` - unsafe in the sense that you will get an error if the key/index does not exist. This is the same behavior as `Data.Sequence`.
- O(n) calculation of the `size`
- O(min(n,W)) for safe indexing with `lookup`, which returns `Nothing` if the key is not found and `Just a` otherwise
- O(min(n,W)) for `insert`, `delete`, `adjust` and `update`
So you see that this structure is less efficient than Sequences, but provides a bit more safety, and a big benefit if you don't actually need all entries - such as the representation of a sparse graph whose nodes are integers.
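As a sketch of that sparse-graph use case, here is an adjacency list keyed by integer node ids (names are illustrative):

```haskell
import qualified Data.IntMap as IntMap

main :: IO ()
main = do
  -- Only nodes that actually exist take up space, however large their ids.
  let g = IntMap.fromList [(1, [2, 7]), (7, [1]), (100000, [] :: [Int])]
  print (IntMap.lookup 7 g)  -- safe indexing: Just [1]
  print (IntMap.lookup 3 g)  -- missing node: Nothing
  print (IntMap.size g)      -- O(n): 3
  print (g IntMap.! 1)       -- unsafe indexing: [2,7]
```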
For completeness, I'd like to mention a package called `persistent-vector`, which implements Clojure-style vectors, but it seems to be abandoned, as the last upload is from 2012.
## Conclusion
So for your use case I'd strongly recommend `Data.Sequence` or `Data.Vector`. Unfortunately, I don't have any experience with the latter, so you'll need to try it for yourself. From what I know, it provides a powerful optimization called stream fusion, which fuses multiple functions into one tight "loop" instead of running a loop for each function. A tutorial for `Vector` can be found here.
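As a taste of what fusion buys you - a minimal sketch assuming the `vector` package is installed - the pipeline below can be compiled (under `-O`) into a single loop with no intermediate vectors:

```haskell
import qualified Data.Vector as V

main :: IO ()
main = do
  -- filter, map and sum can fuse into one tight loop.
  let v = V.enumFromTo (1 :: Int) 10
  print (V.sum (V.map (* 2) (V.filter even v)))  -- 2*(2+4+6+8+10) = 60
```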