I will assume that you intend to apply removeDuplicates to the result of frequency, so I will work with this slightly modified version of your code:
import Data.List (findIndices)

frequency :: (Eq a) => [a] -> [(a, Int)]
frequency li = removeDuplicates $ map (\el -> (el, indexes el)) li
  where
    indexes el = length $ findIndices (== el) li

removeDuplicates :: (Eq a) => [(a, Int)] -> [(a, Int)]
removeDuplicates [] = []
removeDuplicates ((x1, x2) : xs) =
  (x1, x2) : removeDuplicates (filter (\(y1, y2) -> x1 /= y1) xs)
Let's look at what each part of frequency is doing:
map (\el -> (el, indexes el)) li
As you allude to, map f li is, in principle, O(n) in the length of the list li. That, however, only holds if the complexity of f does not depend on li. For that reason, we need to double-check the function being mapped:
\el -> (el, indexes el)
Substituting the definition of indexes, we get:
\el -> (el, length $ findIndices (== el) li)
findIndices is O(n) in the length of the list, as it needs to test each element, and so the complexity of this function is at least O(n) in the length of li. length is also linear in the length of the list it receives, which means that in the worst case (that is, when all elements are equal to el) it will also be O(n) in the length of li. Given that findIndices is already O(n), length doesn't affect the overall complexity. Finally, the creation of the pair, which is the final step, is constant time and unproblematic.
We can thus conclude that \el -> (el, indexes el) is O(n) in the length of li. That being so, map (\el -> (el, indexes el)) li is actually O(n^2) in the length of li, as it performs an O(n) operation n times.
removeDuplicates
Let's focus on the recursive case:
(x1, x2) : removeDuplicates (filter (\(y1, y2) -> x1 /= y1) xs)
The key operation here is the filtering, which is O(n) in the length of xs. In the worst case, when all elements are distinct, the filtering is done once per element of li. Now, even though xs gets shorter as we move towards the end of the list, the average length of xs is proportional to the length of li. That being so, we are once more performing an O(n) operation (in the length of li) n times, which means removeDuplicates is O(n^2), just like nub from Data.List. (Another way of reaching the same conclusion would be noticing that, in that worst case, removeDuplicates compares each element with every other element, resulting in n*(n-1)/2 comparisons.)
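To make the n*(n-1)/2 figure concrete, here is a minimal sketch of an instrumented variant that also returns how many (/=) tests were performed. The name removeDuplicatesCounting is hypothetical and not part of your code; it only mirrors the structure of removeDuplicates.

```haskell
-- Hypothetical instrumented copy of removeDuplicates: returns the
-- deduplicated list together with the number of (/=) tests performed.
removeDuplicatesCounting :: (Eq a) => [(a, Int)] -> ([(a, Int)], Int)
removeDuplicatesCounting [] = ([], 0)
removeDuplicatesCounting ((x1, x2) : xs) =
  let -- filter tests every element of xs exactly once
      rest = filter (\(y1, _) -> x1 /= y1) xs
      (deduped, count) = removeDuplicatesCounting rest
   in ((x1, x2) : deduped, length xs + count)
```

With five distinct keys the count is 4 + 3 + 2 + 1 + 0 = 10, that is, 5*4/2. With five identical keys the first filter discards everything else, so the count is only 4; the quadratic behaviour is the worst case rather than a constant cost.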
frequency li = removeDuplicates $ map (\el -> (el, indexes el)) li
frequency consists of an O(n^2) operation followed by another O(n^2) operation; therefore, it is O(n^2) in the length of the list.
Is there a O(1) method of performing the same function?
O(1) is impossible, as there is no getting around the need to do something to each element of the list. It is certainly possible to do better than O(n^2), though. For instance, by sorting the list you would avoid the need to compare each element with all others (as happens both in map (\el -> (el, indexes el)) li and removeDuplicates), as in a sorted list only elements next to each other might possibly be equal. Note that sorting requires an Ord constraint on the elements rather than just Eq. For a concrete example, this function...
group . sort
... is O(n*log(n)) (sort from Data.List is O(n*log(n)), and group is O(n), as it only needs to compare each element to the next one).
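Building on group . sort, a frequency function could be sketched as follows. The name frequencySorted is an assumption for illustration, and the constraint changes from Eq to Ord because of sort.

```haskell
import Data.List (group, sort)

-- Sort so that equal elements become adjacent, split the result into
-- runs of equal elements, then pair each run's element with its length.
frequencySorted :: (Ord a) => [a] -> [(a, Int)]
frequencySorted xs = [(x, length run) | run@(x : _) <- group (sort xs)]
```

For example, frequencySorted "mississippi" gives [('i',4),('m',1),('p',2),('s',4)]. Unlike your version, the pairs come out ordered by element rather than by first occurrence in the input.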
P.S.: This is probably beside the point for what you are trying to do, but for something entirely different, you might want to experiment with using a dictionary to keep track of the tallies. That would make an effectively linear frequency possible, which should pay off performance-wise if you need to handle large input lists.
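As one possible sketch of the dictionary idea, assuming the containers package is available, Data.Map.Strict can hold the tallies. The name frequencyMap is hypothetical. Note that Data.Map is a balanced tree, so this version is O(n*log(k)) for k distinct elements rather than strictly linear; a hash-based map (for instance HashMap from unordered-containers, which is an extra dependency) would be closer to the effectively linear behaviour mentioned above.

```haskell
import qualified Data.Map.Strict as Map

-- Pair every element with a count of 1, then let fromListWith add up
-- the counts of equal keys while the map is being built.
frequencyMap :: (Ord a) => [a] -> [(a, Int)]
frequencyMap = Map.toList . Map.fromListWith (+) . map (\x -> (x, 1))
```

As with the sorting approach, the result is ordered by key: frequencyMap "mississippi" gives [('i',4),('m',1),('p',2),('s',4)].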