The piece of code below filters an IEnumerable<T> using another collection as a blacklist. The source collection lazily iterates over content fetched remotely (YouTube API), so it can be huge.
IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable<string> blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x => !blackListContent.Contains(x.Id));
The Enumerable.Contains method is O(n) in time complexity, so every element that Enumerable.Where tests pays a full scan of the blacklist and the overall filter could take a while: O(n × m) for n items checked against m blacklist entries.
On the other hand, HashSet<T>.Contains is O(1), and instantiating a HashSet<T> from an IEnumerable<T> seems to be O(n), paid only once.
If the blacklist is going to be used multiple times, and without taking space complexity into account, is it a good approach to turn it into a HashSet<T> before using it, or is this just premature optimization?
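For concreteness, here is a minimal sketch of the change in question, reusing the methods and names from the snippet above:

IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
// Materialize the blacklist once up front: O(m) time and O(m) extra space.
HashSet<string> blackList = new HashSet<string>(this.FetchBlackListContent());
// Each Contains lookup is now O(1) on average, so the whole filter is O(n)
// instead of O(n × m); Where stays lazy, so the huge source is still streamed.
return contentThatCanBeHuge.Where(x => !blackList.Contains(x.Id));

(On .NET Core 2.0+ / .NET Framework 4.7.2+, this.FetchBlackListContent().ToHashSet() is an equivalent shorthand for the constructor call.)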