I know that I can use integer keys in a hashmap, as in the following Dict example. But a Dict is unordered and gains nothing from its keys being integers.

julia> hashmap = Dict( 5 => "five", 9 => "nine", 16 => "sixteen", 70 => "seventy")
Dict{Int64,String} with 4 entries:
  9  => "nine"
  16 => "sixteen"
  70 => "seventy"
  5  => "five"

julia> hashmap[9]
"nine"

julia> hashmap[8:50] # I would like to be able to do this to get keys between 8 and 50 (9 and 16 here)
ERROR: KeyError: key 8:50 not found
Stacktrace:
 [1] getindex(::Dict{Int64,String}, ::UnitRange{Int64}) at ./dict.jl:477
 [2] top-level scope at REPL[3]:1

I'm looking for an ordered structure that allows access to all its keys within a certain range, while benefiting from the performance optimizations that sorted keys make possible.

Hugo Trentesaux

4 Answers

5

There is a dedicated library named DataStructures which has a SortedDict structure and corresponding search functions:

using DataStructures
d = SortedDict(5 => "five", 9 => "nine", 16 => "sixteen", 70 => "seventy")

st1 = searchsortedfirst(d, 8)  # semitoken of the first key greater than or equal to 8
st2 = searchsortedlast(d, 50)  # semitoken of the last key less than or equal to 50

And now:

julia> [k for (k, v) in inclusive(d, st1, st2)]
2-element Array{Int64,1}:
  9
 16
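
The same two semitokens can also be used to pull out the values rather than the keys. A minimal sketch, assuming DataStructures.jl is installed; valuesbetween is a hypothetical helper name used for illustration, not a function provided by the package:

```julia
using DataStructures

d = SortedDict(5 => "five", 9 => "nine", 16 => "sixteen", 70 => "seventy")

# Hypothetical helper: values whose keys lie in [lo, hi], returned in key order.
function valuesbetween(sd::SortedDict, lo, hi)
    st1 = searchsortedfirst(sd, lo)  # semitoken of the first key >= lo
    st2 = searchsortedlast(sd, hi)   # semitoken of the last key <= hi
    return [v for (k, v) in inclusive(sd, st1, st2)]
end

valuesbetween(d, 8, 50)  # values stored under keys 9 and 16
```

Both searches walk the sorted tree instead of scanning every key, so the cost should depend mainly on the number of matches rather than on the size of the dictionary.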
Hugo Trentesaux
Przemyslaw Szufel
1

I do not think there is such a structure in the standard library, but this could be implemented as a function on an ordinary dictionary as long as the keys are of a type that fits the choice of range:

julia> d = Dict(1 => "a", 2 => "b", 5 => "c", 7 => "r", 9 => "t")
Dict{Int64,String} with 5 entries:
  7 => "r"
  9 => "t"
  2 => "b"
  5 => "c"
  1 => "a"

julia> dictrange(d::Dict, r::UnitRange) = [d[k] for k in sort!(collect(keys(d))) if k in r]
dictrange (generic function with 1 method)

julia> dictrange(d, 2:6)
2-element Array{String,1}:
 "b"
 "c"
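
The function above sorts all the keys on every call. If the dictionary does not change between queries, one possible variation is to sort once and then locate the range with the binary searches from Base. This is only a sketch under that assumption; SortedLookup is a hypothetical type introduced here for illustration, and it would have to be rebuilt whenever entries are added or removed:

```julia
# Hypothetical container: keys kept sorted, values stored in the matching order.
struct SortedLookup{K,V}
    ks::Vector{K}
    vs::Vector{V}
end

# Sort the keys once, at construction time.
function SortedLookup(d::AbstractDict{K,V}) where {K,V}
    ks = sort!(collect(keys(d)))
    return SortedLookup{K,V}(ks, [d[k] for k in ks])
end

# Values whose keys fall in r, located with two binary searches.
function dictrange(s::SortedLookup, r::UnitRange)
    lo = searchsortedfirst(s.ks, first(r))  # first index with key >= first(r)
    hi = searchsortedlast(s.ks, last(r))    # last index with key <= last(r)
    return s.vs[lo:hi]                      # empty when no key is in r
end

s = SortedLookup(Dict(1 => "a", 2 => "b", 5 => "c", 7 => "r", 9 => "t"))
dictrange(s, 2:6)  # ["b", "c"]
```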
Bill
1

get lets you supply a default value for keys that are not in the dictionary. You can default to missing and then skip the missing entries:

julia> hashmap = Dict( 5 => "five", 9 => "nine", 16 => "sixteen", 70 => "seventy")
Dict{Int64,String} with 4 entries:
  9  => "nine"
  16 => "sixteen"
  70 => "seventy"
  5  => "five"

julia> get.(Ref(hashmap), 5:10, missing)
6-element Array{Union{Missing, String},1}:
 "five"
 missing
 missing
 missing
 "nine"
 missing

julia> get.(Ref(hashmap), 5:10, missing) |> skipmissing |> collect
2-element Array{String,1}:
 "five"
 "nine"
MarcMush
    This will be very inefficient when ranges are broad yet elements are sparse (e.g. searching for 5 elements in a `1_000_000:20_000_000` key range) – Przemyslaw Szufel Nov 18 '20 at 11:08
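
One way around the cost described in this comment is to scan the stored entries instead of the range, so the work grows with the number of entries rather than with the width of the range. A minimal sketch using only Base; inrange is a hypothetical helper name, and the matches are sorted by key because Dict iteration order is unspecified:

```julia
hashmap = Dict(5 => "five", 9 => "nine", 16 => "sixteen", 70 => "seventy")

# Hypothetical helper: visit each stored pair once, keep those whose key is in r.
function inrange(d::AbstractDict, r::AbstractRange)
    hits = [k => v for (k, v) in d if k in r]
    sort!(hits; by = first)  # order the matches by key
    return last.(hits)       # keep only the values
end

inrange(hashmap, 5:10)                  # ["five", "nine"]
inrange(hashmap, 1_000_000:20_000_000)  # empty, without probing millions of keys
```

This still touches every entry of the dictionary, so a sorted container remains the better fit when range queries are frequent.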
0

If you are working with dates, you might consider having a look at the TimeSeries package, which does what you want provided your integer keys represent dates:

using Dates       # standard library; provides Date and Day
using TimeSeries

dates = [Date(2020,11,5), Date(2020,11,9), Date(2020,11,16), Date(2020,11,30)]
times = TimeArray(dates, ["five", "nine", "sixteen", "thirty"])

And then:

times[Date(2020,11,8):Day(1):Date(2020,11,20)]
2×1 TimeArray{String,1,Date,Array{String,1}} 2020-11-09 to 2020-11-16
│            │ A         │
├────────────┼───────────┤
│ 2020-11-09 │ "nine"    │
│ 2020-11-16 │ "sixteen" │
Hugo Trentesaux