The Visual Studio 2017 Quick Info tooltip for Frame.denseCols says "it skips columns that contain missing value in any row." The following example seems to suggest otherwise:

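open System
open Deedle

// Assumed helper, not shown in the post: a sequence of `count` random floats
let rand count =
    let rnd = Random()
    seq { for _ in 1..count -> rnd.NextDouble() }

let dateRange (first:System.DateTime) count frac =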
    seq {for i in 0..(count - 1) -> first.AddDays(float i + frac)}

let fifth = Series(dateRange (DateTime(2013,1,1)) 10 0.0, rand 10)
let sixth = Series(dateRange (DateTime(2013,1,1)) 5 0.0, [10.0; 20.0; 30.0; 40.0; 50.0])
let dfR10 = Frame(["fifth"; "sixth"], [fifth; sixth])

let sR1 =
    dfR10
    |> Frame.denseCols
sR1.Keys
// val it : seq<string> = seq ["fifth"; "sixth"]

The "sixth" column is empty:

sR1.["sixth"]
(* Deedle.MissingValueException: Value at the key sixth is missing
   at Deedle.Series`2.Get(K key) in C:\code\deedle\src\Deedle\Series.fs:line 311
   at <StartupCode$FSI_0167>.$FSI_0167.main@()
Stopped due to error *)

So the key for a column containing missing values exists but the corresponding series is empty.

On the other hand Frame.denseRows seems to be working fine:

let sR2 =
    dfR10
    |> Frame.denseRows
sR2.Keys
// keys from 1/1/2013 to 1/5/2013

So the key for a row containing missing values does not show up.

Is there an asymmetry between these two functions, and is the Quick Info for Frame.denseCols incorrect, or am I missing something?

Soldalma

1 Answer

According to the Deedle source code:

/// We use the terms _sparse_ and _dense_ to denote series that contain some missing values
/// or do not contain any missing values, respectively. The functions `denseCols` and 
/// `denseRows` return a series that contains only dense columns or rows and all sparse
/// rows or columns are replaced with a missing value. The `dropSparseCols` and `dropSparseRows`
/// functions drop these missing values and return a frame with no missing values.
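As a rough illustration of the distinction this comment draws, the sketch below builds a small frame and compares `Frame.denseCols` with `Frame.dropSparseCols`. The data and the names `full`, `partial` and `df` are made up for the example, and the expected results in the comments follow from the quoted documentation and the behavior reported in the question, not from a verified run.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical data: "full" has a value for every row key, "partial" lacks key 3
let full    = series [ 1 => 1.0; 2 => 2.0; 3 => 3.0 ]
let partial = series [ 1 => 10.0; 2 => 20.0 ]
let df = frame [ "full" => full; "partial" => partial ]

// denseCols returns a series of columns; a sparse column keeps its key
// but its value is replaced with a missing value
let dense = df |> Frame.denseCols
printfn "%A" (List.ofSeq dense.Keys)             // expected: ["full"; "partial"]
printfn "%b" (dense.TryGet("partial").HasValue)  // expected: false

// dropSparseCols returns a frame without the sparse column
let dropped = df |> Frame.dropSparseCols
printfn "%A" (List.ofSeq dropped.ColumnKeys)     // expected: ["full"]
```

If the goal is a frame with only the complete columns, `Frame.dropSparseCols` is probably the function that matches the Quick Info wording most closely.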

Digging further, denseCols simply calls frame.ColumnsDense:

member frame.ColumnsDense = 
    let newData = data.Select(fun _ vect -> 
      // Assuming that the data has all values - which should be an invariant...
      let all = rowIndex.Mappings |> Seq.forall (fun (KeyValue(key, addr)) -> vect.Value.GetObject(addr).HasValue)
      if all then OptionalValue(ObjectSeries(rowIndex, boxVector vect.Value, vectorBuilder, indexBuilder))
      else OptionalValue.Missing )
    ColumnSeries(Series(columnIndex, newData, vectorBuilder, indexBuilder))

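which to me looks like it behaves as described in that comment - it returns OptionalValue.Missing for a column if not all of its values are present. The resulting series is still built over the full columnIndex, which is why the "sixth" key shows up in Keys even though its value is missing.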
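If only the keys of the complete columns are wanted, one option is to drop the missing entries from the series that `denseCols` returns. The following is a minimal sketch with made-up data (the names `full`, `partial` and `df` are hypothetical); it assumes `Series.dropMissing` can be piped onto the column series, and the expected output in the comment has not been verified by a run.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical data: "partial" has no value for row key 3
let full    = series [ 1 => 1.0; 2 => 2.0; 3 => 3.0 ]
let partial = series [ 1 => 10.0; 2 => 20.0 ]
let df = frame [ "full" => full; "partial" => partial ]

// denseCols marks sparse columns as missing; dropMissing then removes those keys
let denseOnly =
    df
    |> Frame.denseCols
    |> Series.dropMissing

printfn "%A" (List.ofSeq denseOnly.Keys)   // expected: ["full"]
```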

Frank Schmitt
    The text in the Deedle source code is symmetric with respect to rows and columns. The behaviors of denseRows and denseCols are not symmetric. In the example denseRows dropped the rows (and the corresponding row keys) that contained missing values in some column. On the other hand, denseCols did not drop the column key corresponding to a column containing missing values in some row. – Soldalma Mar 21 '17 at 19:19