Python's pandas library provides an info() method that prints a summary of a data frame.

For example:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Name           30 non-null     object 
 1   PhoneNumber    30 non-null     object 
 2   City           30 non-null     object 
 3   Address        30 non-null     object 
 4   PostalCode     30 non-null     object 
 5   BirthDate      30 non-null     object 
 6   Income         26 non-null     float64
 7   CreditLimit    30 non-null     object 
 8   MaritalStatus  24 non-null     object 
dtypes: float64(1), object(8)
memory usage: 2.2+ KB

Is there an equivalent for Deedle's data frame? Something that gives an overview of the missing values and the inferred types.

Alkasai

2 Answers

There isn't a single function to do this - it would be a nice addition to the library if you wanted to consider sending a pull-request.

The following gets all the information you would need:

// Prints column names and types, with data preview
df.Print(true)

// Print key range of rows (or key sequence if it is not ordered)
if df.RowIndex.IsOrdered then printfn "%A" df.RowIndex.KeyRange
else printfn "%A" df.RowIndex.Keys

// Get access to the data of the frame so that we can inspect the columns
let dt = df.GetFrameData()
for n, (ty, vec) in Seq.zip dt.ColumnKeys dt.Columns do 
  // Print name, type of column
  printf "%A %A" n ty
  // Query the internal data storage to see if it uses
  // an array of optional values (may have nulls) or not
  match vec.Data with 
  | Vectors.VectorData.DenseList _ -> printfn " (no nulls)"
  | _ -> printfn " (nulls)" 
Tomas Petricek
This gives me thoughts - on Pandas, R, F# and Microsoft.Data.Analysis. I don't know if you have a better answer for [this question](https://stackoverflow.com/questions/74219024/c-sharp-microsoft-data-analysis-dataframe-to-sql-server) – Panagiotis Kanavos Oct 31 '22 at 09:09
Based on Tomas's suggestion (thank you!), I modified it slightly to produce output similar to pandas:

let info (df: Deedle.Frame<'a,'b>) =
    let dt = df.GetFrameData()
    let countOptionalValues d =
        d
        |> Seq.filter (
            function
            | OptionalValue.Present _ -> true
            | _ -> false
        )
        |> Seq.length

    Seq.zip dt.ColumnKeys dt.Columns
    |> Seq.map (fun (col, (ty, vec)) ->
        {|
            Column = col
            ``Non-Null Count`` =
                match vec.Data with
                | Vectors.VectorData.DenseList d -> $"%i{d |> Seq.length} non-null"
                | Vectors.VectorData.SparseList d -> $"%i{d |> countOptionalValues} non-null"
                | Vectors.VectorData.Sequence d -> $"%i{d |> countOptionalValues} non-null"
            Dtype = ty
        |}
    )

Pandas output: [image]

Deedle output: [image]
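
A variant that avoids GetFrameData() and counts values through the frame's public members might look like the sketch below. The name infoViaSeries and the record field names are made up for illustration. It assumes that ColumnTypes yields the types in the same order as ColumnKeys and that ValueCount excludes missing values, so it is worth checking both against the Deedle version you have installed.

```fsharp
open Deedle

// Hypothetical helper (not part of Deedle): one summary row per column.
// Assumption: df.ColumnTypes is ordered the same way as df.ColumnKeys.
let infoViaSeries (df: Frame<'R, 'C>) =
    Seq.zip df.ColumnKeys df.ColumnTypes
    |> Seq.map (fun (col, ty) ->
        // ValueCount counts only the values that are present in the column
        let present = df.Columns.[col].ValueCount
        {| Column = col
           NonNull = present
           Missing = df.RowCount - present
           Dtype = ty.Name |})
    |> List.ofSeq

// Usage: print one line per column
// infoViaSeries df |> List.iter (printfn "%A")
```

Unlike matching on the storage kind, this reports how many values are actually missing, not only whether the column is stored in a form that could hold missing values.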

Alkasai