
I would like to create a data dictionary for all the tables and columns that I have imported in my Dataiku project.

For example, SAS provides SASHELP.VCOLUMN and SASHELP.VTABLE, which cover this functionality.

Is there a smart way to do it in Dataiku?

agrm
kacperdominik

1 Answer


Whether you are working inside or outside Dataiku, I think you should use the Python API (accessible from a Dataiku notebook or through the Python client library):

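import dataiku
import json

# Get an API client (this form works inside DSS, e.g. in a notebook)
client = dataiku.api_client()

# Get a handle on the project
project = client.get_project('YOUR_PROJECT_NAME')

# Loop over every dataset in the project
for item in project.list_datasets():
    # Get the dataset object by name
    dataset = project.get_dataset(item['name'])
    # Print the dataset schema (column names and types) as JSON
    print(item['name'], json.dumps(dataset.get_schema(), indent=2))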

Since you can also install this client API outside DSS, it is the most universal approach in my view. Be aware, though, that Dataiku also provides a catalog and a public API call to index Dataiku connections and retrieve statistics on all your items, including those not yet used in a project.
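To get something closer to SASHELP.VCOLUMN, the per-dataset schemas can be flattened into one row per column. The following is only a sketch: it assumes each schema is a dict with a "columns" list whose entries carry "name" and "type" (and optionally "comment"), which is the shape I would expect from get_schema(). The helper names build_data_dictionary and to_csv, and the sample columns, are made up for illustration.

```python
import csv
import io

FIELDS = ["dataset", "position", "column", "type", "comment"]


def build_data_dictionary(schemas):
    """Flatten {dataset_name: schema_dict} into one row per column.

    Assumes each schema looks like {"columns": [{"name": ..., "type": ...}]}.
    Missing keys fall back to empty strings.
    """
    rows = []
    for dataset_name, schema in sorted(schemas.items()):
        for position, column in enumerate(schema.get("columns", []), start=1):
            rows.append({
                "dataset": dataset_name,
                "position": position,
                "column": column.get("name", ""),
                "type": column.get("type", ""),
                "comment": column.get("comment", ""),
            })
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample input; in DSS you would fill this dict inside the
# loop over project.list_datasets() with dataset.get_schema().
sample_schemas = {
    "batting_postseason": {
        "columns": [
            {"name": "playerID", "type": "string"},
            {"name": "yearID", "type": "bigint"},
        ]
    },
    "teams": {
        "columns": [
            {"name": "teamID", "type": "string", "comment": "Team code"},
        ]
    },
}

data_dictionary = build_data_dictionary(sample_schemas)
print(to_csv(data_dictionary))
```

Keeping the flattening step separate from the API calls means the same function works whether the schemas come from a notebook inside DSS or from the client library outside it.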

Edit:

There is also a plugin called "Audits a dataset" that lets you quickly generate such a report without coding.

leldo