4

I would like to understand whether it is possible to populate a Superset dataset using the Superset API. I browsed the documentation and I have the feeling that we can't do it the way we upload a CSV file, for example. I know the other possibility is to write directly to the database, but I find APIs more secure and also easier to use and maintain.

Rubén
curuba
    It's possible with the new import/export API. Do you want to update an existing dataset with newer data? Or create a new one from scratch? If updating an existing one, do you want to replace the data or append new data? – Beto Dealmeida Feb 09 '21 at 18:14

1 Answer

6

Superset 1.0 has a new import/export API that allows you to do that; it's how we're loading some of the new example dashboards. The first thing you need to do is enable the feature by turning on the VERSIONED_EXPORT feature flag in superset_config.py:

# superset_config.py
from typing import Any, Dict

FEATURE_FLAGS: Dict[str, Any] = {
    "VERSIONED_EXPORT": True,
}

Let's assume you want to create a new dataset, with the underlying table and data. First, create a ZIP file with these two files:

# your_dataset.zip
./metadata.yaml
./datasets/examples/your_dataset.yaml
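If you're generating the bundle programmatically, a minimal sketch with the standard library's zipfile module can produce that exact layout (the YAML strings here are placeholders; use the full files described below):

```python
# Sketch: assemble your_dataset.zip with the layout Superset expects.
import io
import zipfile

metadata_yaml = (
    "version: 1.0.0\n"
    "type: SqlaTable\n"
    "timestamp: '2020-12-23T23:16:56.155576+00:00'\n"
)
# Placeholder; the real file holds the full dataset definition shown below.
dataset_yaml = "table_name: your_dataset\n"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.yaml", metadata_yaml)
    zf.writestr("datasets/examples/your_dataset.yaml", dataset_yaml)

with open("your_dataset.zip", "wb") as f:
    f.write(buf.getvalue())
```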

The metadata.yaml file should look like this:

version: 1.0.0
type: SqlaTable
timestamp: '2020-12-23T23:16:56.155576+00:00'

(timestamp can be anything)
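Since the timestamp only needs to be a valid ISO-8601 string, you could generate a fresh one like this:

```python
# Generate an ISO-8601 UTC timestamp suitable for metadata.yaml.
from datetime import datetime, timezone

ts = datetime.now(timezone.utc).isoformat()
print(ts)  # e.g. 2020-12-23T23:16:56.155576+00:00
```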

The file your_dataset.yaml can have any name, and should describe your columns and any additional metrics you want to define in the dataset:

table_name: your_dataset
main_dttm_col: null
description: null
default_endpoint: null
offset: 0
cache_timeout: null
schema: tap_slack
sql: null
params: null
template_params: null
filter_select_enabled: false
fetch_values_predicate: null
extra: null
uuid: 3e8130eb-0831-d568-b531-c3ce6d68d3d8  # can be any UUID
metrics:
- metric_name: count
  verbose_name: COUNT(*)
  metric_type: count
  expression: COUNT(*)
  description: null
  d3format: null
  extra: null
  warning_text: null
columns:
- column_name: user_id
  verbose_name: null
  is_dttm: false
  is_active: true
  type: VARCHAR
  groupby: true
  filterable: true
  expression: null
  description: null
  python_date_format: null
version: 1.0.0
database_uuid: 566ca280-3da8-967e-4aa4-4b349218736a  # MUST be your database UUID
data: https://example.com/your_dataset.csv

Now POST the file to the new endpoint:

$ curl -X POST -F 'formData=@/path/to/your_dataset.zip' https://superset.example.com/api/v1/dataset/import/

This should create the dataset under the database with UUID 566ca280-3da8-967e-4aa4-4b349218736a, creating the table and populating it with data from https://example.com/your_dataset.csv.
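If you prefer Python over curl, the upload can be sketched with the requests library. This assumes a Superset instance with database ("db") authentication; the login, CSRF, and import endpoints shown here exist in Superset 1.x, but verify them against your version's API docs:

```python
# Sketch: authenticate against Superset and POST the import ZIP.
# base_url, username, password, and zip_path are assumptions you supply.
import requests


def import_dataset(base_url, username, password, zip_path):
    session = requests.Session()

    # 1. Log in to obtain a JWT access token.
    resp = session.post(
        f"{base_url}/api/v1/security/login",
        json={
            "username": username,
            "password": password,
            "provider": "db",
            "refresh": True,
        },
    )
    resp.raise_for_status()
    session.headers["Authorization"] = f"Bearer {resp.json()['access_token']}"

    # 2. Fetch a CSRF token (required by most Superset deployments).
    csrf = session.get(f"{base_url}/api/v1/security/csrf_token/")
    csrf.raise_for_status()
    session.headers["X-CSRFToken"] = csrf.json()["result"]

    # 3. POST the ZIP bundle to the import endpoint.
    with open(zip_path, "rb") as f:
        return session.post(
            f"{base_url}/api/v1/dataset/import/",
            files={"formData": ("your_dataset.zip", f, "application/zip")},
        )
```

A successful import returns a JSON body of {"message": "OK"}.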

If you want to overwrite the data on an existing table or an existing dataset, you can follow a similar process, but you should also pass -F 'overwrite=true'. In that case the existing table will be dropped and replaced by the data from the CSV URL (so it can have a new schema).

Currently there's no way to append data to an existing table through the import/export API. You could use the CSV file API to append data to an existing table, though; it has that option.

Beto Dealmeida
  • Thanks a lot. I understand the process. I will go for the option to directly update the database then. It seems more flexible for now. – curuba Feb 13 '21 at 11:32
  • How can one upload the CSV data not from a url but from a local file? – asmaier Feb 25 '22 at 15:29