I would like to understand whether it's possible to populate a Superset dataset using the Superset API. I browsed the documentation and I have the feeling that we can't do it the way we upload a CSV file, for example. I know the other possibility is to write directly to the database, but I find APIs more secure and also easier to use and maintain.
- It's possible with the new import/export API. Do you want to update an existing dataset with newer data, or create a new one from scratch? If updating an existing one, do you want to replace the data or append new data? – Beto Dealmeida Feb 09 '21 at 18:14
1 Answer
Superset 1.0 has a new import/export API that allows you to do that; it's how we're loading some of the new example dashboards. The first thing you need to do is enable the feature, by turning on the VERSIONED_EXPORT feature flag in superset_config.py:
from typing import Any, Dict

FEATURE_FLAGS: Dict[str, Any] = {
    "VERSIONED_EXPORT": True,
}
Let's assume you want to create a new dataset, with the underlying table and data. First, create a ZIP file with these two files:
# your_dataset.zip
./metadata.yaml
./datasets/examples/your_dataset.yaml
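As a sketch, this bundle layout can be produced with a few lines of Python using only the standard library. The file names match the layout above; the dataset YAML content passed in here is just a placeholder:

```python
import zipfile

# metadata.yaml content as shown in this answer.
METADATA_YAML = """\
version: 1.0.0
type: SqlaTable
timestamp: '2020-12-23T23:16:56.155576+00:00'
"""


def build_bundle(zip_path: str, dataset_yaml: str) -> None:
    """Write metadata.yaml and the dataset YAML into the layout Superset expects."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("metadata.yaml", METADATA_YAML)
        zf.writestr("datasets/examples/your_dataset.yaml", dataset_yaml)


# Placeholder dataset YAML; use the full definition shown below in practice.
build_bundle("your_dataset.zip", "table_name: your_dataset\n")
```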
The metadata.yaml file should look like this:
version: 1.0.0
type: SqlaTable
timestamp: '2020-12-23T23:16:56.155576+00:00'
(timestamp can be anything)
The file your_dataset.yaml can have any name, and should describe your columns and any additional metrics you want to define in the dataset:
table_name: your_dataset
main_dttm_col: null
description: null
default_endpoint: null
offset: 0
cache_timeout: null
schema: tap_slack
sql: null
params: null
template_params: null
filter_select_enabled: false
fetch_values_predicate: null
extra: null
uuid: 3e8130eb-0831-d568-b531-c3ce6d68d3d8 # can be any UUID
metrics:
- metric_name: count
  verbose_name: COUNT(*)
  metric_type: count
  expression: COUNT(*)
  description: null
  d3format: null
  extra: null
  warning_text: null
columns:
- column_name: user_id
  verbose_name: null
  is_dttm: false
  is_active: true
  type: VARCHAR
  groupby: true
  filterable: true
  expression: null
  description: null
  python_date_format: null
version: 1.0.0
database_uuid: 566ca280-3da8-967e-4aa4-4b349218736a # MUST be your database UUID
data: https://example.com/your_dataset.csv
Now POST the file to the new endpoint:
$ curl -X POST -F 'formData=@/path/to/your_dataset.zip' https://superset.example.com/api/v1/dataset/import/
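If you prefer Python over curl, the same upload can be sketched with only the standard library. The host and token are placeholders, and depending on your deployment you may need additional headers (e.g. a CSRF token) beyond the Bearer token:

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with a single file field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/zip\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def import_dataset(zip_path: str, token: str) -> bytes:
    """POST the bundle to the dataset import endpoint (host is a placeholder)."""
    with open(zip_path, "rb") as f:
        body, content_type = build_multipart("formData", "your_dataset.zip", f.read())
    req = urllib.request.Request(
        "https://superset.example.com/api/v1/dataset/import/",
        data=body,
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```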
This should create the dataset under the database of UUID 566ca280-3da8-967e-4aa4-4b349218736a, creating the table and populating it with data from https://example.com/your_dataset.csv.
If you want to overwrite the data on an existing table or an existing dataset you can follow a similar process, but you should also pass -F 'overwrite=true', in which case the existing table will be dropped and replaced by the data in the CSV URL (so it can have a new schema).
Currently there's no way to append data to an existing table through the import/export API. You could, however, use the CSV upload API to append data to an existing table; it has that option.

- Thanks a lot. I understand the process. I will go for the option to directly update the database then; it seems more flexible for now. – curuba Feb 13 '21 at 11:32