
I just started with a trivial GraphQL schema:

type Product {
    productID: ID!
    name: String @search(by: [term])
    reviews: [Review] @hasInverse(field: about)
}

type Review {
    id: ID!
    about: Product! @hasInverse(field: reviews)
    by: Customer! @hasInverse(field: reviews)
    comment: String @search(by: [fulltext])
    rating: Int @search
}

type Customer {
    custID: ID!
    name: String @search(by: [hash, regexp])
    reviews: [Review] @hasInverse(field: by)
}

Now I want to populate the DB with millions of JSON entries without calling the GraphQL mutation (too slow). For instance, I have a folder full of several JSON files (customers and products) of the following shape.

Example of a json customer file:

{
    "id": "deadbeef",
    "name": "Bill Gates",
    "reviews": [
        {
            "id": "1234",
            "comment": "nice product",
            "rating": 5,
            "productId": "5678"
        }
    ]
}

Example of a json product file:

{
    "id": "5678",
    "name": "Bluetooth headset"
}

From what I understood, to define edges between nodes, I first have to augment each object with a uid.

The customer would become:

{
    "id": "deadbeef",
    "uid": "_:deadbeef",
    ...
    "reviews": [
        {
            "id": "1234",
            "uid": "_:1234",
            "productId": {"uid": "_:5678"}
        }
    ]
}

And the product

{
    "id": "5678",
    "uid": "_:5678",
    ...
}

Then we could batch import them (this is pure speculation, I never tried this). While this should import the entries, I would like to know how the DB would associate those entries with a type, because the data itself does not carry any type information yet. Is there a property like __typename I could add to each of my entries to type them?

[edit] I've found 2 possible properties, class and dgraph.type; I'm still wondering which one to use and how.

Flavien Volken

2 Answers


The GraphQL schema above will generate the following predicates:

Customer.name
Customer.reviews
Product.name
Product.reviews
Review.about
Review.by
Review.comment
Review.rating
Schema.date
Schema.schema

i.e. Type.property. To batch import values, there is no need to specify the type; just use the right property names.

Here is a working sample:

    const product = {
        "dgraph.type":"Product",
        "uid": "_:5678",
        "Product.name": "Bluetooth headset"
    };

    const customer = {
        "uid": "_:deadbeef",
        "dgraph.type":"Customer",
        "Customer.name": "Bill Gates",
        "Customer.reviews": [
            {                    
                "uid": "_:1234",
                "dgraph.type":"Review",
                "Review.comment": "nice product",
                "Review.rating": 5,
                "Review.by": {"uid": "_:deadbeef"},
                "Review.about": {"uid": "_:5678"}
            }
        ]
    };

    // Run mutation.
    const mu = new Mutation();
    mu.setSetJson({set: [product, customer]});
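The mutation above still has to be run through a transaction. Here is a minimal sketch of that step (assuming it runs inside an async function and that dgraphClient is a connected DgraphClient, created as in the POC further down):

    // Run the mutation built above in a single transaction.
    const txn = dgraphClient.newTxn();
    try {
        const response = await txn.mutate(mu);
        await txn.commit();
        // Dgraph reports the real uids assigned to the blank nodes,
        // keyed by their name without the "_:" prefix, e.g. "deadbeef" -> "0x2".
        response.getUidsMap().forEach((assignedUid, blankId) => {
            console.log(`${blankId} -> ${assignedUid}`);
        });
    } finally {
        // Safe to call even after a successful commit.
        await txn.discard();
    }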

If you want to import blocks of thousands of entries, you need to find a way to keep the blank node ids across transactions. To achieve this, I suggest using a class responsible for keeping the uid maps across the block imports. Here is my POC:

import {DgraphClient, DgraphClientStub, Mutation} from "dgraph-js";
import * as grpc from "grpc";
import * as jspb from "google-protobuf";

type uidMap = jspb.Map<string, string>;

class UidMapper {

    constructor(private uidMap: uidMap = UidMapper.emptyMap()) {
    }

    private static emptyMap(): uidMap {
        return new jspb.Map<string, string>([]);
    }

    public uid(uid: string): string {
        return this.uidMap.get(uid) || `_:${uid}`;
    }

    public addMap(anotherMap: uidMap): void {
        anotherMap.forEach((value, key) => {
            this.uidMap.set(key, value);
        });
    }
}

class Importer {
    public async importTest(): Promise<void> {
        try {
            const clientStub = new DgraphClientStub(
                "localhost:9080",
                grpc.credentials.createInsecure(),
            );
            const dgraphClient: DgraphClient = new DgraphClient(clientStub);

            await this.createData(dgraphClient);

            clientStub.close();
        } catch (error) {
            console.log(error);
        }
    }

    private async createData(dgraphClient: DgraphClient): Promise<void> {
        const mapper = new UidMapper();

        const product = {
            "dgraph.type": "Product",
            "uid": mapper.uid("5678"),
            "Product.name": "Bluetooth headset"
        };

        const customer = ...;
        const addMoreInfo = ...;

        await this.setJsonData(dgraphClient, mapper, [product, customer]);
        await this.setJsonData(dgraphClient, mapper, [addMoreInfo]);
    }

    private async setJsonData(dgraphClient: DgraphClient, mapper: UidMapper, data: any[]) {
        // Create a new transaction.
        const txn = dgraphClient.newTxn();
        try {
            // Run mutation.
            const mu = new Mutation();

            mu.setSetJson({set: data});
            let response = await txn.mutate(mu);
            // Commit transaction.
            mapper.addMap(response.getUidsMap());
            await txn.commit();

        } finally {
            // Clean up. Calling this after txn.commit() is a no-op and hence safe.
            await txn.discard();
        }
    }
}
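For reference, here is a usage sketch of the POC above (the call site is my assumption; addMoreInfo stands for whatever extra batch references the previously imported nodes):

    // Kick off the import; after the first batch, mapper.uid("5678") no longer
    // returns the blank id "_:5678" but the real uid Dgraph assigned to it,
    // so later batches link to the existing nodes instead of creating new ones.
    new Importer().importTest()
        .then(() => console.log("import finished"))
        .catch((error) => console.error(error));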
Flavien Volken

Some points that need to be taken into account:

1 - GraphQL and GraphQL+- are completely different things.

2 - Dgraph has a type system that needs to be followed. https://docs.dgraph.io/query-language/#type-system

3 - Mutation operations on clients are not interconnected, except for Upsert operations. https://docs.dgraph.io/mutations/#upsert-block That is, a blank node set in one mutation will not carry its assigned value over to the next mutation. You need to save the assigned UID in a variable and then use it in the next mutation (a sketch follows this list).

More about mutations and blank_node https://tour.dgraph.io/master/intro/5/

4 - If you need to use the GraphQL layer, you need to read all the posts and recommendations for this feature, and understand that Dgraph works one way and the GraphQL layer another way.
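To illustrate point 3, here is a minimal sketch with dgraph-js (the function name and the sample data are made up for the example, and it assumes the same dgraph-js imports as in the other answer): the uid assigned to a blank node in the first mutation is read from the response and then referenced literally in the second one.

    // Sketch: carry a uid from one mutation into the next (dgraph-js).
    async function twoMutations(dgraphClient: DgraphClient): Promise<void> {
        // First mutation: create a product under a blank node id and commit immediately.
        const firstMu = new Mutation();
        firstMu.setSetJson({set: [{"uid": "_:5678", "dgraph.type": "Product", "Product.name": "Bluetooth headset"}]});
        firstMu.setCommitNow(true);
        const firstResponse = await dgraphClient.newTxn().mutate(firstMu);

        // "_:5678" only existed inside that mutation; read the real uid
        // Dgraph assigned to it (keyed without the "_:" prefix).
        const productUid = firstResponse.getUidsMap().get("5678");

        // Second mutation: reference the same node by its real uid.
        const secondMu = new Mutation();
        secondMu.setSetJson({set: [{"uid": "_:1234", "dgraph.type": "Review", "Review.rating": 5, "Review.about": {"uid": productUid}}]});
        secondMu.setCommitNow(true);
        await dgraphClient.newTxn().mutate(secondMu);
    }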

Continuing.

If you need to submit multiple batches of JSON, I recommend that you use Live Loader https://docs.dgraph.io/deploy/#live-loader with the -x flag. With it you can keep the mapping of UIDs for each blank node created. That is, if every entity you have has a blank node, it will be mapped and assigned a UID, which will then be reused for every new batch via Live Loader.

-x, --xidmap string            Directory to store xid to uid mapping

BTW: I don't know the concept of "class" in Dgraph.

I hope it helps.

Cheers.

  • Thank you for your answer. Actually, Dgraph recently added support for native GraphQL https://graphql.dgraph.io (and not GraphQL+-). I already managed to import some data (using a gRPC client) that I can then query in GraphQL, but this is a POC; I need to understand how it really works, as I am quite new to Dgraph. Also, the class in Dgraph seems to be the predicate prefixed with a kind of class name: a GraphQL Customer type with a property `x` will generate a predicate `Customer.x`. – Flavien Volken Nov 05 '19 at 19:45
  • +1 for "You need to save the assigned UID in a variable and then use it in the next mutation", as I am using the js dgraph rpc client, I will build my own `response.getUidsMap()` to keep the map across the mutations. – Flavien Volken Nov 05 '19 at 20:59
  • Hi Flavien. I work for Dgraph as a Community Support Engineer. I mentioned the difference between GraphQL+- and GraphQL because in your question I understood that there could be a confusion of concepts. Continuing - you would have to choose a layer to work with: either the native language of Dgraph or the GraphQL layer. But surely you'll need to master both. Dgraph clients can't work with GraphQL directly. My recommendation would be that you use Apollo GraphQL solutions in your clients if you are using JS. Cheers. – Michel Conrado Diz Nov 06 '19 at 19:46
  • Hi, I started importing 2.5 million entries (JSON files) using ApolloClient and the native Dgraph GraphQL, but because of the GraphQL overhead, the import speed decreased exponentially. I am therefore focusing on a real solution batching the JSON files. Thanks to you my POC is now working; I will try to benchmark it soon. – Flavien Volken Nov 06 '19 at 20:27
  • In order to improve performance, I tried the Live Loader; it is actually slower than my POC. I am now looking for the right way to use the Bulk Loader. – Flavien Volken Nov 11 '19 at 21:16