The process of creating knowledge graphs

Question

So I am new to the world of the semantic web, RDF and ontologies. How does the process of creating knowledge graphs work? Say I want to create a knowledge graph about a specific team and link everything about it (the players, the trophies and so on): how would it go? Do I first scrape data about the team? Do I convert from CSV to RDF triples? And where do data science, NLP and machine learning fall into all this?

score 5 · Accepted Answer · answered Dec 13 '22 at 20:55

Ok, there are a few components to this. I will take each in turn.

Part 1:

So I am new to the world of the semantic web, RDF and ontologies. How does the process of creating knowledge graphs work? Say I want to create a knowledge graph about a specific team and link everything about it (the players, the trophies and so on): how would it go?

Some high-level steps:

1. Design an ontology to represent the knowledge in your knowledge graph. The ontology defines the classes, which will be populated with instances. In your case a class could be players and an instance could be a particular player in your team. The players class could be linked to the trophies class to show which players have won trophies. This guide might prove useful.
2. Procure data to populate your ontology. I don't have domain knowledge of your area, but web data sounds like it could work.
3. Find an appropriate database to store your graph. Based on the tags, it sounds like you want to use RDF; Virtuoso, GraphDB and MarkLogic all offer free versions you can run locally.
4. Ingest your data. CRUD operations on RDF graphs can be executed using SPARQL; take a look at the SPARQL INSERT operation. There are also more complex frameworks for turning data into knowledge graphs.
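
To make the steps above concrete, here is a minimal sketch in Python (standard library only) that turns a few CSV rows into triples and wraps them in a SPARQL INSERT DATA update. The namespace http://example.org/football/, the column names, the predicates playsFor and won, and the sample rows are all assumptions invented for illustration; your own ontology would define the real ones.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and sample data, invented for illustration only.
EX = "http://example.org/football/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

SAMPLE_CSV = """player,team,trophy
Alice Example,Example FC,Example Cup 2020
Bob Sample,Example FC,
"""


def iri(local_name: str) -> str:
    """Build an IRI in the example namespace (spaces become underscores)."""
    return "<" + EX + quote(local_name.strip().replace(" ", "_")) + ">"


def rows_to_triples(csv_text: str) -> list[tuple[str, str, str]]:
    """Map each CSV row to (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        player = iri(row["player"])
        triples.append((player, RDF_TYPE, iri("Player")))
        triples.append((player, iri("playsFor"), iri(row["team"])))
        if row["trophy"]:  # a player may not have won anything yet
            triples.append((player, iri("won"), iri(row["trophy"])))
    return triples


def to_insert_data(triples) -> str:
    """Serialise the triples as a SPARQL 1.1 INSERT DATA update string."""
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return "INSERT DATA {\n" + body + "\n}"


triples = rows_to_triples(SAMPLE_CSV)
update = to_insert_data(triples)
print(update)
```

The resulting string could then be sent to the update endpoint of whichever triple store you picked in step 3; how that is done depends on the store, so it is left out here.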

However, given your use case, I would ignore everything I've written above, as this sounds like a solved problem. The beauty of RDF is that there is a big community around open data and shared ontologies. It is likely that the graph you want to create could, at least partially, be sourced from existing public graphs that already aggregate and crowd-source data from the web.

See the SPARQL endpoints:

Part 2

Do I first scrape data about the team? Do I convert from CSV to RDF triples?

I would avoid scraping if you can, and try to rely on the above public graphs that already exist. However, scraping is an option if required.
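
As a rough sketch of what relying on a public graph could look like, the Python below builds a SPARQL protocol request against DBpedia's public endpoint using only the standard library. The team resource and the dbo:team property in the query are assumptions picked for illustration, so check them against the live data before relying on them; the call that actually hits the network is left commented out.

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# The resource IRI and the dbo:team property are illustrative assumptions.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?player WHERE {
  ?player dbo:team <http://dbpedia.org/resource/Arsenal_F.C.> .
} LIMIT 10
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as a GET request, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str) -> list[dict]:
    """Send the query and return the result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


request = build_request(QUERY)
print(request.full_url)
# bindings = run_query(QUERY)  # uncomment to query the live endpoint
```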

Part 3

And where do Data Science, NLP and Machine Learning fall into all this?

Increasingly, knowledge graphs are being used as part of machine learning workflows. There are a few reasons for this:

Graphs provide a rich and highly connected web of data. Having more context is generally thought to result in better models, as the feature variables are richer.
Data in a graph can be extracted at a specified granularity, so it is possible to solve a variety of downstream use cases whilst retaining semantic meaning.
The rise of models trained with graph neural networks is fuelling the increasing adoption of knowledge graphs.
Modern machine learning requires increasing amounts of data, the likes of which can only be found on the web. RDF has a long history of aggregating web data in public knowledge graphs.
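
To illustrate the point about extracting graph data as features, here is a small sketch that turns triples into one numeric row per entity by counting outgoing edges per predicate. The triples and entity names are invented for illustration, and edge counting is just one simple choice of feature, not a recommendation over graph neural networks or embeddings.

```python
from collections import Counter

# Hypothetical triples, invented for illustration.
TRIPLES = [
    ("alice", "playsFor", "example_fc"),
    ("alice", "won", "cup_2020"),
    ("alice", "won", "league_2021"),
    ("bob", "playsFor", "example_fc"),
]


def feature_rows(triples, entities):
    """Return column names and, per entity, its outgoing-edge count per predicate."""
    predicates = sorted({p for _, p, _ in triples})
    counts = Counter((s, p) for s, p, _ in triples)
    rows = {e: [counts[(e, p)] for p in predicates] for e in entities}
    return predicates, rows


columns, rows = feature_rows(TRIPLES, ["alice", "bob"])
print(columns)
print(rows)
```

Each row could then be fed to an ordinary tabular model, with the column names preserving what every number means in the graph.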

No worries, there are lots of answers on SO about using dbpedia. Do shout if you get stuck — HES, Dec 14 '22 at 10:22

Timea · Answer 2 · 2023-01-02T10:16:46.153

A really great answer was shared already. I want to share my experience because for people who need to get going with such a task, it can be really overwhelming at first. Some advice:

Start simple

Start with a whiteboard, or even pen and paper. Like when designing a database, draw your types of data and how you want to have them connected -> this will show you the ontology.

Create your own ontology first

Usually, one should reuse existing ontologies BUT that can be so daunting that one will easily just get stuck. From your previous task, you will already see some relations emerging so just go with it. Example: player won trophy -> 'won' could be a predicate in your ontology.

Think about your use cases/features

The data you want to represent might get you started, but soon, you will not know what other 'relations' your data should have. Turn to your app features/requirements and use cases for more. Example: your use case is to list the trophies of each team -> this hints at the need for each player to have a 'team'. You can make it a class or an attribute of a player. If it is a class, it will be easy to SPARQL out questions like: 'how many teams do I have?'
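
A tiny sketch of why modelling the team as a class helps, using plain Python sets in place of a triple store. The ex: names and the sample triples are hypothetical; the SPARQL string shows what the same question might look like against a real store, assuming the ex: prefix is declared there.

```python
RDF_TYPE = "rdf:type"

# Hypothetical triples, invented for illustration.
TRIPLES = {
    ("ex:ExampleFC", RDF_TYPE, "ex:Team"),
    ("ex:SampleUnited", RDF_TYPE, "ex:Team"),
    ("ex:alice", RDF_TYPE, "ex:Player"),
    ("ex:alice", "ex:playsFor", "ex:ExampleFC"),
    ("ex:alice", "ex:won", "ex:Cup2020"),
}

# Roughly equivalent SPARQL for "how many teams do I have?"
COUNT_TEAMS = "SELECT (COUNT(?team) AS ?n) WHERE { ?team a ex:Team }"


def instances_of(triples, cls):
    """All subjects explicitly typed as the given class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def trophies_of(triples, team):
    """Trophies won by any player who plays for the given team."""
    players = {s for s, p, o in triples if p == "ex:playsFor" and o == team}
    return {o for s, p, o in triples if p == "ex:won" and s in players}


team_count = len(instances_of(TRIPLES, "ex:Team"))
example_trophies = trophies_of(TRIPLES, "ex:ExampleFC")
print(team_count, example_trophies)
```

Had the team been a plain string attribute on each player, counting teams would mean deduplicating strings instead of counting instances of a class.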

Find here a description of the process while I was modeling example code applications for a search use case. To keep it simple, I like to start with SKOS to represent my knowledge.
