0

I want to create an arabic dictionary database in DynamoDB. The database is primarily read-heavy, with only occasional writes to fix errors. I have 2 types of records, roots and nouns. The primary issue is that a single word can be pronounced in different ways, and can be interpreted as a root or noun, so it needs to have multiple entries. Ex:

root : ارج arija a (araj, اريج arīj) to be fragrant

noun1: ارج araj fragrance, sweet smell

noun2: ارج arij fragrant, sweet-smelling

I want to be able to (at a reasonable cost) support the following queries

  1. select all roots where root=?
  2. select all nouns where noun=?
  3. select all nouns where root=?

I'm thinking of implementing it in a single "Words" table like this: { PK int wordID (arbitrarily defined by dynamodb) string word string root bool isRoot JSON definition } Primary Key - wordID GSI1 - root, with isRoot as the sort ID GSI2 - word, with isRoot as the sort ID

Thus, query 1 and 3 could be handled by performing a query on GSI1 while specifying a sort key value of 1 and 0 respectively, and query 2 could be handled by performing a query on GSI2 with a sort key value of 0.

Is this the best way to design a DB for this type of application? I just thought it might not be optimal that I'm almost never going to do a query with my PK, as I'm only using the GSIs that I've defined.

I haven't implemented it yet but I'm considering using another type of DB on AWS if DynamoDB is not optimal for this situation where I am primarily querying for many rows at a time. The database size is approx 8k roots and 18k nouns

Manaf Asif
  • 19
  • 3
  • I don’t really understand how words, roots, and nouns relate to each other here. The same word can be a root and a noun and if a noun it can have a different root too? – hunterhacker Mar 21 '23 at 03:40

1 Answers1

0

I think it would make sense to have the root as PK of the main table and a record number (counter) as SK. Generally, dictionaries order the items within a root article by frequency of use in the natural language, so the SK record number may be a proxy for that (which would make your dictionary more convenient to use).

You can configure an LSI with isRoot as SK to address Query 1.

And then you can create a GSI with noun as PK and root as SK to deal with Query 2 & 3.

Alex Chadyuk
  • 1,421
  • 2
  • 10