5

We know that MongoDB has two ways of modeling relationships between relations/entities, namely, embedding and referencing (see difference here). Let's say we have a USER database with two tables in mySQL named user and address. An embedded MongoDB document might look like this:

{
  "_id": 1,
  "name": "Ashley Peacock",
  "addresses": [
    {
      "address_line_1": "10 Downing Street",
      "address_line_2": "Westminster",
      "city": "London",
      "postal_code": "SW1A 2AA"
    },
    {
      "address_line_1": "221B Baker Street",
      "address_line_2": "Marylebone",
      "city": "London",
      "postal_code": "NW1 6XE"
    }
  ]
}

Whereas in a referenced relation, 2 SQL tables will make 2 collections in MongoDB which can be migrated by this apporoach using pymongo.

How can we directly migrate MySQL data as an embedded document using python?

Insights about about Pseudo code and performance of algorithm will be highly useful. Something that comes to my mind is creating views by performing joins in MySQL. But in that case we will not be having the structure of children document inside a parent document.

Salvatore
  • 10,815
  • 4
  • 31
  • 69
def __init__
  • 1,092
  • 6
  • 17
  • Broadly speaking, you may query your MySQL table from Python, then iterate the result set and generate the JSON output you see above. – Tim Biegeleisen Jul 03 '22 at 06:54
  • 1
    @TimBiegeleisen may you please attach a pseudo code to do the same please? And also there is a large document we need to keep in mind. – def __init__ Jul 03 '22 at 06:58
  • Check this article from mongo https://www.mongodb.com/basics/mysql-to-mongodb – deadshot Jul 05 '22 at 08:30
  • if you put embeded docs to mongo, then you have to deal with the update of docs probably in your code, right? – Hi computer Jul 12 '22 at 03:45
  • And why use mongo, I know it is not relational-db-friendsly, cuz there are a lot of other databases too, still it depends on the business. – Hi computer Jul 12 '22 at 03:50

1 Answers1

4

Denormalization

First, for canonical reference, the question of "embedded" vs. "referenced" data is called denormalization. Mongo has a guide describing when you should denormalize. Knowing when and how to denormalize is a very common hang-up when moving from SQL to NoSQL and getting it wrong can erase any performance benefits you might be looking for. I'll assume you've got this figured out since you seem set on using an embedded approach.

MySQL to Mongo

Mongo has a great Python tutorial you may want to refer to. First join your user and address tables. It will look something like this:

| _id    | name           | address_line_1        | address_line_2 | ... 
| 1      | Ashley Peacock | 10 Downing Street ... | ...
| 1      | Ashley Peacock | 221B Baker Street ... | ...
| 2      | Bob Jensen     | 343 Main Street ...   | ...
| 2      | Bob Jensen     | 1223 Some Ave ...     | ...
...

Then iterate over the rows to create your documents and pass them to pymongo insert_one. Using upsert=True with insert_one will insert a new document if a matching one is not found in the database, or update an existing document if it is found. Using $push appends the address data to the array field addresses in the document. With this setup, insert_one will automatically handle duplicates and append addresses based on matching _id fields. See the docs for more details:

from pymongo import MongoClient

client = MongoClient(port=27017)
db = client.my_db

sql_data = []  # should have your SQL table data
# depending on how you got this into python, you will index with a  
# field name or a number, e.g. row["id"] or row[0] 

for row in sql_data:
    address = {
        "address_line_1": row["address_line_1"],
        "address_line_2": row["address_line_2"],
        "city": row["city"],
        "postal_code": row["postal_code"],
    }
    db.users.update_one(
        {"_id": row["_id"]},
        {"name": row["name"], "$push": {"addresses": address}},
        upsert=True,
    )
Salvatore
  • 10,815
  • 4
  • 31
  • 69