
I have a set of predefined properties I want to store. For example:

PersonNr,Gender,Name,Surname, Address, Zip,City.

I have several sources for these data sets. They share the PersonNr but have different values for the other properties:

EXAMPLE:
From Database A I get

123456,M,Hudson,James,Fakestr 123, 12345, West City

From Database B I get

123456,M,Hudson,Jameson,Fakestr 123, 12345, East City

Instead of storing both records, I want to store the data from Database A as a reference and store only the values from B that differ from A.

In my example, I would like to store something like:

Database B, Jameson, East City

What data structure can I use for the given problem?

Thanks in advance

Anh Tuan Nguyen

1 Answer

The solution you choose depends a lot on the nature of your data, how you're going to store it, and what you want to do with it. If all you want is an abbreviated record that only stores the deltas, then you could write a comma-separated line that has empty fields. That is, given:

Database A
123456,M,Hudson,James,Fakestr 123, 12345, West City

Database B
123456,M,Hudson,Jameson,Fakestr 123, 12345, East City

You could write a separate record showing the deltas:

123456,,,Jameson,,,East City
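
As a rough sketch of how such a delta line could be produced, the Python below blanks every field of B that matches A and keeps the key. The function name `delta_row` and the list-of-strings record layout are my own assumptions, values are shown without the stray spaces after the commas, and the address is written identically in both records on the assumption that the two are meant to match.

```python
import csv
import io

# Field order taken from the question; the first field is the shared key.
FIELDS = ["PersonNr", "Gender", "Name", "Surname", "Address", "Zip", "City"]

def delta_row(reference, other):
    """Keep the key; blank every other field whose value equals the reference."""
    key = other[0]
    rest = [b if b != a else "" for a, b in zip(reference[1:], other[1:])]
    return [key] + rest

# Hypothetical in-memory copies of the two records (address assumed identical).
record_a = ["123456", "M", "Hudson", "James", "Fakestr 123", "12345", "West City"]
record_b = ["123456", "M", "Hudson", "Jameson", "Fakestr 123", "12345", "East City"]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(delta_row(record_a, record_b))
delta_line = buffer.getvalue().strip()
print(delta_line)  # 123456,,,Jameson,,,East City
```

One caveat with this format: an empty field cannot distinguish "same as A" from "B really has an empty value here", so it only works if empty values never occur or are encoded some other way.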

If you're storing the deltas in a database, then you'll probably want records that give the record identifier, the field name, and the changed value. That representation would be:

123456,Surname,Jameson
123456,City,East City

That's probably how I'd represent it in memory, too: a hash map keyed by record identifier (i.e. 123456), with a list of field name / value pairs for each ID.
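
A minimal sketch of that in-memory layout might look like the following Python. The names `reference`, `deltas_b`, `store_delta` and `resolve` are hypothetical, records are assumed to be dicts keyed by field name, and the address is assumed to be the same in both sources.

```python
from collections import defaultdict

# Reference records from Database A, keyed by PersonNr.
reference = {
    "123456": {"Gender": "M", "Name": "Hudson", "Surname": "James",
               "Address": "Fakestr 123", "Zip": "12345", "City": "West City"},
}

# Deltas for Database B: PersonNr -> list of (field name, value) pairs.
deltas_b = defaultdict(list)

def store_delta(deltas, person_nr, record):
    """Record only the fields whose value differs from the reference."""
    base = reference[person_nr]
    for field, value in record.items():
        if base[field] != value:
            deltas[person_nr].append((field, value))

def resolve(deltas, person_nr):
    """Rebuild the full record by overlaying the deltas on the reference."""
    merged = dict(reference[person_nr])
    merged.update(deltas.get(person_nr, []))
    return merged

record_b = {"Gender": "M", "Name": "Hudson", "Surname": "Jameson",
            "Address": "Fakestr 123", "Zip": "12345", "City": "East City"}
store_delta(deltas_b, "123456", record_b)

print(deltas_b["123456"])  # [('Surname', 'Jameson'), ('City', 'East City')]
```

Calling `resolve(deltas_b, "123456")` gives back the complete Database B record, so nothing is lost by storing only the differences. With more than two sources, one such map per source (or a source name added to the key) would be a natural extension.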

Jim Mischel