
I want to create a new id column for a real estate dataset. The data is about land ownership titles in the UK. Each observation is a property "unit" that has its own postbox. I only collect units that are owned by companies.

I want the new id column to be a "speaking id", meaning that it should reveal the most important info about the observation, while at the same time being concise and obviously unique. It should then be a composite key.

What columns should I include? What are some best practices for this kind of task (specifically for real estate data)?

Right now my main columns are the following: date (when company became proprietor of the land title), company's name, street, street number, locality, district, title number (alpha-numeric code as reported in the original data source).

I haven't found best practices for this kind of task, only a couple of examples. For US real estate data, one website suggested creating a composite id with the following info: Country, State (FIPS), County (FIPS), Subcounty (FIPS), Parcel Number, Property Type, Sub Property.

I am working in R, but I think this issue might be more common for SQL users.

  • Hi - please can you explain your meaning/understanding of an “id column” and why you want to duplicate lots of information that already exists in a record in this new column? What do you plan to use this column for? – NickW Aug 28 '23 at 08:15
  • Hi - by id column I mean a unique identifier for each observation. I plan to use such a composite id to see at a glance the main info for a certain observation. – askerofquestions2k23 Aug 28 '23 at 08:25
  • Obviously up to you, but trying to stuff a load of information into a composite key so that you don’t need to look at the other columns in the row sounds like a really bad design decision to me. If you want a key to be composite then it needs to consist of the minimal list of columns that will uniquely identify each record, so this isn’t a judgement-type decision; it’s a correct/incorrect decision that can only be made by someone who understands your data. However, if the uniquely identifying list of columns is long, it’s almost certainly better to create the id as just a sequential number. – NickW Aug 28 '23 at 09:01
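
To make the last comment concrete, here is a minimal sketch of how one might test which column combinations uniquely identify each record, and fall back to a sequential id otherwise. It is written in plain Python purely for illustration; the same check carries over to R or SQL. The sample rows, the snake_case column names, and the helpers `is_unique_key` and `minimal_keys` are all hypothetical and not taken from the actual dataset.

```python
from itertools import combinations

# Hypothetical sample rows: column names mirror the question, values are made up.
rows = [
    {"title_number": "AB12345", "street": "High Street", "street_number": "1", "locality": "Leeds"},
    {"title_number": "AB12345", "street": "High Street", "street_number": "2", "locality": "Leeds"},
    {"title_number": "CD67890", "street": "Mill Lane", "street_number": "1", "locality": "York"},
]


def is_unique_key(rows, columns):
    """Return True if the combination of `columns` is distinct on every row."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_keys(rows, candidates):
    """Return all smallest column subsets that uniquely identify every row."""
    for size in range(1, len(candidates) + 1):
        found = [
            cols
            for cols in combinations(candidates, size)
            if is_unique_key(rows, cols)
        ]
        if found:
            return found
    return []


candidates = ["title_number", "street", "street_number", "locality"]
keys = minimal_keys(rows, candidates)
print(keys)

# Fallback suggested in the comment: a plain sequential id.
for i, row in enumerate(rows, start=1):
    row["id"] = i
```

Note that a subset being unique on the data you have today does not guarantee it stays unique once new records arrive; that still depends on what the columns mean, which is why the choice needs someone who understands the data.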

0 Answers