
Let's say I'm looking for an apartment with a roommate, and I want to train/discover a model of my preferences that my roommate can use to evaluate whether I'll like a potential apartment without needing my input.

The dataset has some interval features (rent, bedrooms, etc.) and some nominal/categorical features (has_dishwasher, laundry).

data = """
rent,bedrooms,bathrooms,distance_to_work,has_dishwasher,laundry
3695,3,1,21,no,building
4200,4,2,27,yes,building
4500,4,1.5,19,unknown,building
4200,3,1.5,19,no,building
3800,3,1,13,no,unit
4000,3,1,8,no,unknown
4500,3,2,26,yes,building
4050,3,1,20,no,unknown
3800,3,1,13,no,unknown
"""

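As a minimal sketch of one way to get both kinds of feature into a single numeric vector, the snippet below parses the CSV with the standard library and one-hot encodes the categorical columns. The names `NUMERIC`, `CATEGORICAL`, `levels`, and `encode` are illustrative assumptions, and treating `unknown` as its own level is a modelling choice rather than the only option.

```python
import csv
import io

data = """
rent,bedrooms,bathrooms,distance_to_work,has_dishwasher,laundry
3695,3,1,21,no,building
4200,4,2,27,yes,building
4500,4,1.5,19,unknown,building
4200,3,1.5,19,no,building
3800,3,1,13,no,unit
4000,3,1,8,no,unknown
4500,3,2,26,yes,building
4050,3,1,20,no,unknown
3800,3,1,13,no,unknown
"""

# Assumed split of columns into numeric and categorical features.
NUMERIC = ["rent", "bedrooms", "bathrooms", "distance_to_work"]
CATEGORICAL = ["has_dishwasher", "laundry"]

rows = list(csv.DictReader(io.StringIO(data.strip())))

# Levels observed for each categorical column, in sorted order.
levels = {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}


def encode(row):
    """Return (feature_names, values) for one apartment."""
    names = list(NUMERIC)
    values = [float(row[c]) for c in NUMERIC]
    for col in CATEGORICAL:
        for level in levels[col]:
            # One indicator column per observed level ("unknown" included).
            names.append(f"{col}={level}")
            values.append(1.0 if row[col] == level else 0.0)
    return names, values


feature_names = encode(rows[0])[0]
X = [encode(r)[1] for r in rows]
```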
The preference dataset is generated from pairwise comparisons and stored as a directed acyclic graph (DAG): if there is a directed path from A to B, then A is considered preferable to B. If there is no path between two nodes in either direction, they can be treated as ties/incomparable.
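
To make the path rule concrete, here is a rough sketch that expands a list of direct comparisons into every implied "preferred over" pair by following paths. The edge list is hypothetical (the real edges are whatever comparisons were actually recorded), and `preference_pairs` is an illustrative helper name.

```python
from collections import defaultdict

# HYPOTHETICAL direct comparisons as (preferred, less_preferred) node ids.
edges = [(9, 6), (6, 4), (9, 2), (5, 1)]


def preference_pairs(edges):
    """Return every (a, b) where b is reachable from a, i.e. a is preferred to b."""
    children = defaultdict(set)
    for a, b in edges:
        children[a].add(b)
    pairs = set()
    for start in list(children):
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((start, node))
            # .get avoids creating empty entries for leaf nodes.
            stack.extend(children.get(node, ()))
    return pairs


pairs = preference_pairs(edges)
```

With these made-up edges, `(9, 4)` is implied through node 6, while nodes 2 and 6 have no path in either direction and so stay incomparable.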

[Figure: preference DAG]

I'd like to approximate my preference function for analysis, ideally in a non-black box fashion, so that I can draw conclusions like:

  • "I value in-unit laundry at approximately $100 (plus/minus $10) of rent"
  • "4 bedrooms are preferred over 3 bedrooms, all else being equal"
  • "I prefer in-unit laundry > building laundry > unknown"
  • "Adding an additional bathroom is as preferable as reducing distance_to_work by 2"
  • "It's important for distance_to_work to be under 20, but once it is under 20, additional reductions aren't as important" (non-linear?)

What are some approaches that would be appropriate?

I've considered:

  • Linear regression: I would guess that some of the relationships are non-linear, as in the last bullet above. I'm also not sure how this works with categorical features.
  • Multi-criteria decision-making (MCDM) methods: These often seem to be used in linear programming contexts, where, as noted above, linear relationships seem likely to miss details.
  • Neural networks: These would probably learn the preference function, but in a black-box fashion that makes analysis difficult.
  • Elo systems: Calculating Elo ratings and then trying to train some classifier on them seems doable, but I'm not sure it's the best approach given that the dataset will be small. Also, just because node 6 is between 9 and 4 doesn't necessarily mean that its score should be halfway between theirs, which I believe Elo would tend towards.
  • Ordinal regression/learning to rank: Seems like it would be more appropriate.
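
For the ranking-learning option, one interpretable possibility is a Bradley-Terry-style pairwise logistic model: fit weights `w` so that the probability of preferring a over b is `sigmoid(w · (x_a - x_b))`. The sketch below is a toy under stated assumptions, not a recommended implementation. The apartments, the pairs, the two-feature encoding (rent in $100s, an in-unit laundry indicator), and the hyperparameters `l2`, `lr`, and `steps` are all made up for illustration.

```python
import math

# HYPOTHETICAL toy data: [rent in $100s, 1.0 if in-unit laundry else 0.0].
apartments = {
    "A": [38.0, 1.0],
    "B": [38.0, 0.0],
    "C": [40.0, 1.0],
    "D": [37.0, 0.0],
    "E": [34.0, 0.0],
}
# HYPOTHETICAL (preferred, less_preferred) pairs, e.g. read off a preference DAG.
pairs = [("A", "B"), ("A", "C"), ("D", "B"), ("C", "B"), ("E", "C")]


def fit_pairwise(apartments, pairs, l2=0.1, lr=0.05, steps=5000):
    """Fit w so that P(a preferred to b) = sigmoid(w . (x_a - x_b))."""
    diffs = [[xa - xb for xa, xb in zip(apartments[a], apartments[b])]
             for a, b in pairs]
    w = [0.0] * len(diffs[0])
    for _ in range(steps):
        # Gradient of the L2-penalised log-likelihood; start with the penalty term.
        grad = [-l2 * wj for wj in w]
        for d in diffs:
            score = sum(wj * dj for wj, dj in zip(w, d))
            miss = 1.0 - 1.0 / (1.0 + math.exp(-score))  # 1 - P(observed outcome)
            for j, dj in enumerate(d):
                grad[j] += miss * dj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]  # gradient ascent step
    return w


w_rent, w_laundry = fit_pairwise(apartments, pairs)
# Ratio of weights converts laundry into rent units ($100s), then into dollars.
laundry_value_dollars = -w_laundry / w_rent * 100
```

The ratio of two weights is what supports statements like "in-unit laundry is worth about $X of rent". A threshold effect such as "under 20" could be approximated by adding a feature like `max(0, distance_to_work - 20)`, and refitting on resampled pairs would give a rough plus/minus range; both are assumptions about how to extend the sketch, not results.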
Sarthax
  • Any question that is just a problem description with no code is, 99% of the time, a homework question. Just try all the things you propose and choose the best. If none is good enough, research more to find a method that works. SO also closes questions seeking recommendations. – rioV8 Jul 10 '23 at 09:16
