I'm looking for a way to build a transition matrix in R that gives the probability of someone moving from one city to another. This is what my df looks like:

    City_year1         City_year2
   <fct>               <fct>  
 1 Alphen aan den Rijn NA     
 2 Tynaarlo            NA     
 3 Eindhoven           NA     
 4 Emmen               Emmen  
 5 Emmen               Emmen  
 6 Schagen             Schagen
 7 Bergen              NA     
 8 Schagen             Schagen
 9 Schagen             Schagen
10 Amsterdam           Rotterdam      

# .... with 200.000 more rows

How do I easily create a transition matrix with the probability that someone moves from, for example, Amsterdam in year 1 to Rotterdam in year 2, based on the data available in this df? Extra info: the number of unique values in year 1 is not necessarily equal to the number of unique values in year 2. I have tried to use Markov functions, but without success.

I hope someone can help me!

1 Answer

table(df) will give you a matrix of counts of transitions, and you can convert those counts to probabilities (proportions) with prop.table:

prop.table(table(df), margin = 1)

The margin = 1 means that probabilities in rows will sum to 1.

Using the original data in the question:

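# Underscores replace the spaces in "Alphen aan den Rijn" because read.table splits on whitespace.
# The header has one field fewer than the data rows, so the first column becomes the row names.
df = read.table(text = 'City_year1         City_year2
1 Alphen_aan_den_Rijn NA
2 Tynaarlo            NA
3 Eindhoven           NA
4 Emmen               Emmen
5 Emmen               Emmen
6 Schagen             Schagen
7 Bergen              NA
8 Schagen             Schagen
9 Schagen             Schagen
10 Amsterdam           Rotterdam', header = TRUE)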

result = prop.table(table(df), margin = 1)
result
# City_year2
# City_year1            Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn                        
# Amsterdam               0         1       0
# Bergen                                     
# Eindhoven                                  
# Emmen                   1         0       0
# Schagen                 0         0       1
# Tynaarlo                                   

unclass(result)
# City_year2
# City_year1            Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn   NaN       NaN     NaN
# Amsterdam               0         1       0
# Bergen                NaN       NaN     NaN
# Eindhoven             NaN       NaN     NaN
# Emmen                   1         0       0
# Schagen                 0         0       1
# Tynaarlo              NaN       NaN     NaN
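
The NaN rows above come from dividing 0 by 0: those origin cities have no observed year-2 city, so their row counts are all zero. The question also mentions that year 1 and year 2 may not contain the same set of cities, which makes the result non-square. A minimal sketch of one way to handle both, assuming the column names from the question and a small made-up sample (the names `cities`, `counts`, `P`, and `P_observed` are illustrative only):

```r
# Made-up sample mirroring the structure of the question's data
df <- data.frame(
  City_year1 = c("Emmen", "Emmen", "Schagen", "Bergen", "Amsterdam"),
  City_year2 = c("Emmen", "Emmen", "Schagen", NA, "Rotterdam")
)

# Collect every city seen in either year, ignoring NA
all_values <- c(as.character(df$City_year1), as.character(df$City_year2))
cities <- sort(unique(all_values[!is.na(all_values)]))

# Giving both columns the same levels yields a square table of counts
counts <- table(
  factor(df$City_year1, levels = cities),
  factor(df$City_year2, levels = cities),
  dnn = c("City_year1", "City_year2")
)

# Row-wise proportions as a plain matrix; all-zero rows become NaN (0/0)
P <- unclass(prop.table(counts, margin = 1))

# One option: keep only origin cities with at least one observed destination
P_observed <- P[rowSums(counts) > 0, , drop = FALSE]
```

Whether to drop the NaN rows, or to treat a missing year-2 city as a state of its own, depends on what an NA means in the data, so this is a modelling choice rather than something the code can decide.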
Gregor Thomas
  • Thank you so much for your reply. It's a possible solution; however, I seem to get a table with the columns City_year1, City_year2, and the transition probability, not a matrix. – Romy Schipper Nov 02 '21 at 12:23
  • Hmmm, perhaps you need to share your data more reproducibly. I read in your data using `read.table`, and I get a matrix-like result that can be converted easily to a matrix. See edits. – Gregor Thomas Nov 02 '21 at 14:41
  • @RomySchipper Use `dput` to share data in a copy/pasteable way including all class and structure information. E.g., `dput(your_data[1:10, ])` for the first 10 rows---though you might want to choose a slightly different subset that illustrates the problem better. – Gregor Thomas Nov 02 '21 at 14:43
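
If the result ends up in long format (one row per origin/destination pair plus a probability column), it can be reshaped back into a matrix. One common way to get such a long table is calling `as.data.frame()` on a table object, which returns one row per cell; whether that is what happened in the comment above is only a guess. A minimal sketch, assuming a hypothetical long data frame `long` with a probability column named `prob`:

```r
# Hypothetical long-format result: one row per (origin, destination) pair
long <- data.frame(
  City_year1 = c("Amsterdam", "Emmen", "Schagen"),
  City_year2 = c("Rotterdam", "Emmen", "Schagen"),
  prob       = c(1, 1, 1)
)

# xtabs() sums `prob` within each origin/destination cell;
# pairs that do not occur in `long` are filled with 0
wide <- xtabs(prob ~ City_year1 + City_year2, data = long)

# Strip the xtabs/table classes to leave a plain numeric matrix
m <- unclass(wide)
attr(m, "call") <- NULL
```

Here `m` has the year-1 cities as row names and the year-2 cities as column names, so a single probability can be looked up with `m["Amsterdam", "Rotterdam"]`.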