
I'm trying to find the most common route (the most-visited path) from a list of areas visited by many participants, taking the visit order into account. Here is what a toy data set looks like. The values are times in seconds; for each participant (ParticipantsID), the smallest time marks the first area visited, the next smallest the second, and so on.

ParticipantsID  Area1       Area2      Area3      Area4     Area5       Area6      Area7      Area8
part1           3.940697    5.901064   2.820492   0.00003   NA          NA         1.890461   0.00001
part2           NA          8.191393   5.510936   NA        NA          NA         NA         NA
part3           NA          NA         2.890461   0.00000   11.030156   2.460417   51.030156  2.460417
part4           NA          NA         NA         NA        0.460417    1.560417   2.460417   0.000714
part5           118.807669  40.256034  26.493948  14.99225  NA          NA         4.22576    3.78940
part6           61.030156   2.460417   NA         NA        118.807669  40.256034  58.807669  30.256034

My original data set has 60 participants and 12 areas to visit. I'm not sure which statistical method is best suited to this type of analysis. Is there an R package that implements an algorithm for finding an overall ordered list of areas (the most-visited route, in order) by analyzing all the individual visits?

I would be thankful if anybody could share any idea here.

  • What is the graph of these visited areas? From the way you have described the problem, there is no way to be sure about the paths, because we don't know: the order in which each participant visited the areas; whether the areas are fully connected; which area is the start and which is the end; and what NA means in your data set (e.g. if a participant never visited an area, you could substitute 0 seconds for NA). – Diego Queiroz Oct 27 '17 at 13:42
  • Thanks a lot for your quick reply. "NA" means the participant did not visit that area. The areas are fully connected, and participants are free to choose any start and end point. So the only thing of interest is the time of each visit: where they went first, second, and so on. – Masud Pervez Oct 27 '17 at 13:48
  • However, to define a path we need a start point and an end point. Even if participants can choose them, that is important information, because it tells you the order in which they visited each area. Also, you have the time (duration) they spent in each place, not clock time; if you had timestamps, you could at least infer from the earliest and latest ones where they started and finished. As it stands, there are multiple ways each person could have traveled and no reliable way to tell which path they took, since they could also have gone back and forth. – Diego Queiroz Oct 27 '17 at 14:45
  • Well, for simplicity let's assume the starting point is Area 8 and the end point is Area 1. Also, the times reported here are the times at which each participant first visited an area, not visit durations. For example, part1 first reached Area 8 at 0.00001 sec, then Area 4 at 0.00003 sec, then Area 7 at 1.890461 sec, and so on. So the information you are looking for, where each participant was at the earliest times, can easily be inferred from this data, and I don't think timestamps are needed. – Masud Pervez Oct 30 '17 at 08:42
  • I would be thankful if anybody could share any idea here. – Masud Pervez Oct 31 '17 at 13:32
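
The ordering rule described in the comments (sort each participant's first-visit times, skipping NA) can be sketched as follows. This is only an illustration in Python, not the R package asked about, and it uses nothing beyond the toy data above. The names `first_visit`, `visit_order`, `routes`, `transitions` and `starts` are made up for this sketch. Counting consecutive area pairs is just one simple, assumed way to summarise common route segments; it is not a recommended statistical method. Ties in time (part3 has the same value for Area6 and Area8) are broken by column order, which is an arbitrary choice.

```python
import math
from collections import Counter

NA = math.nan  # stands in for the NA entries in the table
AREAS = ["Area1", "Area2", "Area3", "Area4", "Area5", "Area6", "Area7", "Area8"]

# Toy data copied from the question: time (s) of first visit to each area.
first_visit = {
    "part1": [3.940697, 5.901064, 2.820492, 0.00003, NA, NA, 1.890461, 0.00001],
    "part2": [NA, 8.191393, 5.510936, NA, NA, NA, NA, NA],
    "part3": [NA, NA, 2.890461, 0.00000, 11.030156, 2.460417, 51.030156, 2.460417],
    "part4": [NA, NA, NA, NA, 0.460417, 1.560417, 2.460417, 0.000714],
    "part5": [118.807669, 40.256034, 26.493948, 14.99225, NA, NA, 4.22576, 3.78940],
    "part6": [61.030156, 2.460417, NA, NA, 118.807669, 40.256034, 58.807669, 30.256034],
}


def visit_order(times):
    """Return the visited areas sorted by first-visit time, dropping NA."""
    visited = [(t, area) for t, area in zip(times, AREAS) if not math.isnan(t)]
    # sorted() is stable, so equal times keep their column order (arbitrary tie-break).
    return [area for _, area in sorted(visited, key=lambda pair: pair[0])]


# One ordered route per participant.
routes = {pid: visit_order(times) for pid, times in first_visit.items()}

# Count how often each "A then B" step occurs across all participants.
transitions = Counter()
for route in routes.values():
    transitions.update(zip(route, route[1:]))

# Count which area each participant visited first.
starts = Counter(route[0] for route in routes.values())

for pid, route in routes.items():
    print(pid, " -> ".join(route))
print(transitions.most_common(3))
print(starts.most_common(1))
```

For part1 this gives Area8, Area4, Area7, ..., which matches the order described in the comment above. The same idea carries over to the 60 x 12 data set, and could be written in R with `order()` applied to each row.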

0 Answers