-3

I have a dataframe with 253 rows (locations on a chromosome, in Mbp) and 1 column (the allele score at each location). I need to produce a dataframe containing the mean allele score for every 0.5 Mbp interval along the chromosome. Please help with R code that can do this. Thanks.

Mike H.
Ahmed
  • Please read [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). Stack Overflow is not a code-writing service. – cmaher Mar 26 '18 at 16:42
  • Generally a minimum reproducible example would be preferred. Question: Do you have an 'interval' column? If not, can you generate one? Then you can just restructure with ddply. – SeldomSeenSlim Mar 26 '18 at 16:43

1 Answer

0

The picture in this case is adequate to construct an answer but not adequate to support testing. You should learn to post data in a form that doesn't require re-entry by hand. (That's why you are accumulating negative votes.)

The basic R strategy is to use cut to create a grouping variable and then use tapply to apply the mean function within each group. Presumably the data are in a dataframe, which I will assume is named my_alleles and has columns Location and Allele_score:

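 tapply( my_alleles$Allele_score,    # act on this vector
                                     # in groups defined by this factor
         cut(my_alleles$Location, 
             # round the last break up to the next multiple of 0.5 so the
             # largest location still falls inside an interval
             breaks=seq(0, ceiling(max(my_alleles$Location) / 0.5) * 0.5, by=0.5),
             include.lowest=TRUE     # keep a location of exactly 0
             ), 
                                     # with this function
         FUN=mean)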
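Since the question asks for a dataframe rather than a named vector, here is a minimal sketch of one way to get there with aggregate. It assumes the same hypothetical names (my_alleles, Location, Allele_score), and the data are simulated only so the example runs on its own; substitute the real dataframe.

```r
# Simulated stand-in for the real data (values are made up for illustration)
set.seed(1)
my_alleles <- data.frame(
  Location     = sort(runif(253, min = 0, max = 60)),  # positions in Mbp
  Allele_score = runif(253)
)

# Breaks every 0.5 Mbp, extended to cover the largest location
upper  <- ceiling(max(my_alleles$Location) / 0.5) * 0.5
breaks <- seq(0, upper, by = 0.5)

# Label each row with the 0.5 Mbp interval it falls into
my_alleles$Interval <- cut(my_alleles$Location, breaks = breaks,
                           include.lowest = TRUE)

# One row per non-empty interval, holding the mean score in that interval
interval_means <- aggregate(Allele_score ~ Interval, data = my_alleles,
                            FUN = mean)
head(interval_means)
```

Note that tapply returns NA for intervals that contain no locations, whereas aggregate with a formula simply omits them, so the two results can differ in length.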
IRTFM