I have a dataset (please see the attached file) in which I wish to sum the numeric column 'tdiff' according to a specific criterion, e.g. rows (1 + 2) and rows (3 + 4), but not rows (11, 12, 13, 14). I have tried the following, with no luck:

xx<- chaPe [rowSums(1:2, 3:4, 11, 12, 13, 14, 15:16),]
xx<- sum(chaPe $tdiff [c(1:2, 3:4, 11, 12, 13, 14, 15:16)],)

Basically, looking at the column 'xsampa', only the 'tdiff' values of the rows where 'p' and 'A' occur need to be summed.

The expected result for rows (1 + 2), for example, is 0.068 + 0.011 = 0.079. Also, how does the sum affect the values in the other columns, presuming they hold the same values, apart from the column 'rn' (which is not really important)?

I am new to R, so any help would be great, as I cannot figure this out. Thanks.

Pranav_b

2 Answers

You can start a new group whenever 'p' occurs, so that the first 2 rows form one group, the next 2 rows another group, and rows 11:14 stay as they are. For each group we can then sum the tdiff values. For the other columns you can decide which values you want to keep. For example, below I keep the first value of the columns Filename and Place.

library(dplyr)

chaPe %>%
  group_by(grp = cumsum(xsampa == 'p')) %>%
  summarise(sum_tdiff = sum(tdiff), 
            Filename = first(Filename), 
            Place = first(Place)) -> result
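
To see how the grouping key is built, here is a minimal sketch on a made-up data frame. The `toy` data, its values beyond the 0.068 and 0.011 quoted in the question, and the `n_rows` column are assumptions for illustration, not the real `chaPe` data. `cumsum(xsampa == 'p')` steps up by one at every 'p', so an 'A' row inherits the group of the 'p' before it, while a 'p' with no 'A' after it ends up in a group of its own.

```r
library(dplyr)

# Hypothetical toy data: two p + A pairs followed by two lone 'p' rows.
toy <- data.frame(
  xsampa = c("p", "A", "p", "A", "p", "p"),
  tdiff  = c(0.068, 0.011, 0.070, 0.012, 0.080, 0.090)
)

# TRUE counts as 1, so the running total increases at every 'p'.
grp <- cumsum(toy$xsampa == "p")
print(grp)

# Sum tdiff within each group and count how many rows were merged.
toy_sums <- toy %>%
  group_by(grp = cumsum(xsampa == "p")) %>%
  summarise(sum_tdiff = sum(tdiff), n_rows = n())

print(toy_sums)
```

Note that rows appearing before the first 'p' would all fall into group 0, so this key only behaves as intended when each block of rows starts with 'p'.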
Ronak Shah
  • This is exactly what I need, @Ronak: sum the numeric column 'tdiff' where 'p' and 'A' occur in the column 'xsampa', leaving a 'p' as it is when it is not followed by an 'A' but stands alone. Is this what you mean? Also, is there a way to group not only 'chaPe' but also the other values in the column 'Filename', like 'kute', 'kutte' etc., within the same code, to make it less laborious? Thanks. – Pranav_b Oct 17 '20 at 04:04
  • Yes, you can include multiple variables in `group_by` like this `chaPe %>% group_by(grp = cumsum(xsampa == 'p'), Filename, Consonant) %>% summarise(sum_tdiff = sum(tdiff))` – Ronak Shah Oct 17 '20 at 04:21
  • It still gives me the same output. Wouldn't it be easier if, whenever 'A' occurs, the row before it were summed with it? – Pranav_b Oct 17 '20 at 06:56
  • @Pranav_b Your question is not clear to me. Please manually calculate the first 10-12 rows of your output and update your post to include your expected output, so that we know exactly what you are looking for. – Ronak Shah Oct 17 '20 at 08:09
  • My previous comment (same output) was wrong. Kindly ignore. :P – Pranav_b Oct 17 '20 at 08:46
  • One more update: it seems that R is skipping specific values (NS, RR, SS2, SS, VG) in the column 'Filename' when 'xsampa' contains a complex character (t_d_, d_d, t_d_h). It doesn't happen where 'xsampa' is (p, k) or (p:, k:). Why is that, @Ronak? – Pranav_b Oct 17 '20 at 11:52
  • The reason a few rows were getting skipped is that some values in the column 'xsampa', for example those containing /k:/, had a space at the end (/k: /), so group_by did not treat them as the same value. Human error. I used 'find and replace' to fix it. Thanks again. God bless! – Pranav_b Oct 18 '20 at 05:53
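
The trailing-space problem described in the last comment can also be handled in R itself instead of with find and replace. A minimal sketch, assuming the stray characters are ordinary whitespace at the ends of the labels; the `labels` vector is made up for illustration:

```r
# Hypothetical labels; the second and fourth carry a trailing space.
labels <- c("k:", "k: ", "p", "p ")

# Without cleaning, "k:" and "k: " count as different values.
print(unique(labels))

# trimws() from base R strips leading and trailing whitespace.
clean <- trimws(labels)
print(unique(clean))
```

On the real data this would amount to something like `chaPe$xsampa <- trimws(chaPe$xsampa)` before grouping.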

Another way is to group the data by Filename; an example is below.

library(dplyr)
result <- chaPe %>%
  group_by(Filename) %>%
  summarise(sum = sum(tdiff))
  Filename             sum
  <chr>              <dbl>
1 AK_chape.TextGrid 0.0800
2 DS_chape.TextGrid 0.0844
3 MS_chape.TextGrid 0.0834
4 NS_chape.TextGrid 0.0884
5 PS_chape.TextGrid 0.0838
6 RS_chape.TextGrid 0.0877

Agaz Wani
  • Thanks for your suggestion, @Agaz, but this will leave out rows 11, 12, 13 and 14. I wish to keep those as well. – Pranav_b Oct 17 '20 at 04:27
  • You don't need to remove the rows, you can just do it for all the rows and you will get a group sum for each group. – Agaz Wani Oct 17 '20 at 04:30