I am trying to determine the number of verbs (n) in each tense and aspect (TA) for each text (filename) in my dataset. I have my data saved as a tibble (see below), but the values in the n column are not yet accurate because some categories subsume others. For instance, to get an accurate count of past_simple, I need to subtract the n values for past_perfect and past_progressive in that same text. Essentially, I am looking to do something like this:
Subtract the values (n) of past_perfect and past_progressive from the value of past_simple for filename BIO.G0.01.1
Repeat this process for each individual file
tib <- structure(
list(
TA = c(
"past_perfect",
"past_progressive",
"past_simple",
"past_simple",
"past_simple",
"past_simple",
"past_simple",
"past_simple",
"past_perfect",
"past_progressive"
),
tense = c(
"past",
"past",
"past",
"past",
"past",
"past",
"past",
"past",
"past",
"past"
),
aspect = c(
"perfect",
"progressive",
"simple",
"simple",
"simple",
"simple",
"simple",
"simple",
"perfect",
"progressive"
),
filename = c(
"BIO.G0.01.1",
"BIO.G0.01.1",
"BIO.G0.01.1",
"BIO.G0.02.1",
"BIO.G0.02.2",
"BIO.G0.02.4",
"BIO.G0.02.5",
"BIO.G0.02.6",
"BIO.G0.03.1",
"BIO.G0.03.1"
),
discipline = c(
"BIO",
"BIO",
"BIO",
"BIO",
"BIO",
"BIO",
"BIO",
"BIO",
"BIO",
"BIO"
),
nativeness = c("NS", "NS", "NS",
"NS", "NS", "NS", "NS", "NS", "NS", "NS"),
year = c("G0",
"G0", "G0", "G0", "G0", "G0", "G0", "G0", "G0", "G0"),
gender = c("F",
"F", "F", "M", "M", "M", "M", "M", "F", "F"),
n = c(2L, 2L,
57L, 39L, 3L, 4L, 49L, 103L, 1L, 1L)
),
class = c("grouped_df",
"tbl_df", "tbl", "data.frame"),
row.names = c(NA,-10L),
groups = structure(
list(
filename = c(
"BIO.G0.01.1",
"BIO.G0.02.1",
"BIO.G0.02.2",
"BIO.G0.02.4",
"BIO.G0.02.5",
"BIO.G0.02.6",
"BIO.G0.03.1"
),
discipline = c("BIO", "BIO", "BIO", "BIO", "BIO", "BIO",
"BIO"),
nativeness = c("NS", "NS", "NS", "NS", "NS", "NS",
"NS"),
year = c("G0", "G0", "G0", "G0", "G0", "G0", "G0"),
gender = c("F", "M", "M", "M", "M", "M", "F"),
.rows = structure(
list(1:3, 4L, 5L, 6L, 7L, 8L, 9:10),
ptype = integer(0),
class = c("vctrs_list_of",
"vctrs_vctr", "list")
)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA,-7L),
.drop = TRUE
)
)
I know how to subtract rows from other rows based on their position, like this:
tib[3, 9] <- tib[3, 9] - tib[2, 9] - tib[1, 9]
But the rows do not always appear in this predictable order because not all TA options are present in each text (filename). I'm also not sure how to write the code to restart this process again each time it comes across a new filename.
I am still learning how to manipulate data in R. Any suggestions would be very much appreciated!
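For reference, here is a minimal sketch of how the per-file subtraction described above might be done with dplyr, matching rows by their TA label instead of by position. It is only one possible approach, and it makes a few assumptions: `tib_small` is a trimmed copy of the data above (only the TA, filename, and n columns), `adjusted` and `n_adjusted` are hypothetical names introduced for illustration, and a file with no past_simple row is simply left unchanged.

```r
library(dplyr)

# Trimmed copy of the data above: only the columns the calculation needs.
# (tib_small is a hypothetical name; the full tib should work the same way.)
tib_small <- tibble(
  TA = c("past_perfect", "past_progressive", "past_simple", "past_simple",
         "past_simple", "past_simple", "past_simple", "past_simple",
         "past_perfect", "past_progressive"),
  filename = c("BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.02.1",
               "BIO.G0.02.2", "BIO.G0.02.4", "BIO.G0.02.5", "BIO.G0.02.6",
               "BIO.G0.03.1", "BIO.G0.03.1"),
  n = c(2L, 2L, 57L, 39L, 3L, 4L, 49L, 103L, 1L, 1L)
)

adjusted <- tib_small %>%
  # Grouping makes sum() below run separately within each file,
  # so the calculation "restarts" for every new filename.
  group_by(filename) %>%
  mutate(
    # For past_simple rows, subtract the perfect and progressive counts
    # found in the same file. If a file has neither, sum() over an empty
    # selection is 0, so the count stays as it is.
    n_adjusted = if_else(
      TA == "past_simple",
      n - sum(n[TA %in% c("past_perfect", "past_progressive")]),
      n
    )
  ) %>%
  ungroup()

adjusted
```

Writing the result to a new column (`n_adjusted`) keeps the original n available for checking. With the data above, the past_simple row for BIO.G0.01.1 goes from 57 to 53, and every other row keeps its original value.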