I am new to R, so apologies in advance if I don't express my problem clearly. I have a set of letters, each with a given value. I created a data frame for those values, and I also have a string made up of the same set of letters. I want to map the value from the data frame onto each letter of my string and then calculate the mean over a sliding window of length L. I can't find a way to do the first part: I don't know how to compare the characters of the string with the column names of the data frame and assign the matching value to each character, so that I can then compute the window means. Any tips?
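# One-row data frame: column names are the letters, the single row holds their values
A = data.frame(A = 0.429, C = -0.051, D = -2.024, E = -2.181, F = 0.836,
               G = 0.158, H = -1.056, I = 0.959, K = -2.398, L = 0.658,
               M = 0.470, N = -1.099, P = -0.675, Q = -1.564, R = -2.501,
               S = -0.292, T = -0.182, V = 0.634, W = 0.463, Y = 0.163)
(a <- "MASEFKKKLFWRAVVAEF")
a_split = strsplit(a, "")  # list of length 1; a_split[[1]] holds the single characters
# readline() returns a character string, so convert it before doing arithmetic
L = as.integer(readline(prompt = "Enter window length: \n"))
x = nchar(a)
# 1:x-L parses as (1:x) - L; the parentheses give one iteration per window start
for(i in 1:(x - L + 1))
{
  # letters of the window that starts at position i
  for(j in a_split[[1]][i:(i + L - 1)])
  {
    # this is where I am stuck: look up the value of letter j in A
  }
}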
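The lookup step described above can be sketched as follows. This is only a sketch under one assumption: the values are kept in a named numeric vector instead of a one-row data frame. The names `scores`, `residues`, `values` and `first_window_mean` are placeholders.

```r
# Assumption: the values live in a named numeric vector (names = letters)
# rather than a one-row data frame. `scores` is a placeholder name.
scores <- c(A = 0.429, C = -0.051, D = -2.024, E = -2.181, F = 0.836,
            G = 0.158, H = -1.056, I = 0.959, K = -2.398, L = 0.658,
            M = 0.470, N = -1.099, P = -0.675, Q = -1.564, R = -2.501,
            S = -0.292, T = -0.182, V = 0.634, W = 0.463, Y = 0.163)

a <- "MASEFKKKLFWRAVVAEF"

# strsplit() returns a list; [[1]] extracts the character vector of letters.
residues <- strsplit(a, "")[[1]]

# Indexing a named vector by a character vector looks each letter up by name,
# so `values` holds one number per position of the string.
values <- scores[residues]

# Mean of a single window of length L starting at position 1.
L <- 5
first_window_mean <- mean(values[1:L])
```

With the data frame from the post, `unlist(A)` yields the same kind of named vector, so the indexing step works unchanged on `unlist(A)[residues]`.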
Edit 1: Okay, so after your help I think I am making some progress. Sorry for the late thank you and response. I want to iterate N - L + 1 times (N = sequence length, L = window length), which gives N - L + 1 window means. Then I want to assign each window's mean to the most central amino acid of that window: for example, with L = 10 the mean of the first window (amino acids 1-10) is assigned to amino acid 5, the mean of window 2-11 to amino acid 6, and so on.
```r
A = c(A = 0.429, C = -0.051, D = -2.024, E = -2.181, F = 0.836,
      G = 0.158, H = -1.056, I = 0.959, K = -2.398, L = 0.658,
      M = 0.470, N = -1.099, P = -0.675, Q = -1.564, R = -2.501,
      S = -0.292, T = -0.182, V = 0.634, W = 0.463, Y = 0.163)
(a <- "MASEFKKKLFWRAVVAEFLATTLFVFISIGSALGFKYPVGNNQTAVQDNV")
a_split = strsplit(a, "")
values <- A[ a_split[[1]] ]  # look up each letter's value by name
L = 5
N = nchar(a)
print(N)
# convolve() with type = "filter" already slides over every window, so no loop
# is needed; equal weights of 1 / L turn each weighted sum into a window mean
means <- convolve(values, rep(1 / L, L), type = "filter")
print(means)
cnt = length(means)  # number of windows, N - L + 1
print(cnt)
```
Since I am not familiar with R, I do not completely understand how convolve works, and that is my main issue.
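To illustrate what `convolve(..., type = "filter")` computes in this setting, the sketch below compares it with an explicit loop over the windows. With equal weights of `1 / L`, each output element should be the mean of one window of `L` consecutive values. The short sequence and the names `scores`, `loop_means` and `conv_means` are illustrative only.

```r
scores <- c(A = 0.429, C = -0.051, D = -2.024, E = -2.181, F = 0.836,
            G = 0.158, H = -1.056, I = 0.959, K = -2.398, L = 0.658,
            M = 0.470, N = -1.099, P = -0.675, Q = -1.564, R = -2.501,
            S = -0.292, T = -0.182, V = 0.634, W = 0.463, Y = 0.163)

a <- "MASEFKKKLFWRAVVAEF"
values <- unname(scores[strsplit(a, "")[[1]]])

L <- 5
N <- length(values)

# Explicit version: one mean per window start i = 1 .. N - L + 1.
loop_means <- vapply(seq_len(N - L + 1),
                     function(i) mean(values[i:(i + L - 1)]),
                     numeric(1))

# convolve() version: a single call handles every window.
# The weights rep(1 / L, L) make each weighted sum a plain mean.
conv_means <- convolve(values, rep(1 / L, L), type = "filter")

# Compare with all.equal(), because convolve() works via the FFT and
# therefore carries small floating-point error.
same <- isTRUE(all.equal(loop_means, conv_means))
```

The result has `N - L + 1` elements, one per window, which is why no surrounding `for` loop is needed when `convolve` is used this way.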
Edit 2: I think you understood my question correctly, and I thank you for that. I have a sequence of N elements, and I want to find out whether parts of that sequence fit a certain criterion. For that reason, I want to slide a window of length 10 along the sequence. For every window, the mean value will be assigned to the "central" element (I know 5.5 is mathematically the center, but rounding down here is perfect).
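The center assignment described above can be sketched as follows, assuming the windows are indexed by their start position and the center is rounded down, so that a window of length 10 starting at position 1 is labelled 5. The names `scores`, `starts`, `window_means` and `centers` are placeholders.

```r
scores <- c(A = 0.429, C = -0.051, D = -2.024, E = -2.181, F = 0.836,
            G = 0.158, H = -1.056, I = 0.959, K = -2.398, L = 0.658,
            M = 0.470, N = -1.099, P = -0.675, Q = -1.564, R = -2.501,
            S = -0.292, T = -0.182, V = 0.634, W = 0.463, Y = 0.163)

a <- "MASEFKKKLFWRAVVAEFLATTLFVFISIGSALGFKYPVGNNQTAVQDNV"
values <- unname(scores[strsplit(a, "")[[1]]])

L <- 10
N <- length(values)

# Window i covers positions i .. i + L - 1.
starts <- seq_len(N - L + 1)

window_means <- vapply(starts,
                       function(i) mean(values[i:(i + L - 1)]),
                       numeric(1))

# Center of window i, rounded down:
# floor((i + (i + L - 1)) / 2) = i + (L - 1) %/% 2
centers <- starts + (L - 1) %/% 2

# Label each mean with the position of its central element.
names(window_means) <- centers
head(window_means, 3)
```

For `L = 10` the offset `(L - 1) %/% 2` is 4, so window 1-10 is labelled "5" and window 2-11 is labelled "6", matching the rounding-down rule.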
After all the iterations are finished, I want to go through the window means and check whether the results list contains at least L/2 consecutive elements with a positive value. For example, if the results contain a subsequence like ["5" = 0.5, "6" = 2.35, "7" = 0.15, "8" = 0.35, "9" = 0.5], i.e. at least 5 consecutive elements with a positive value, then this part of the sequence (5-9) is possibly a transmembrane region. Of course, if there are more consecutive positive values, the criterion still applies. My goal is to find these regions, which could possibly be transmembrane regions.
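One way to look for such stretches is run-length encoding with `rle()`. The sketch below is an assumption about how the criterion could be coded, not a validated method for predicting transmembrane regions. `find_positive_runs` is a hypothetical helper, and every value in `window_means` other than the five from the example above is made up for illustration.

```r
# Hypothetical helper: report stretches of at least `min_len` consecutive
# positive values in a vector of window means named by center position.
find_positive_runs <- function(means, min_len) {
  runs   <- rle(unname(means > 0))       # runs of TRUE / FALSE
  ends   <- cumsum(runs$lengths)         # index where each run ends
  starts <- ends - runs$lengths + 1      # index where each run starts
  keep   <- runs$values & runs$lengths >= min_len
  data.frame(start  = as.integer(names(means)[starts[keep]]),
             end    = as.integer(names(means)[ends[keep]]),
             length = runs$lengths[keep])
}

# Illustrative input: positions 5-9 are the example values from the text,
# the remaining entries are invented.
window_means <- c("3" = -0.2, "4" = -1.1,
                  "5" = 0.5, "6" = 2.35, "7" = 0.15, "8" = 0.35, "9" = 0.5,
                  "10" = -0.4, "11" = 0.3, "12" = -0.9)

L <- 10
regions <- find_positive_runs(window_means, min_len = L / 2)
regions
```

The reported `start` and `end` are window-center positions, so whether the candidate region should be widened to the full extent of the first and last window is a separate decision.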
I hope I will be able to do the last part since it doesn't include convolve, which for some reason really gave me a hard time.
I am really grateful for your help!