
I created a vector:

totalDeposit <- cumsum(testd$TermDepositAMT[s1$ix])

This computes the cumulative sum of the TermDepositAMT values in the testd data frame and stores it in totalDeposit. This works fine.

I then need to calculate the average of the deposit amount and I use the following code:

avgDeposit <- totalDeposit / (1:testd)

But I get an error message:

Error in 1:testd : NA/NaN argument
In addition: Warning message:
In 1:testd : numerical expression has 19 elements: only the first used

testd has some 8000 observations and 19 variables.

Could someone help me get past this issue? I've searched for this error message online, but all I have understood so far is that 1:testd makes R treat testd as a number, which it isn't, and hence I get an error. Would simply taking mean(totalDeposit) do the trick? I tried it, but the figure I get is absurd and nowhere near representative of the average.

Thank you for your help.

tamtam
Freewill

2 Answers


The error message is, in this case, helpful.

When you write 1:N, what you're telling R is "give me the sequence of integers from 1 to N". Both sides of the colon are expected to be single numbers. testd isn't a single number; it has 19 elements, and so R discards all but the first element of testd when calculating the sequence, which is what the warning reports. The error then follows because that first element can't be used as a number either. Without that rule, the alternative would be either a horrible error every time or a set of sequences: one between 1 and the first value in testd, another between 1 and the second value in testd, and so on.
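To see the "only the first used" behaviour on its own, here is a minimal sketch. The vector n is a made-up stand-in, not anything from the question, and this assumes R's default settings, where an over-long argument to the colon operator produces a warning rather than an error.

```r
# Hypothetical two-element vector standing in for "something longer than one number".
n <- c(3, 10)

# The colon operator keeps only the first element (3) and warns about the rest.
# suppressWarnings() is used here only to keep the output quiet.
s <- suppressWarnings(1:n)
print(s)
```

Because the first element here really is a number, R can still build a sequence from it. With a data frame, the first element is a whole column, so the same step fails.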

What you want instead is 1:nrow(testd), if testd is a data frame, and either 1:length(testd) or seq_along(testd) if it's a list or vector.
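A small sketch of those options, using a hypothetical data frame toy in place of testd. Note that length() and seq_along() count columns when given a data frame, because a data frame is a list of columns. seq_len() is shown as an additional option that is not mentioned above.

```r
# Hypothetical stand-in for testd: 5 rows, 2 columns.
toy <- data.frame(id = 1:5, amt = c(10, 20, 30, 40, 50))

rows <- 1:nrow(toy)         # one index per row
cols <- seq_along(toy)      # one index per column, since a data frame is a list of columns
safe <- seq_len(nrow(toy))  # same as 1:nrow(toy), but empty when there are 0 rows

print(rows)
print(cols)
print(safe)
```

The difference between the first and last form only shows up with zero rows: 1:0 counts down (1, 0), while seq_len(0) is empty.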

Based on the question, though (the need to calculate averages?), you're approaching this the wrong way: you don't want a sequence of values, you just want one. Since average = total / number of elements that went into that total, you just want the number of elements, which can be retrieved simply with nrow(testd).
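As a rough illustration with made-up numbers (amt is hypothetical, not the asker's data), the overall average can be reached in several equivalent ways. The same sketch also suggests why mean(totalDeposit) looked absurd: it averages the running totals, not the deposits.

```r
# Hypothetical deposit amounts.
amt <- c(100, 200, 300, 400)
total <- cumsum(amt)  # running totals: 100 300 600 1000

overall   <- sum(amt) / length(amt)               # total / number of elements
same      <- mean(amt)                            # built-in equivalent
from_last <- total[length(total)] / length(amt)   # last running total / count

# Mean of the running totals: a different and much larger number.
not_avg <- mean(total)

print(c(overall, same, from_last, not_avg))
```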

  • Thank you. I also tried dividing simply by nrow(testd), and of course the values obtained are different from those with 1:nrow(testd). Could you please help me understand conceptually what R is doing with 1:nrow(testd) as opposed to nrow(testd)? Thanks a lot – Freewill Apr 19 '14 at 20:12
  • nrow(testd) is a single number while 1:nrow(testd) is a sequence of numbers starting with 1. – IRTFM Apr 19 '14 at 20:19
  • What BondedDust said. 1:nrow(testd) is best understood as "give me all the numbers between 1 and the number of rows in testd, inclusive". So if testd has 5 rows, it's 1,2,3,4,5. 3:nrow(testd) would be 3,4,5. So dividing by nrow(testd) is simply dividing by the number of rows. Dividing by 1:nrow(testd) is dividing by each integer, in turn, between 1 and the number of rows. –  Apr 19 '14 at 20:45
  • @user3007275: check out my amended answer above; nrow(testd) is what you want, if you're calculating an average. –  Apr 19 '14 at 20:47
  • Thanks again. I'm attempting to predict an average deposit amount for each observation (customer), so I sense that the idea is to average it for each row, and that is what 1:nrow(testd) will achieve, as opposed to nrow(testd), which will only give one overall value. There is a book that I'm following for help with the coding, and it had similar code, but I didn't quite understand what they were trying to do. It makes a little more sense now. – Freewill Apr 19 '14 at 21:44
  • An average for each? That seems circular; surely each row is a single observation. The average of a single observation is...well, the value of that observation. –  Apr 19 '14 at 21:59
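To make the difference discussed in these comments concrete, here is a sketch with a hypothetical four-element vector. Dividing the running totals by a single number scales all of them by the same amount, while dividing by 1:n works element by element and gives a running mean (the average of the first k values at position k).

```r
# Hypothetical deposit amounts, already in the order of interest.
amt <- c(100, 200, 300, 400)
total <- cumsum(amt)  # 100 300 600 1000
n <- length(amt)

by_count    <- total / n      # every running total divided by 4
running_avg <- total / (1:n)  # k-th running total divided by k

print(by_count)
print(running_avg)
```

The parentheses around 1:n are optional in R, since the colon operator binds more tightly than division, but they make the intent easier to read.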
1

It's pretty clear that testd is a data frame or a list, since you didn't get an error from testd$. If the first element of testd had been a number, with testd still longer than one element, you would only have gotten a warning. You perhaps wanted to write:

avgDeposit <- totalDeposit / 1:nrow(testd)

... although I admit that doesn't seem very useful. At least it won't throw an error.

IRTFM