R Error: "In numerical expression has 19 elements: only the first used"

Question

I created a dataframe:

totalDeposit <- cumsum(testd$TermDepositAMT[s1$ix])

which is basically calculating cumulative sum of TermDeposit amounts in testd dataframe and storing it in totalDeposit. This works perfectly ok.

I then need to calculate the average of the deposit amount and I use the following code:

avgDeposit <- totalDeposit / (1:testd)

But I get an error message:

Error in 1:testd : NA/NaN argument
In addition: Warning message:
In 1:testd : numerical expression has 19 elements: only the first used

testd has some 8000 observations and 19 variables.

Could someone help me get past this issue? I've attempted to locate this error message online but all I have understood so far is that 1:testd basically makes R read testd as a number which it isn't and hence I get an error message. Would simply taking mean(totalDeposit) do the trick? I tried it but the figure I get is absurd and nowhere representative of the average.

Thank you for your help.

The expression `a:b` demands that both `a` and `b` be scalars (i.e. single elements). Your `testd` has rather more than one. You probably want `totalDeposit/(1:length(totalDeposit))`. – Carl Witthoft, Apr 19 '14 at 19:33
thank you. I attempted yours and Ironholds response and they are similar in output. — Freewill, Apr 19 '14 at 20:13

score 12 · Accepted Answer · 2014-04-19T20:47:04.983


The error message is, in this case, helpful.

When you say 1:N, what you're telling R is "give me the sequence of integers from 1 to N". Both ends are expected to be single numbers. testd isn't a single number: it's a data frame whose 19 elements are its columns, so R discards all but the first element of testd when calculating the sequence (hence the warning), and because that first element is a whole column rather than a number, the call then fails with the NA/NaN error. The alternative would be either a horrible error or a set of sequences: one between 1 and the first value in testd, another between 1 and the second value in testd, and so on.
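A minimal sketch of the "only the first used" behaviour, using a made-up two-element vector `n` (not from the question). This assumes R's default settings; how strictly `:` treats longer operands may vary between R versions and configurations.

```r
# Hypothetical vector with more than one element (not from the question)
n <- c(3, 10)

# `:` only uses the first element of each operand, so this behaves like 1:3.
# suppressWarnings() hides the "only the first used" warning for the demo.
s <- suppressWarnings(1:n)
print(s)
```

With a data frame on the right-hand side, as in the question, the first element is a column rather than a number, which is why the warning is followed by an error there.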

What you want instead is 1:nrow(testd), if testd is a data frame, and either 1:length(testd) or seq_along(testd) if it's a list or vector.
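As a rough illustration of those options, here is a sketch with a small made-up data frame `df` standing in for testd (the column names are assumptions). Note that the length of a data frame is its number of columns, not rows.

```r
# Hypothetical stand-in for testd: 5 rows, 2 columns
df <- data.frame(id = 1:5, amt = c(10, 20, 30, 40, 50))

rows <- seq_len(nrow(df))   # one index per row; matches 1:nrow(df) when nrow(df) > 0
cols <- seq_along(df)       # one index per column: a data frame is a list of columns
idx  <- seq_along(df$amt)   # one index per element of a single column (a vector)

print(rows)
print(cols)
print(idx)
```

seq_len() and seq_along() are used here instead of 1:n because they return an empty sequence when there is nothing to index, whereas 1:0 counts down from 1 to 0.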

Based on the question, though - the need to calculate averages? - you're actually approaching this wrong, because you don't want a sequence of values, you just want one: since average = total/number of elements that went into that total, you just want 'the number of elements' - which can be retrieved simply with nrow(testd).
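A small sketch of that arithmetic, with made-up amounts in `amt` (not the asker's data). It also hints at why mean(totalDeposit) looked absurd in the question: that is the mean of the running totals, not of the deposits themselves.

```r
amt   <- c(10, 20, 30, 40, 50)   # hypothetical deposit amounts
total <- cumsum(amt)             # running totals

# Overall average: the final running total divided by the number of elements
overall <- total[length(total)] / length(amt)

print(overall)       # agrees with mean(amt)
print(mean(total))   # mean of the running totals, a much larger figure
```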

edited Apr 19 '14 at 20:47

answered Apr 19 '14 at 19:45

Thank you. I also attempted dividing simply by nrow(testd), and of course the values obtained are different from those with 1:nrow(testd). Could you please help me understand conceptually what R is doing with 1:nrow(testd) as opposed to nrow(testd)? Thanks a lot. – Freewill Apr 19 '14 at 20:12

nrow(testd) is a single number while 1:nrow(testd) is a sequence of numbers starting with 1. – IRTFM Apr 19 '14 at 20:19
What BondedDust said. 1:nrow(testd) is best understood as "give me all the numbers between 1 and the number of rows in testd, inclusive". So if testd has 5 rows, it's 1,2,3,4,5. 3:nrow(testd) would be 3,4,5. So, dividing by nrow(testd) is simply dividing by the number of rows. Dividing by 1:nrow(testd) is dividing by each integer, in turn, between 1 and the number of rows. – Apr 19 '14 at 20:45
@user3007275: check out my amended answer above; nrow(testd) is what you want, if you're calculating an average. – Apr 19 '14 at 20:47
Thanks again. I'm attempting to predict an average deposit amount for each observation (customer), so I sense that the idea is to average it for each row, and that is what 1:nrow(testd) will achieve, as opposed to nrow(testd), which will only give one overall value. There is a book that I'm following to find some assistance with the coding, and the book had similar code, but I didn't quite understand what they were trying to do. It makes a little more sense now. – Freewill Apr 19 '14 at 21:44
An average for each? That seems circular; surely each row is a single observation. The average of a single observation is...well, the value of that observation. – Apr 19 '14 at 21:59

score 1 · Answer 2 · answered Apr 19 '14 at 20:17

It's pretty clear that testd is a dataframe or a list since you didn't get an error from testd$. If you had a testd in which the first element were a number but it was longer than one element you would only have gotten a warning. You perhaps wanted to write:

avgDeposit <- totalDeposit / 1:nrow(testd)

... although I admit that doesn't seem very useful. At least it won't throw an error.
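For what it's worth, here is a sketch of what that element-wise division produces, using made-up amounts in `amt` (an assumption, not the asker's data). It divides by seq_along(totalDeposit), as suggested in the comments on the question, so the divisor always has the same length as the running totals even if that differs from nrow(testd).

```r
amt          <- c(10, 20, 30, 40, 50)   # hypothetical deposit amounts
totalDeposit <- cumsum(amt)             # running totals

# Element i is the running total after i values divided by i,
# i.e. the mean of the first i amounts
avgDeposit <- totalDeposit / seq_along(totalDeposit)

print(avgDeposit)
```

The last element of avgDeposit is the overall mean; the earlier ones are running averages over the first 1, 2, 3, ... values.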
