Building SVM model

library(e1071)
model <- svm(SeriousDlqin2yrs ~ ., data = IAStrain)
predictedY <- predict(model, IAStest)
Error in names(ret2) <- rowns: 
'names' attribute [2000] must be the same length as the vector [1605]

My two datasets (training and testing), with their data types:

> str(IAStest)
'data.frame':   2000 obs. of  10 variables:
$ RevolvingUtilizationOfUnsecuredLines: num  0.106 0.503 0.111 1 1 ...
$ age : int  45 46 78 78 63 33 44 65 31 41 ...
$ NumberOfTime30.59DaysPastDueNotWorse: int  0 0 0 0 0 0 0 0 0 0 ...
$ DebtRatio : num  0.2877 0.311 0.0651 0.1255 45 ...
$ MonthlyIncome: int  10000 4912 11583 12465 NA 2500 NA 18915 8200 30018 ...
$ NumberOfOpenCreditLinesAndLoans: int  5 6 8 2 4 8 4 6 9 14 ...
$ NumberOfTimes90DaysLate: int  0 0 0 0 0 0 0 0 0 0 ...
$ NumberRealEstateLoansOrLines        : int  2 1 0 2 0 1 0 2 1 3 ...
$ NumberOfTime60.89DaysPastDueNotWorse: int  0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfDependents                  : int  5 3 0 0 0 1 0 2 0 2 ...

> str(IAStrain)
'data.frame':   28000 obs. of  11 variables:
$ SeriousDlqin2yrs: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ RevolvingUtilizationOfUnsecuredLines: num  0.957 0.658 0.907 0.213 0.306
$ age  : int  40 38 49 74 57 39 27 57 30 51 ...
$ NumberOfTime30.59DaysPastDueNotWorse: int  0 1 1 0 0 0 0 0 0 0 ...
$ DebtRatio  : num  1.22e-01 8.51e-02 2.49e-02 3.76e-01 5.71e+03 ...
$ MonthlyIncome  : int  2600 3042 63588 3500 NA 3500 NA 23684 2500 6501 ...
$ NumberOfOpenCreditLinesAndLoans: int  4 2 7 3 8 8 2 9 5 7 ...
$ NumberOfTimes90DaysLate: int  0 1 0 0 0 0 0 0 0 0 ...
$ NumberRealEstateLoansOrLines: int  0 0 1 1 3 0 0 4 0 2 ...
$ NumberOfTime60.89DaysPastDueNotWorse: int  0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfDependents:   int  1 0 0 1 0 0 NA 2 0 2 ...

I've read many posts about this kind of error. In most of them the cause was the data types of the variables, but that does not seem to be the issue in my case.

  • This issue seems to be connected to your test set. Can you reduce your test set so that the error still occurs, e.g. by splitting it into halves and continuing with the half where the error occurs? Afterwards, paste the output of `dput(..)` so that the problem can be reproduced. Also, which library did you use: `e1071`? – CAFEBABE Mar 13 '16 at 16:54
  • Thanks for your answer. I did split the data into multiple small data frames but am still getting the same error: `IAStest1 <- IAStest[1:100,]` followed by `predictedY <- predict(model, IAStest1)` gives `Error in names(ret2) <- rowns : 'names' attribute [100] must be the same length as the vector [83]`. Yes, I used e1071. – Rishabh Verma Mar 13 '16 at 17:05
  • But can you add ~5 to 10 lines where the error occurs to your question? That way we can reproduce your problem. – CAFEBABE Mar 13 '16 at 17:06

1 Answer

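Besides my comment: most likely the NA values in your data are the problem. `predict` appears to drop the rows that contain NA, so the number of predictions (1605) no longer matches the number of row names (2000).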

predictedY <- predict(model, IAStest[!rowSums(is.na(IAStest)),])

This should generate results for the rows not containing NA values.
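To keep the predictions aligned with the original test rows, the filtering above can be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: `predict_complete_rows` is a hypothetical name, it assumes `predict` returns one value per row as a vector or factor, and a base-R `lm` fit on toy data stands in for the `e1071` SVM so that the example runs without extra packages.

```r
# Hypothetical helper: predict only on rows without NA, then place the
# predictions back at their original positions (NA for the skipped rows).
predict_complete_rows <- function(model, newdata, ...) {
  ok <- complete.cases(newdata)                  # TRUE for rows without any NA
  pred <- predict(model, newdata[ok, , drop = FALSE], ...)
  out <- pred[rep(NA_integer_, nrow(newdata))]   # all-NA vector of the same type
  out[ok] <- pred
  names(out) <- rownames(newdata)
  out
}

# Toy data; lm() is only a stand-in for the SVM model from the question.
train <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 8.1, 9.8))
test  <- data.frame(x = c(1.5, NA, 3.5, NA))
fit   <- lm(y ~ x, data = train)

pred <- predict_complete_rows(fit, test)
print(colSums(is.na(test)))   # number of missing values per column
print(pred)                   # one entry per test row, NA where x was missing
```

With the objects from the question, the call would presumably be `predict_complete_rows(model, IAStest)`, and `colSums(is.na(IAStest))` would show which columns contain the missing values.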

CAFEBABE
  • Thanks a lot. It indeed solved the issue; NA values were the reason. If I may ask one more question: is it good practice to scale the data before building a model? If yes, what type of scaling would be ideal for my problem: diagonal scaling, or some other scaling such as logarithmic? – Rishabh Verma Mar 13 '16 at 17:12
  • Yes, it is recommended; see http://stackoverflow.com/questions/15436367/svm-scaling-input-values. Depending on the feature type, use linear or z-score scaling. However, the decision is often more an art than a science. – CAFEBABE Mar 13 '16 at 17:26
  • I highly appreciate all your help. – Rishabh Verma Mar 13 '16 at 17:41
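
The z-score scaling mentioned in the comments can be sketched in base R as follows. This is an illustration under assumptions: the toy values are loosely taken from the `str()` output in the question, and the names `train_x`, `test_x`, `mu` and `sdv` are made up for the example. The point it shows is that the test set is scaled with the means and standard deviations of the training set, not with its own.

```r
# Toy predictors (illustrative values only).
train_x <- data.frame(age       = c(40, 38, 49, 74, 57),
                      DebtRatio = c(0.122, 0.0851, 0.0249, 0.376, 5710))
test_x  <- data.frame(age       = c(45, 46, 78),
                      DebtRatio = c(0.2877, 0.311, 0.0651))

# Column means and standard deviations estimated on the training set only.
mu  <- colMeans(train_x, na.rm = TRUE)
sdv <- apply(train_x, 2, sd, na.rm = TRUE)

# z-score scaling: subtract the training mean, divide by the training sd.
train_z <- scale(train_x, center = mu, scale = sdv)
test_z  <- scale(test_x,  center = mu, scale = sdv)

print(round(train_z, 3))
print(round(test_z, 3))
```

Note that `svm` in `e1071` has a `scale` argument that defaults to `TRUE`, so numeric variables are standardized internally unless this is switched off.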