
I'm currently using the R package data.table to process big datasets. I'm wondering if there is a difference between the syntax

DT[,v]

and the syntax:

DT$v

where DT is my data.table object and v is the column I want to select.

I know that the dollar sign is usually used for data frames and that [,v] is always used in data.table examples. However, they both work and, in my experience with 5 million rows, seem to take similar times to execute.

Do you know if they are processed differently, and if one is more efficient when processing even larger datasets?
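
A minimal sketch of how the two forms could be compared, assuming the data.table package is installed. The table, its size, and the column names `id` and `v` are made up for illustration, and the timings will depend on the machine and the package version, so no particular result is implied:

```r
library(data.table)

set.seed(1)
n <- 1e6  # hypothetical size; raise it to approach the 5 million rows mentioned above
DT <- data.table(id = seq_len(n), v = rnorm(n))

# All three forms return the column as a plain vector with the same values.
x_j       <- DT[, v]     # dispatched to `[.data.table`
x_dollar  <- DT$v        # base R list-style extraction
x_bracket <- DT[["v"]]   # base R list-style extraction
stopifnot(identical(x_j, x_dollar), identical(x_dollar, x_bracket))

# Rough timing over repeated extractions (elapsed seconds).
reps <- 50
t_j       <- system.time(for (i in seq_len(reps)) DT[, v])
t_dollar  <- system.time(for (i in seq_len(reps)) DT$v)
t_bracket <- system.time(for (i in seq_len(reps)) DT[["v"]])

timings <- c(j              = t_j[["elapsed"]],
             dollar         = t_dollar[["elapsed"]],
             double_bracket = t_bracket[["elapsed"]])
print(timings)
```

Repeating the extraction in a loop is only a crude way to make small per-call differences visible; a dedicated benchmarking package would give more reliable numbers.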

sjakw
  • In `DT$v`, the `$` is processed in base R, and in `DT[, v]`, the `[, v]` is processed by `[.data.table`. – Rich Scriven Aug 25 '15 at 16:32
  • Matt Dowle, the author of the package, says `[[` and `$` are preferred, since they don't copy: http://stackoverflow.com/questions/18835576/class-of-data-table-column/18835813#comment27788236_18835813 – Frank Aug 25 '15 at 16:48
  • Possible duplicate of [Why is it faster to evaluate in \`j\` than with \`$\` in a \`data.table\`?](http://stackoverflow.com/questions/29956250/why-is-it-faster-to-evaluate-in-j-than-with-in-a-data-table) – MichaelChirico Aug 25 '15 at 17:11
  • I think the correct way to ask a question is to first invest some effort in it by creating a reproducible example, doing some research, and conducting some speed tests, and then to ask according to your findings. Here you are expecting us to do all the hard work for you, including creating a reproducible example, testing, digging into documentation and source code, etc. This question is useful but shows no research effort, which makes me want to both upvote and downvote it. – David Arenburg Aug 25 '15 at 17:29
