0

I have a data table as a CSV file that I use to create metrics for a dashboard. The data table includes Metric IDs and associates these with field names. This table--this definition of metrics--is largely static, and I'd like to include it within R code rather than, for example, importing a CSV file containing these headings.

The table looks something like this:

Metric_ID Metric_Name Numerator Denominator
AB0001 Number_of_Customers No_of_Customers
AB0002 Percent_New_Customers No_of_New_Customers No_of_Customers

This has about 40 rows of data, and I'd like to set this table up in code so that it is created at the time the R query is run. I'll then use it to associate metric IDs with measures I retrive through SQL queries. Sometimes this table may change -- for example, new metrics might be added or existing metrics modified. This would need some modificatoin in the code to incorporate these metrics.

The closet way I could find was to create a data table, along the lines described in the query below.

dt<-data.table(x=c(1,2,3),y=c(2,3,4),z=c(3,4,5))
dt

   x y z
1: 1 2 3
2: 2 3 4
3: 3 4 5

cbind with data table and data frame

This works for a table with a few rows or columns, but will be unwieldy for tables with 40+ rows. For example, if I wanted to modify a metric 20 rows down, I'd have to go 20 rows down in each column, and then test the table to ensure I switched the metric at the right place in each column -- especially where some metrics have empty cells. for example, I may correct the metric ID in row 20, but accidentally put the definition (a separate column) in row 19.

Is there a more straightforward way of, in essence, creating a table in code?

(I appreciate the most straightforward way would be to keep a CSV file accessible and use read_csv to import it into R. However, this doesn't work so well if colleagues are running this query on their machine and have a different file path to the CSV -- it also raises the risk of them running the query with an out-of-date metrics table, as they may not have the latest version in their files).

Thanks in advance for any guidance you might have!

Tony

Sotos
  • 51,121
  • 6
  • 32
  • 66
T PERRY
  • 81
  • 7

1 Answers1

0

Here are two options (examples taken from respective help pages):

data.table::fread()

fread("A,B
1,2
3,4
")
#>        A     B
#>    <int> <int>
#> 1:     1     2
#> 2:     3     4

https://rdatatable.gitlab.io/data.table/reference/fread.html

tibble::tribble()

tribble(
  ~colA, ~colB,
  "a",   1,
  "b",   2,
  "c",   3
)
#> # A tibble: 3 × 2
#>   colA   colB
#>   <chr> <dbl>
#> 1 a         1
#> 2 b         2
#> 3 c         3

https://tibble.tidyverse.org/reference/tribble.html

Other options:

  • If you already have the data.frame from somewhere, you can also use dput() to get a structure() code you can paste into the files you are distributing.
  • use the reprex package https://reprex.tidyverse.org/
sthoch
  • 86
  • 5