readr::read_csv adds attributes that don't get updated when the data is edited. For example,

library('tidyverse')
df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")

# Remove columns with only one distinct entry
no_info <- df %>% sapply(n_distinct)
no_info <- names(no_info[no_info==1]) 

df2 <- df %>% 
  select(-all_of(no_info))  # all_of() marks no_info as a character vector of column names

Inspecting the structure, we see that column B is still present in the attributes of df2:

> str(df)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    3 obs. of  3 variables:
 $ A: chr  "a" "b" "c"
 $ B: num  1 1 1
 $ C: chr  "x" "y" "z"
 - attr(*, "spec")=
  .. cols(
  ..   A = col_character(),
  ..   B = col_double(),
  ..   C = col_character()
  .. )
> str(df2)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    3 obs. of  2 variables:
 $ A: chr  "a" "b" "c"
 $ C: chr  "x" "y" "z"
 - attr(*, "spec")=
  .. cols(
  ..   A = col_character(),
  ..   B = col_double(),
  ..   C = col_character()
  .. )
> attributes(df2)
$class
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 

$row.names
[1] 1 2 3

$spec
cols(
  A = col_character(),
  B = col_double(),
  C = col_character()
)

$names
[1] "A" "C"

> 

How can I remove columns (or any other updates to the data) and have the changes accurately reflected in the new data structure and attributes?

conor
  • Out of curiosity, why do you want to do this? I understand you want attributes to be indicative of the actual tibble, but why do you care? Cheers – Khaynes Jan 02 '19 at 03:16
  • One reason is that it's annoying to scroll through information on nonexistent columns, particularly when there's a large number of columns and you've removed them programmatically. Another is a concern about unintended consequences, such as when you don't drop factor levels after removing some of them, and future calculations, plots, etc. behave as if they were still there. I'm not sure of the consequences of having information included on features that no longer exist. – conor Jan 02 '19 at 03:29
  • I mean, you could just do `data.frame(df2)`. – joran Jan 02 '19 at 03:37

2 Answers

You can remove the column specification by setting the `spec` attribute to NULL:

> attr(df, 'spec') <- NULL
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3 obs. of  3 variables:
 $ A: chr  "a" "b" "c"
 $ B: int  1 1 1
 $ C: chr  "x" "y" "z"
> df
# A tibble: 3 x 3
  A         B C    
  <chr> <int> <chr>
1 a         1 x    
2 b         1 y    
3 c         1 z    
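
The same assignment can be applied to the filtered tibble from the question. The following is a minimal sketch, assuming `readr` and a `dplyr` version that exports `all_of()` are installed; depending on the package versions, `select()` may already have dropped the `spec` attribute, in which case the assignment is a harmless no-op.

```r
library(readr)
library(dplyr)

# Same toy data as in the question
df <- read_csv("A,B,C\na,1,x\nb,1,y\nc,1,z")

# Names of columns that contain only one distinct value
no_info <- names(df)[sapply(df, n_distinct) == 1]

# Drop those columns, then discard the now-stale column specification
df2 <- df %>% select(-all_of(no_info))
attr(df2, "spec") <- NULL

# List the attributes that remain on df2
names(attributes(df2))
```

This only discards the specification; it does not rebuild a specification that matches the remaining columns.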
mt1022
  • This removes the attribute completely, rather than just adjusting it to reflect the data after a manipulation. – conor Jan 02 '19 at 23:57
  • @conor Yeah. I did some searching and didn't find any function that updates it. However, the column specifications are not used elsewhere. They tell you how `read_csv` parsed each column during reading. AFAIK, it is safe to drop them, and doing so is unlikely to have any undesired consequences. – mt1022 Jan 03 '19 at 01:25
  • I wish it would update the attribute. I have the same issue with the `problems` attribute after importing messy data. Even after deleting the problem column, the `problems` tibble remains. Annoying. – Nova Jul 09 '19 at 13:40

For me this works (R version 4.0.5 (2021-03-31)):

> attr(data, "class")
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 
> attr(data, "class") <- attr(data, "class")[-1]
> attr(data, "class")
[1] "tbl_df"     "tbl"        "data.frame"
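
The two answers can be combined into one small helper. This is only a sketch: `drop_readr_attrs` is a hypothetical name, not a readr function, and the example object is a hand-built stand-in for a `read_csv` result so that the block runs in base R without any packages.

```r
# Hypothetical helper (not part of readr): strip the readr-specific
# leftovers from a data frame and return it otherwise unchanged.
drop_readr_attrs <- function(x) {
  attr(x, "spec") <- NULL       # stale column specification
  attr(x, "problems") <- NULL   # stale parsing problems, if present
  class(x) <- setdiff(class(x), "spec_tbl_df")
  x
}

# Stand-in for df2 (assumption: mimics what read_csv + select left behind)
df2 <- data.frame(A = c("a", "b", "c"), C = c("x", "y", "z"),
                  stringsAsFactors = FALSE)
attr(df2, "spec") <- "cols(A, B, C)"        # placeholder for the real spec object
class(df2) <- c("spec_tbl_df", class(df2))

cleaned <- drop_readr_attrs(df2)
class(cleaned)
names(attributes(cleaned))
```

On a real tibble the helper would be called the same way, e.g. `df2 <- drop_readr_attrs(df2)` right after the `select()` step.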