Your second column is (I think) a character vector. strsplit, as it mentions in the documentation (?strsplit), returns a list. Before we get into why your specific situation happened, some general advice:
- Make a new column instead of replacing an existing one. This has the added benefit of not losing the original values.
- Only replace values in a column with new values of the same class (e.g., character for character, integer for integer).
So I suggest adding a new column of split values:
letters[["splits"]] <- strsplit(letters[[2]], split = "|", fixed = TRUE)
You now have a list column, and each row of this column has a vector of the split letters from the original values.
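To see what that list column looks like, here is a minimal sketch on a made-up data.frame. The names df, id, and code are placeholders I've invented for illustration, not your actual data:

```r
# Hypothetical stand-in for the data in the question
df <- data.frame(
  id   = 1:2,
  code = c("A|B|C", "D|E"),
  stringsAsFactors = FALSE
)

# Add the split values as a new list column; "code" is left untouched
df[["splits"]] <- strsplit(df[["code"]], split = "|", fixed = TRUE)

class(df[["splits"]])    # the new column is a list
df[["splits"]][[1]]      # character vector of the pieces for row 1
lengths(df[["splits"]])  # number of pieces in each row
```

Because the original column is kept, you can always compare the pieces against the string they came from.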
Why your problem happened
Let's dissect the assignment statement:
letters[i,2] <- strsplit(letters[i,2], split = "[|]")
On the left side of <- is letters[i, 2], which is a subset of a data.frame. A data.frame stores all of its data in a list. R allows us to use this fact, especially in assignment. We can add or replace columns just like adding or replacing items in a list.
# This...
letters[, "one"] <- 1
letters[, "two"] <- 2
# is effectively the same as this
letters[, c("one", "two")] <- list(1, 2)
To the right of <-, we have a call to strsplit(), which returns a list. As in the example just above, if you assign a list to a subset of a data.frame, it will be coerced into a data.frame itself. Each element of the list will be considered a column. So, the assignment plays out like this:
- If letters[i,2] is "A|B|C|D|E", then strsplit(letters[i,2], split = "[|]") is list(c("A", "B", "C", "D", "E")).
- The assignment checks both sides, and sees the data.frame as the "higher" type, so it coerces the list to a data.frame. The right side is now effectively data.frame(c("A", "B", "C", "D", "E")).
- Now it tries to assign a data.frame with 1 column and 5 rows to a subset with 1 column and 1 row. Those dimensions don't match, so it takes what it can from the right side (just the first row) and warns you about what happened.
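The steps above can be reproduced on a small made-up data.frame (df and code are placeholder names, not from your data). This sketch records the warning instead of printing it, then confirms that only the first piece ended up in the cell:

```r
df <- data.frame(code = c("A|B|C|D|E", "F|G"), stringsAsFactors = FALSE)

pieces <- strsplit(df[1, "code"], split = "[|]")
str(pieces)  # a list holding one character vector of length 5

# Run the problematic assignment, capturing the warning message
warning_text <- NULL
withCallingHandlers(
  {
    df[1, "code"] <- pieces
  },
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

warning_text   # the message R would normally show
df[1, "code"]  # only the first piece, "A", was kept
```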
Why the suggested assignment works
So why isn't there any coercion in this?
letters[["splits"]] <- strsplit(letters[[2]], split = "|", fixed = TRUE)
The left side uses [[ subsetting (treating the data.frame like a list) to add or replace the "splits" column. So no coercion is ever done.
Also, a data.frame can have a list as a column, just like a list can have a list as an element. A data.frame column just has to satisfy two things:
- It has to be a vector.
- Its length must be equal to the number of rows in the data.frame (recycling is attempted if necessary).
A list is a type of vector. And strsplit() returns a list the same length as its input, so both criteria are met.
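Both criteria can be checked directly. A minimal sketch, again on a made-up data.frame (df and code are placeholder names):

```r
df <- data.frame(code = c("A|B|C", "D|E", "F"), stringsAsFactors = FALSE)
pieces <- strsplit(df[["code"]], split = "|", fixed = TRUE)

is.vector(pieces)           # TRUE: a list counts as a vector
length(pieces) == nrow(df)  # TRUE: one list element per row

df[["splits"]] <- pieces
sapply(df, class)           # "code" is character, "splits" is list
```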