I'm trying to change a data frame with several thousands of rows that each look like one of the following variations:
table, th, td {
border: 1px solid black
}
<table>
<tr>
<th> a </th>
<th> b </th>
<th> c </th>
</tr>
<tr>
<td> $10 x and $7 y </td>
<td> NA </td>
<td> NA </td>
</tr>
<tr>
<td> $70 a; $95 b </td>
<td> NA </td>
<td> NA </td>
</tr>
<tr>
<td> $6 j </td>
<td> NA </td>
<td> NA </td>
</tr>
</table>
and change it into this:
table, th, td {
border: 1px solid black
}
<table>
<tr>
<th> a </th>
<th> b </th>
<th> c </th>
</tr>
<tr>
<td> $10 x and $7 y </td>
<td> $10 x </td>
<td> $7 y </td>
</tr>
<tr>
<td> $70 a; $95 b </td>
<td> $70 a </td>
<td> $95 b </td>
</tr>
<tr>
<td> $6 j </td>
<td> $6 j </td>
<td> NA </td>
</tr>
</table>
This is my current code to accomplish this (I use the number of dollar signs because that is the only consistent value to determine number of transactions):
(This is formatted as a data.table, just in case that makes a difference)
df$b[(str_count(df$a, pattern = "\\$") == 2)] = unlist(strsplit(df$a, " and "))[1]
df$c[(str_count(df$a, pattern = "\\$") == 2)] = unlist(strsplit(df$a, " and "))[2]
df$b[str_count(df$a, pattern = "\\$") < 2] = df$a
Now, instead of the desired result, I get the following:
table, th, td {
border: 1px solid black
}
<table>
<tr>
<th> a </th>
<th> b </th>
<th> c </th>
</tr>
<tr>
<td> $10 x and $7 y </td>
<td> $10 x </td>
<td> $7 y </td>
</tr>
<tr>
<td> $70 a; $95 b </td>
<td> $10 x</td>
<td> $7 y</td>
</tr>
<tr>
<td> $6 j </td>
<td> $6 j </td>
<td> NA </td>
</tr>
</table>
Does anyone know how to fix this issue? I think it has to do with the fact that the strsplit()
is taking the first subsetted row and applying it to every row in the subset, but I don't know how to change it to work properly.