New to programming. Using R currently. I pulled in a flat text file of employee data that has multi-value fields. The possible values for the EmployeeClass variable are A, B, C, D, and E. More than one can be selected, there can be blank values, a value can be selected more than once (if an employee has multiple positions of the same type), and the values can be in any order. The EmployeeClass variable is a list.
Data[Employee1, EmployeeClass]
[A, B, C, D, E]
Data[Employee2, EmployeeClass]
[B, D, [blank], E]
Data[Employee3, EmployeeClass]
[C, B, A]
Data[Employee4, EmployeeClass]
[B, D, D, C]
Data[Employee5, EmployeeClass]
[E]
The data can't be adjusted up until this point, and the table has over 41,000 observations. I am trying to get the variable down to one value per observation, so I need to select the right one based on criteria drawn from 3 other columns, effectively defining a "primary" class for each employee. The desired result would look like this:
Data[Employee1, EmployeeClass]
[C]
Data[Employee2, EmployeeClass]
[D]
Data[Employee3, EmployeeClass]
[A]
Data[Employee4, EmployeeClass]
[B]
Data[Employee5, EmployeeClass]
[E]
What would be the simplest way of coding this? I've attempted tidyr and grepl with little to no progress. Any help would be appreciated.
Edit: I may not have been clear. Let me rephrase. I have no issue coding the selection logic myself, and because that logic has many permutations it is probably easier, for brevity's sake, if I do so. I am only looking for a function or package that can parse a comma-separated character string and then apply logic to return a comprehensible value. The string is not the same length for every observation in the column, some values in the string may be blank, and the values in the string are not sorted in ascending or descending order.
Most functions that I've found so far require the field length to be equal for all observations. Is there a function out there that can handle these requirements, or is it best that I go a different route? Thanks!
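To make the requirement concrete, here is a minimal sketch of the kind of thing I mean, using base R's strsplit(), which returns a list and so does not need every row to have the same number of values. The sample data frame, the helper names parse_classes and pick_primary, and the priority rule are all made up for illustration. The rule is only a placeholder, so it will not reproduce the primary classes shown earlier; the real logic would depend on the 3 other columns.

```r
# Hypothetical sample mirroring the examples above. It assumes each
# EmployeeClass entry arrives as one comma-separated string.
Data <- data.frame(
  Employee = paste0("Employee", 1:5),
  EmployeeClass = c("A, B, C, D, E", "B, D, , E", "C, B, A", "B, D, D, C", "E"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row, so rows may
# hold different numbers of values.
parse_classes <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  lapply(parts, function(p) {
    p <- trimws(p)   # remove padding around each value
    p[nzchar(p)]     # drop blank values
  })
}

# Placeholder rule (hypothetical): return the first class found in a fixed
# priority order. Replace the body with the real selection logic.
pick_primary <- function(classes, priority = c("A", "B", "C", "D", "E")) {
  hit <- priority[priority %in% classes]
  if (length(hit) == 0) NA_character_ else hit[1]
}

class_list <- parse_classes(Data$EmployeeClass)

# vapply() applies the rule row by row and guarantees one string per row.
Data$PrimaryClass <- vapply(class_list, pick_primary, character(1))
```

Duplicates such as the repeated D for Employee4 are kept in class_list, so logic that needs to count repeated positions could still see them.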