I am working with a CSV file that contains information in the following format:
col1 col2 col3
row1 id1 , text1 (year1) , a|b|c
row2 id2 , text2 (year2) , a|b|c|d|e
row3 id3 , text3 (year3) , a|b
...
The number of rows in the CSV is very large. The year is embedded in col2 in parentheses. Also, as can be seen, col3 can have a variable number of elements, separated by '|'.
I would like to read the CSV file efficiently and end up, for each item (id), with an array as follows:
For the item with id_i:
A = [id_i,text_i,year_i,101010001]
where, if all possible features in col3 are [a,b,c,d,...,z], the binary vector has one position per feature and marks its presence (1) or absence (0) for that item.
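To make the target concrete, here is a minimal sketch of the transformation on a tiny made-up sample. It is only an illustration under several assumptions, not a claim about the fastest approach: the sample values, the variable names (raw, vocab, B, and so on), and the idea that the file has no header line or row labels are all hypothetical, and it assumes every col2 entry ends in a numeric year in parentheses. It uses textscan for the read, regexp for the year and feature splits, and sparse for the indicator matrix (repelem needs R2015a or later).

```matlab
% Hypothetical sample standing in for the real file contents
% (assumed layout: id , text (year) , f1|f2|... ; no header, no row labels).
lines = {'id1 , text1 (1999) , a|b|c', ...
         'id2 , text2 (2005) , a|b|c|d|e', ...
         'id3 , text3 (2010) , a|b'};
raw = strjoin(lines, '\n');

% Read the three comma-separated columns as text in one pass.
% For a real file, pass a file identifier from fopen instead of raw.
C    = textscan(raw, '%s %s %s', 'Delimiter', ',');
ids  = strtrim(C{1});
col2 = strtrim(C{2});
col3 = strtrim(C{3});

% Split col2 into the text and the year in trailing parentheses.
% Assumes every row matches; a non-matching row would break the alignment.
tok   = regexp(col2, '^(.*?)\s*\((\d+)\)$', 'tokens', 'once');
tok   = vertcat(tok{:});              % N-by-2 cell: {text, year}
texts = tok(:, 1);
years = str2double(tok(:, 2));

% Split col3 on '|' and build the feature vocabulary from the data itself.
parts = regexp(col3, '\|', 'split');  % N-by-1 cell of 1-by-k cells
allF  = [parts{:}];                   % all features, row by row
[vocab, ~, colIdx] = unique(allF);    % vocab is sorted; colIdx maps into it
rowIdx = repelem((1:numel(parts))', cellfun(@numel, parts));

% Sparse logical indicator matrix: B(i,j) is true if item i has vocab{j}.
B = sparse(rowIdx, colIdx, true, numel(parts), numel(vocab));

% The array for item i, e.g. i = 2.
i = 2;
A = {ids{i}, texts{i}, years(i), full(B(i, :))};
```

Keeping B sparse and building it once for all rows avoids a per-row loop, which is the part I expect to matter for a very large file.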
I am interested in an efficient implementation of this in MATLAB. Ideas are more than welcome. Thank you.