I got a data dictionary from the data provider. It contains hundreds of variables spread across different Word files and looks like this:
In order to add this dictionary to my current dataset, I need to convert it to a specific format in Excel. For example, for the first variable, "intarm_actual", I would like to create these columns in a spreadsheet: "variable" holds the name at the top left; "label" holds the content of the label (NA for this variable, but "tpe_lab" for the second one); "type" holds "string(str2)"; "value" holds "4"; "missing" holds "46/102"; and "tabulation" holds 46 "", 14 "RO", 14 "RV", 14 "TO", 14 "TV". Ideally, it should look like this:
Has anyone done this before and could offer some suggestions? I would appreciate anything: which package to use, related posts or articles to read, or similar code I can learn from. Can the R package "labelled" handle this type of task? Thanks a lot!
Update:--------------------------------------------
I used the package qdapTools to import one of the docx files, and it looks like this:
How can I extract the words I need and put them in the right place in my spreadsheet? Thanks!
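One possible way to approach this is with base R regular expressions, once a codebook entry is available as a character vector of lines. The following is only a minimal sketch: the function name `parse_codebook_entry` is hypothetical, and the text layout it expects (a dashed rule, a header line with the variable name and label, then `type:`, `unique values:` and `tabulation:` lines) is an assumption based on the fields described above, so it may need adjusting to the real files.

```r
# Parse one codebook entry (a character vector of lines) into a one-row data frame.
# ASSUMPTION: the layout matches the sample in `entry` below; adjust the patterns otherwise.
parse_codebook_entry <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]

  # Header: first line that is not a dashed rule; name and label are
  # assumed to be separated by two or more spaces.
  header <- lines[!grepl("^-+$", lines)][1]
  parts <- strsplit(header, " {2,}")[[1]]
  variable <- parts[1]
  label <- if (length(parts) > 1 && parts[2] != "(unlabeled)") parts[2] else NA_character_

  # Storage type: everything after "type:".
  type_line <- grep("type:", lines, fixed = TRUE, value = TRUE)[1]
  type <- trimws(sub("^.*type:", "", type_line))

  # Number of unique values and the "missing/total" fraction share one line.
  uniq_line <- grep("unique values:", lines, fixed = TRUE, value = TRUE)[1]
  value <- sub("^.*unique values: *([0-9,]+).*$", "\\1", uniq_line)
  n_missing <- regmatches(uniq_line, regexpr("[0-9,]+/[0-9,]+", uniq_line))
  if (length(n_missing) == 0) n_missing <- NA_character_

  # Tabulation: every line after "tabulation:" that starts with a digit,
  # with runs of spaces collapsed and rows joined by ", ".
  tab_start <- grep("tabulation:", lines, fixed = TRUE)[1]
  tabulation <- NA_character_
  if (!is.na(tab_start) && tab_start < length(lines)) {
    rest <- lines[(tab_start + 1):length(lines)]
    rows <- rest[grepl("^[0-9]", rest)]
    if (length(rows) > 0) {
      tabulation <- paste(gsub(" +", " ", rows), collapse = ", ")
    }
  }

  data.frame(variable = variable, label = label, type = type, value = value,
             missing = n_missing, tabulation = tabulation,
             stringsAsFactors = FALSE)
}

# Sample entry typed by hand from the values described in the question.
entry <- c(
  "--------------------------------------------------",
  "intarm_actual                          (unlabeled)",
  "--------------------------------------------------",
  "",
  "                  type:  string (str2)",
  "",
  "         unique values:  4                        missing \"\":  46/102",
  "",
  "            tabulation:  Freq.  Value",
  "                            46  \"\"",
  "                            14  \"RO\"",
  "                            14  \"RV\"",
  "                            14  \"TO\"",
  "                            14  \"TV\""
)

row <- parse_codebook_entry(entry)
print(row)
```

To cover a whole file, the lines could be split into one chunk per variable, each chunk passed through the function, and the results combined with `do.call(rbind, ...)` before writing them out with `write.csv()`.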
Update 2:--------------------------------------------
The issue has been solved in another way.
In case someone encounters a similar situation: 1) this type of codebook file is generated by Stata; 2) instead of parsing this complex text file, an alternative is to use the R package "codebook" to generate a new .csv codebook, which contains all of this information and more.