I often work with survey data at my job that comes in horribly formatted excel files that have been designed for readability and not for any data analysis. I'm looking for a way to clean the data in R and knock it into a dataframe format of variables and observations.
I know there's a load of tutorials on data cleaning in R but from my experience, they mostly deal with data that is already in a machine-readable format, so any help with this would be appreciated!
Here's a dummy example of the raw survey that has this shape:
Are you male or female?
Variable1 Variable2 Variable3 Variable4
Male n% n% n% n%
Female n% n% n% n%
How old are you?
Variable1 Variable2 Variable3 Variable4
18-34 n% n% n% n%
35+ n% n% n% n%
And so on, with blank space being empty cells/rows, the whole of each survey question being in column A a couple rows above it's corresponding data table and all questions/datatables being on one worksheet.
Is there a way to convert to this with R code?
Question Response Variable1 Variable2 Variable3 Variable4
Are you male or female? Male n% n% n% n%
Are you male or female? Female n% n% n% n%
How old are you? 18-34 n% n% n% n%
How old are you? 35+ n% n% n% n%
At the moment, I'm using some VBA code to do this in excel then reading in to R for further analysis/visualisation, but it would be nice be able to skip the excel phase and go straight to R.
Thanks!