I extracted a table from a PDF using pdftools in R. The table's columns contain multi-line text. To make the columns easier to split, I replaced every run of two or more spaces with "|". The problem I'm running into is that, because of the multi-line cells and the way the table is laid out in the PDF, the data comes in out of order.
The data that I extracted looks like this:
scale_definitions <- c("", " to lack passion easily annoyed",
" Excitable", " to lack a sense of urgency emotionally volatile",
"", " naive mistrustful",
" Skeptical", " gullible cynical",
"", " overly confident too conservative",
" Cautious", " to make risky decisions risk averse",
"", " to avoid conflict aloof and remote",
" Reserved", " too sensitive indifferent to others' feelings",
"", " unengaged uncooperative",
" Leisurely", " self-absorbed stubborn",
"", " unduly modest arrogant",
" Bold", " self-doubting entitled and self-promoting",
"", " over controlled charming and fun",
" Mischievous", " inflexible careless about commitments",
"", " repressed dramatic",
" Colorful", " apathetic noisy",
"", " too tactical impractical",
" Imaginative", " to lack vision eccentric",
"", " careless about details perfectionistic",
" Diligent", " easily distracted micromanaging",
"", " possibly insubordinate respectful and deferential",
" Dutiful", " too independent eager to please"
)
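library(stringr)  # provides str_replace_all() and re-exports %>%
# Collapse each run of two or more whitespace characters into a single "|" delimiter
scale_definitions <- scale_definitions %>% str_replace_all("\\s{2,}", "|")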
How can I best turn this into a data frame?
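
A minimal sketch of one possible approach, in base R, under some loudly stated assumptions: each scale occupies four consecutive elements of the vector (blank, first text line, scale name, second text line), and the wide gap between the two text columns is still present before any replacement. The sample vector `sample_lines`, the helper `split_cols`, the column names `left_text` and `right_text`, and the "; " separator used to join the two lines of a cell are all hypothetical choices for illustration, not taken from the PDF.

```r
# Hypothetical sample: two scales, with the wide column gaps assumed intact
# (the exact spacing is an assumption).
sample_lines <- c(
  "", "   to lack passion      easily annoyed",
  "  Excitable", "   to lack a sense of urgency      emotionally volatile",
  "", "   naive      mistrustful",
  "  Skeptical", "   gullible      cynical"
)

# Every scale is assumed to span four consecutive elements:
# blank, first text line, scale name, second text line.
blocks <- matrix(trimws(sample_lines), ncol = 4, byrow = TRUE)

# Split each text line into its two columns at runs of 2+ whitespace characters
# and stack the pieces into a two-column character matrix.
split_cols <- function(x) {
  do.call(rbind, strsplit(x, "\\s{2,}", perl = TRUE))
}

first_line  <- split_cols(blocks[, 2])
second_line <- split_cols(blocks[, 4])

# Re-join the two physical lines of each cell; "; " is an arbitrary separator.
result <- data.frame(
  scale      = blocks[, 3],
  left_text  = paste(first_line[, 1], second_line[, 1], sep = "; "),
  right_text = paste(first_line[, 2], second_line[, 2], sep = "; "),
  stringsAsFactors = FALSE
)

print(result)
```

This relies on every text line splitting into exactly two pieces. If a cell is empty on one line, or a gap inside a cell is also two or more spaces wide, the split would need extra handling.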