Could someone please guide me on how to extract a .docx file and load it into a database using an ETL (Extract-Transform-Load) or ELT (Extract-Load-Transform) tool?
Assuming that the .docx file contains mostly unstructured data, shouldn't I go for an ELT tool instead of an ETL tool?
The ETL and ELT tools I've found so far don't support MS Word files as a source. What other ways are there to extract the content of a .docx file and store it in a database?
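One tool-free route, offered as a minimal sketch rather than a finished pipeline: a .docx file is a ZIP archive whose main body lives in `word/document.xml`, so Python's standard library (`zipfile`, `xml.etree.ElementTree`, `sqlite3`) can read the paragraphs without any Word-specific connector. The function names and the `paragraphs` table are hypothetical, SQLite stands in for whatever database is actually used, and headers, footers, footnotes and comments are ignored because they are stored in separate XML parts of the archive.

```python
import sqlite3
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the non-empty paragraph texts from the main body of a .docx.

    `source` may be a filesystem path or a binary file-like object.
    Headers, footers, footnotes and comments live in other XML parts
    and are not read here.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):  # every <w:p>, including those in tables
        # A paragraph's text is split across runs; join all <w:t> nodes
        text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def store_paragraphs(conn, doc_name, paragraphs):
    """Insert one row per paragraph into a hypothetical `paragraphs` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        "doc_name TEXT, position INTEGER, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO paragraphs (doc_name, position, text) VALUES (?, ?, ?)",
        [(doc_name, i, text) for i, text in enumerate(paragraphs)],
    )
    conn.commit()
```

Typical use would be `store_paragraphs(sqlite3.connect("docs.db"), "report.docx", docx_paragraphs("report.docx"))`, with the file names being placeholders.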
My requirement is to:
- Extract the data inside the .docx file,
- Convert it into meaningful data, and
- Store it in a data lake so I can perform data analysis and make productive decisions based on the results.
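The three steps above map naturally onto ELT: load the extracted text unchanged, then derive structured data inside the store. A minimal sketch, assuming the paragraphs have already been pulled out of the .docx as a list of strings, with SQLite standing in for the data lake's query engine; the table names, the `keyword` parameter and the summary columns are hypothetical choices for illustration only.

```python
import sqlite3


def load_raw(conn, doc_name, paragraphs):
    """Load step: store extracted paragraphs untouched in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_paragraphs ("
        "doc_name TEXT, position INTEGER, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_paragraphs (doc_name, position, text) VALUES (?, ?, ?)",
        [(doc_name, i, p) for i, p in enumerate(paragraphs)],
    )
    conn.commit()


def transform(conn, keyword):
    """Transform step, run inside the database after loading.

    Rebuilds a per-document summary: paragraph count, an approximate word
    count (spaces + 1 per trimmed paragraph, so repeated spaces inflate it),
    and how many paragraphs mention `keyword` (case-insensitive substring).
    """
    conn.execute("DROP TABLE IF EXISTS doc_summary")
    conn.execute(
        "CREATE TABLE doc_summary ("
        "doc_name TEXT, paragraphs INTEGER, approx_words INTEGER, keyword_hits INTEGER)"
    )
    conn.execute(
        """
        INSERT INTO doc_summary
        SELECT doc_name,
               COUNT(*),
               SUM(LENGTH(TRIM(text)) - LENGTH(REPLACE(TRIM(text), ' ', '')) + 1),
               SUM(CASE WHEN LOWER(text) LIKE '%' || LOWER(?) || '%' THEN 1 ELSE 0 END)
        FROM raw_paragraphs
        GROUP BY doc_name
        """,
        (keyword,),
    )
    conn.commit()
```

Keeping the raw table means new transformations (for example sentiment scoring or topic tagging) can be rerun later without re-extracting the Word files, which is the usual argument for ELT when the source is unstructured text.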
It's similar to how e-commerce companies turn customer reviews into meaningful data so they can make decisions that boost their sales. In my case, it's Word files I need to analyze.
I'm asking because I've searched through many ETL and ELT tools but couldn't find any that support Word files. Maybe I haven't been searching for the right tool, or the right way to do it?
If anybody knows a way, please guide me through the process. What should I start looking for: a tool, or a way to code the entire thing?
I've been looking for an answer for weeks now without finding a helpful one, and it's getting really frustrating to see every tool support every other source, like social media, MongoDB, or whatever, EXCEPT Word files.