How can I extract data from PDF, DOC, and image files, generate embeddings for that data, and store them in a vector database?
I need code examples showing how to do this in Python. Is there an open-source platform I can use to practice?
I am especially looking for a sample Python script that uses LangChain.
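
To make the stages concrete, here is a minimal sketch of the chunk, embed, store, and query steps, assuming the text has already been extracted from each file. Everything in it is a stand-in written for illustration and not LangChain's API: `chunk_text`, `toy_embed`, and `InMemoryVectorStore` are hypothetical names, the file names and sentences are made up, and the hashing "embedding" is not a real model. In a LangChain script these roles would be filled by a document loader (for example `PyPDFLoader` or `Docx2txtLoader`, with OCR for images), a text splitter, an embedding model, and a vector store; Chroma and FAISS are open-source stores that can run locally for practice.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised.

    Assumption: this only mimics the shape of an embedding call.
    A real pipeline would call an embedding model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Chroma or FAISS."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        # Embed each chunk and keep the vector next to its source text.
        for t in texts:
            self.texts.append(t)
            self.vectors.append(toy_embed(t))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), t)
            for v, t in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:k]]


# Made-up text standing in for what the PDF, DOC, and OCR extractors would return.
extracted = {
    "report.pdf": "Invoices are due within thirty days of delivery.",
    "notes.docx": "The vector database stores one embedding per chunk.",
    "scan.png": "Text recovered from an image by OCR goes through the same steps.",
}

store = InMemoryVectorStore()
for source, text in extracted.items():
    store.add_texts(chunk_text(text))

# Querying with a stored chunk should rank that chunk first.
top = store.similarity_search(extracted["notes.docx"], k=1)
print(top[0])
```

The store keeps vectors in plain Python lists so the flow is easy to follow; a real vector database adds persistence and approximate nearest-neighbour indexing on top of the same add-then-search pattern.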