I'm looking for steps, libraries, or approaches to solve the following problem.
- Given a source file written in some programming language, I need to parse it and subdivide it into components.
Example: given a Java file, I need to find the following in it:
- the list of imports
- the classes declared in it
- the attributes (fields) of each class
- the methods in each class, along with their parameters, if any, etc.
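For Java specifically, one way to get these components without writing a parser from scratch is the JDK's own Compiler Tree API (`com.sun.source.tree` / `com.sun.source.util`, shipped in the `jdk.compiler` module). The sketch below is only an illustration under a few assumptions: it needs a full JDK (not just a JRE), it only parses the file (no type resolution), and the class name `ComponentExtractor` and the one-line output format are hypothetical choices, not an established API.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ComponentExtractor {

    // Wraps source text as an in-memory "file" so no disk access is needed.
    static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses the file (syntax only, no type checking) and lists its components.
    public static List<String> extract(String fileName, String code) throws IOException {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source(fileName, code)));
        List<String> components = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            for (ImportTree imp : unit.getImports()) {
                components.add("Import " + imp.getQualifiedIdentifier());
            }
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitClass(ClassTree cls, Void unused) {
                    String name = cls.getSimpleName().toString(); // empty for anonymous classes
                    components.add("Class " + name);
                    for (Tree member : cls.getMembers()) {
                        if (member.getKind() == Tree.Kind.VARIABLE) {
                            VariableTree field = (VariableTree) member;
                            components.add("Attribute " + name + "." + field.getName() + " : " + field.getType());
                        } else if (member.getKind() == Tree.Kind.METHOD) {
                            MethodTree method = (MethodTree) member;
                            String params = method.getParameters().stream()
                                    .map(p -> p.getType() + " " + p.getName())
                                    .collect(Collectors.joining(", "));
                            components.add("Method " + name + "." + method.getName() + "(" + params + ")");
                        }
                    }
                    // Keep scanning so nested classes are reported too.
                    return super.visitClass(cls, unused);
                }
            }.scan(unit, null);
        }
        return components;
    }

    public static void main(String[] args) throws IOException {
        String code = "import java.util.List;\n"
                + "public class Sample {\n"
                + "    private int count;\n"
                + "    void add(int x, String y) { }\n"
                + "}\n";
        extract("Sample.java", code).forEach(System.out::println);
    }
}
```

Because only `parse()` is called, names such as `String` are reported exactly as written in the source rather than as resolved types; constructors appear as methods named `<init>`.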
I need to extract these components and store them separately. Why do I want to do this?
- I want to build an inverted index on top of these components.
Example queries to the inverted index:
1. Find the list of files that contain a class named Sample.
2. Find the positions where variable XXX is used within class AAA.
I need to support queries like the ones above.
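The second query needs positions, not just names. A minimal, purely syntactic sketch with the same JDK Compiler Tree API is shown below; `UsageFinder` and `findUsages` are hypothetical names. It matches identifiers by spelling only, so it cannot tell a field from a local variable with the same name, and it misses qualified uses such as `this.xxx` (a member select, not a plain identifier). Telling those apart requires symbol resolution (for example via `JavacTask.analyze()`), which is the kind of extra work an IDE does.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.LineMap;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class UsageFinder {

    // Wraps source text as an in-memory "file".
    static JavaFileObject source(String fileName, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Returns "line:column" (both 1-based) for each identifier spelled `variable` inside `className`.
    public static List<String> findUsages(String fileName, String code, String className, String variable)
            throws IOException {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source(fileName, code)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> hits = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            LineMap lines = unit.getLineMap();
            // The scanner's parameter carries the name of the innermost enclosing class.
            new TreeScanner<Void, String>() {
                @Override
                public Void visitClass(ClassTree cls, String enclosing) {
                    return super.visitClass(cls, cls.getSimpleName().toString());
                }

                @Override
                public Void visitIdentifier(IdentifierTree id, String enclosing) {
                    if (className.equals(enclosing) && id.getName().contentEquals(variable)) {
                        long start = positions.getStartPosition(unit, id);
                        hits.add(lines.getLineNumber(start) + ":" + lines.getColumnNumber(start));
                    }
                    return super.visitIdentifier(id, enclosing);
                }
            }.scan(unit, null);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String code = "class AAA {\n"
                + "    int xxx;\n"
                + "    void f() {\n"
                + "        xxx = 1;\n"
                + "        System.out.println(xxx);\n"
                + "    }\n"
                + "}\n"
                + "class BBB {\n"
                + "    int xxx;\n"
                + "    void g() { xxx = 2; }\n"
                + "}\n";
        System.out.println(findUsages("AAA.java", code, "AAA", "xxx"));
    }
}
```

The declaration `int xxx;` is not reported because a declaration is a variable node, not an identifier use; only the two uses inside AAA (lines 4 and 5) match, while the use inside BBB is filtered out by the enclosing-class check.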
So my plan is: given a file, extract these components from it; once I have them, it should be easy to build an inverted index on top of them.
Example entry: Sample | Class | Sample.java (Keyword | Component | FileName). I want to build an inverted index made up of entries like this.
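The Keyword | Component | FileName entries map naturally onto a hash map from keyword to a list of postings. Below is a minimal in-memory sketch under stated assumptions: the `Component` values, the extra `enclosingClass` and `line` fields (added so the "variable XXX inside class AAA" query can be answered), and all class and method names are hypothetical. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {

    // Hypothetical component kinds; extend as needed (PARAMETER, ...).
    public enum Component { IMPORT, CLASS, ATTRIBUTE, METHOD, VARIABLE_USE }

    // One occurrence of a keyword: its component kind, file, enclosing class, and line.
    public record Posting(Component component, String file, String enclosingClass, long line) {}

    // keyword -> every place it occurs
    private final Map<String, List<Posting>> postings = new HashMap<>();

    public void add(String keyword, Posting posting) {
        postings.computeIfAbsent(keyword, k -> new ArrayList<>()).add(posting);
    }

    // Query 1: files that declare a class with the given name (sorted).
    public Set<String> filesWithClass(String className) {
        Set<String> files = new TreeSet<>();
        for (Posting p : postings.getOrDefault(className, List.of())) {
            if (p.component() == Component.CLASS) {
                files.add(p.file());
            }
        }
        return files;
    }

    // Query 2: places where a variable is used inside a given class.
    public List<Posting> usagesInClass(String variable, String className) {
        List<Posting> result = new ArrayList<>();
        for (Posting p : postings.getOrDefault(variable, List.of())) {
            if (p.component() == Component.VARIABLE_USE && p.enclosingClass().equals(className)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add("Sample", new Posting(Component.CLASS, "Sample.java", "Sample", 2));
        index.add("XXX", new Posting(Component.VARIABLE_USE, "AAA.java", "AAA", 4));
        index.add("XXX", new Posting(Component.VARIABLE_USE, "BBB.java", "BBB", 10));
        System.out.println(index.filesWithClass("Sample"));           // [Sample.java]
        System.out.println(index.usagesInClass("XXX", "AAA").size()); // 1
    }
}
```

Filtering the posting list by component kind at query time keeps the index simple; a larger index might instead key on (keyword, component) pairs and persist to disk, but that is beyond this sketch.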
I can see this is implemented in many IDEs, such as IntelliJ. What I'm interested in is how much effort it would take to build something like this, and I'd like to try implementing it for at least one language.
Thanks in advance.