First of all, you need to define more precisely what a "simple sentence" means to you from a linguistic (grammatical) perspective. You can say, for example, that simple sentences are:
- just text without punctuation in the middle (periods, commas, colons, etc)
- those with a single verb. In that case you will have to deal with a hierarchy, where one sentence is "completed" by reusing another.
- a phrase-like text, where conjunctions can act as delimiters too.
In short, you have many alternatives for defining this. Depending on your needs, your "rule" should be more (or less) rigorous, because it will affect your algorithm design and (of course) your output.
I would suggest two basic steps:
- split by punctuation, so you will have "simpler sentences" (e.g. your input3)
- feed each of those to a dependency parser such as spaCy, and use the dependency links as delimiters.
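The first step can be sketched with nothing but the standard library. This is only a minimal illustration: the function name `split_on_punctuation`, the set of delimiters, and the sample sentence are my own assumptions, not something taken from your inputs.

```python
import re

# Punctuation marks treated as delimiters (an assumption; adjust this set
# to match your own definition of a "simple sentence").
_DELIMITERS = re.compile(r"[.,;:!?]+")


def split_on_punctuation(text: str) -> list[str]:
    """Split text at punctuation marks and drop empty fragments."""
    fragments = _DELIMITERS.split(text)
    return [frag.strip() for frag in fragments if frag.strip()]


# Hypothetical sample sentence, used only for illustration.
print(split_on_punctuation("I like tea, she likes coffee. We both like cake"))
# → ['I like tea', 'she likes coffee', 'We both like cake']
```

Keep in mind that such a naive split will also cut abbreviations ("e.g.") and decimal numbers, so you may prefer a proper sentence segmenter if your text contains those.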
Demo using your provided examples:
spaCy outputs these trees for input1 and input2.
You may notice that, using `conj` as the delimiter and merging the remaining subtrees, you get the output you expected.
You can do the same for your input3 after splitting by punctuation, as I mentioned above.
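Here is a rough sketch of the `conj` idea, not a definitive implementation. The names `Tok`, `split_on_conj` and `parse_with_spacy` are hypothetical helpers of mine. The example parse is written by hand for illustration (it is not actual spaCy output), and the adapter assumes the `en_core_web_sm` model is installed.

```python
from typing import NamedTuple


class Tok(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. "nsubj", "conj", "cc"
    head: int  # index of the head token; the root points to itself


def split_on_conj(tokens: list[Tok]) -> list[str]:
    """Cut the tree at every "conj" edge and merge what remains of each subtree."""

    def group_root(i: int) -> int:
        # Walk up until we hit the sentence root or a token attached via "conj".
        while tokens[i].dep != "conj" and tokens[i].head != i:
            i = tokens[i].head
        return i

    groups: dict[int, list[str]] = {}
    for i, tok in enumerate(tokens):
        if tok.dep in ("cc", "punct"):  # drop the conjunction word and punctuation
            continue
        groups.setdefault(group_root(i), []).append(tok.text)
    # Sorting by the group root's index keeps the pieces in sentence order.
    return [" ".join(words) for _, words in sorted(groups.items())]


def parse_with_spacy(text: str, model: str = "en_core_web_sm") -> list[Tok]:
    """Adapter from a spaCy parse to Tok (assumes spaCy and the model are installed)."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.load(model)
    return [Tok(t.text, t.dep_, t.head.i) for t in nlp(text)]


# Hand-written parse of "She sings and he dances" (illustrative only).
example = [
    Tok("She", "nsubj", 1),
    Tok("sings", "ROOT", 1),
    Tok("and", "cc", 1),
    Tok("he", "nsubj", 4),
    Tok("dances", "conj", 1),
]
print(split_on_conj(example))
# → ['She sings', 'he dances']
```

With spaCy available, you would call `split_on_conj(parse_with_spacy(fragment))` on each fragment produced by the punctuation split. Note two limitations: if the conjuncts share a subject ("She sings and dances"), the second piece comes out without it, and `conj` also links coordinated nouns, so you may want to cut only when the `conj` token is a verb.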
Finally, this is not a straightforward task. You may be fine with these simple rules, but if you need better results, first refine your definitions of what a "compound" or "simple" sentence means, and then have a look at more sophisticated algorithms based on machine learning.
Although a very old question, it would be nice to know if this helps :)