
I've been working on a second-language development project, and I need to count the t-units in a given sentence using Python. For example, for the following sentences:

The man did not like water.

1 t-unit (The man did not like water)

The man did not like water although he lived by the sea.

1 t-unit (The man did not like water although he lived by the sea)

The man never liked water and he certainly did not like living in the swamp with her grandparents.

2 t-units: (The man never liked water) and (he certainly did not like living in the swamp with her grandparents)

The man did not like water or juice.

1 t-unit (The man did not like water or juice)

I've checked out NLTK, spaCy and Stanford NLP (Stanza), but none of them appears to include t-unit detection.

I've come across this but it is about clause extraction.

Any idea how I can detect such t-units using Python?

user3288051
  • Well, the post you cited does do clause extraction, and your task is really just identifying which clauses connected by conjunctions have subject and verb. I would think that's the right place to start. – Tim Roberts Oct 28 '21 at 21:02
  • It seems like the root of the problem is trying to identify compound sentences. Could you not look for conjunctions and semicolons? You'll have to have a way to determine if the conjunction is being used as a conjunction, but that might be easier than trying to determine the clauses. – Jeff Gruenbaum Oct 28 '21 at 21:11
  • It sounds like a T-unit is roughly equivalent to an S in a constituency parse. Unfortunately constituency parsers aren't used much any more and I'm not sure there are any you can just use. Using a dependency parse to find verbs that have subjects should be pretty easy though. Check the parsing chapters in the Jurafsky and Martin book, and maybe look into sentence simplification. https://web.stanford.edu/~jurafsky/slp3/ – polm23 Oct 29 '21 at 04:14
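The dependency-parse idea from the last comment could be sketched roughly as follows. This is only a heuristic under stated assumptions, not an established t-unit algorithm: the root of the sentence heads one t-unit, and any word coordinated with it (a `conj` chain leading back to the root) that has its own subject heads another. Subordinate clauses (`advcl` and the like) stay with their main clause. The `Token` class, `SUBJECT_DEPS` and `t_unit_heads` are hypothetical names, and the two parses are written by hand in the style of spaCy's English labels rather than produced by a parser, so a real parser may attach words differently.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical minimal token: surface form, dependency label, head index.

    Assumption: head indices are 0-based and the root points to itself.
    """
    text: str
    dep: str
    head: int


# Subject labels in both spaCy-style and Universal Dependencies spellings.
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass",
                "nsubj:pass", "csubj:pass"}


def t_unit_heads(tokens: List[Token]) -> List[int]:
    """Return indices of the words that head a t-unit (heuristic)."""
    # Every token that governs at least one subject.
    has_subject = {tok.head for tok in tokens if tok.dep in SUBJECT_DEPS}
    heads = []
    for i, tok in enumerate(tokens):
        if tok.dep.lower() == "root":
            heads.append(i)  # the main clause
        elif tok.dep == "conj" and i in has_subject:
            # Follow the coordination chain upward; count this clause only
            # if the chain ends at the root (not inside a subordinate clause).
            j = i
            while tokens[j].dep == "conj":
                j = tokens[j].head
            if tokens[j].dep.lower() == "root":
                heads.append(i)
    return heads


# Hand-written parse: "The man did not like water although he lived by the sea."
SUBORDINATE = [Token(*t) for t in [
    ("The", "det", 1), ("man", "nsubj", 4), ("did", "aux", 4),
    ("not", "neg", 4), ("like", "ROOT", 4), ("water", "dobj", 4),
    ("although", "mark", 8), ("he", "nsubj", 8), ("lived", "advcl", 4),
    ("by", "prep", 8), ("the", "det", 11), ("sea", "pobj", 9),
    (".", "punct", 4),
]]

# Hand-written parse: "The man never liked water and he certainly did not like
# living in the swamp with her grandparents."
COORDINATE = [Token(*t) for t in [
    ("The", "det", 1), ("man", "nsubj", 3), ("never", "neg", 3),
    ("liked", "ROOT", 3), ("water", "dobj", 3), ("and", "cc", 3),
    ("he", "nsubj", 10), ("certainly", "advmod", 10), ("did", "aux", 10),
    ("not", "neg", 10), ("like", "conj", 3), ("living", "xcomp", 10),
    ("in", "prep", 11), ("the", "det", 14), ("swamp", "pobj", 12),
    ("with", "prep", 11), ("her", "poss", 17), ("grandparents", "pobj", 15),
    (".", "punct", 3),
]]

if __name__ == "__main__":
    for sentence in (SUBORDINATE, COORDINATE):
        heads = t_unit_heads(sentence)
        print(len(heads), [sentence[i].text for i in heads])
```

With a real parser, the same three fields would have to be filled from its output (for example the token text, dependency label and head position), adjusting for that parser's own indexing and root conventions. Imperatives, elided subjects and semicolon-joined clauses are not handled by this sketch.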

0 Answers