
This might be more of a math problem, but I couldn't find any relevant documentation elsewhere.

I just want to figure out which equation GIZA++ uses to calculate the alignment score.

Does anyone have an idea?

Thank you for your help in advance.

Dima Chubarov
CosmicRabbitMediaInc

2 Answers


If it helps, I found this document, which includes the following description:

Implements full IBM-4 alignment model with a dependency of word classes as described in (Brown et al. 1993)
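The "dependency of word classes" in that description refers to how Model 4 scores word order (distortion). As I read Brown et al. (1993), the position of each foreign word is predicted relative to the previous cept, and the head-word term is conditioned on word classes rather than on the words themselves. The sketch below uses my own notation, which is an assumption for illustration and may differ in detail from the paper and from GIZA++'s internals. A cept is an English word with nonzero fertility, $A(\cdot)$ and $B(\cdot)$ are the English and foreign word-class functions, $\pi_{i,k}$ is the position of the $k$-th foreign word generated by $e_i$, $[i-1]$ is the nearest preceding English position with nonzero fertility, and $\odot_{[i-1]}$ is the ceiling of the mean position of the foreign words aligned to it.

```latex
% Head (first) foreign word placed by the cept of e_i:
P(\pi_{i,1} = j) = d_1\!\left(j - \odot_{[i-1]} \,\middle|\, A(e_{[i-1]}),\, B(f_j)\right)

% Each further foreign word of the same cept, k > 1:
P(\pi_{i,k} = j) = d_{>1}\!\left(j - \pi_{i,k-1} \,\middle|\, B(f_j)\right)
```

Conditioning on classes instead of words keeps the distortion tables small enough to estimate from a realistic corpus.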

Following up that reference leads to a paper entitled "The Mathematics of Statistical Machine Translation: Parameter Estimation", which you can find in PDF format here.

The paper details the mathematics behind the five IBM alignment models and is too long to paste here. See whether its description of Model 4, which I assume is the model GIZA++ uses, is detailed enough for you.
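Model 4 adds fertility and distortion terms, so its full formula is long. The simplest model in the same paper, IBM Model 1, shows the basic shape, though it is not the model GIZA++ finishes training with. Here $\mathbf{e} = e_1 \dots e_l$ is the English sentence with $e_0$ the empty (NULL) word, $\mathbf{f} = f_1 \dots f_m$ is the foreign sentence, $a_j \in \{0, \dots, l\}$ is the English position aligned to $f_j$, $t(f \mid e)$ is the lexical translation probability, and $\epsilon$ is the length-probability constant:

```latex
% Probability of a foreign sentence together with one particular alignment:
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})

% Summing over all alignments factorises per foreign position:
P(\mathbf{f} \mid \mathbf{e}) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Best (Viterbi) alignment: with no distortion term, each position is chosen independently:
\hat{a}_j = \arg\max_{0 \le i \le l} t(f_j \mid e_i)
```

In Models 3 and 4 the best alignment no longer decomposes per position, which is why GIZA++ has to search for it approximately.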

There is also this PDF, which summarises the models and training process.

Roger Rowland

In short, word alignments and translation probabilities are learned over multiple iterations of the Expectation-Maximization (EM) algorithm.
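To make the EM loop concrete, here is a minimal sketch for IBM Model 1 only. It is not GIZA++'s code, and GIZA++ goes on to train the HMM and Models 3 and 4. The function names `train_ibm1` and `viterbi_alignment`, the toy corpus, and the default `epsilon=1.0` are all hypothetical choices for illustration. I assume, without having verified it, that the "alignment score" GIZA++ reports is the probability of the best alignment under the final trained model. If so, the Model 1 analogue would be the value `viterbi_alignment` returns.

```python
import math
from collections import defaultdict

NULL = "NULL"  # empty word e_0 that can generate foreign words with no English source


def train_ibm1(corpus, iterations=20):
    """EM training of IBM Model 1 lexical probabilities t(f | e).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e) for co-occurring pairs.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    # Uniform initialisation over the foreign vocabulary.
    t = {(f, e): uniform for fs, es in corpus for f in fs for e in [NULL] + es}

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es0 = [NULL] + es
            for f in fs:
                # E-step: posterior probability that each English position generated f.
                z = sum(t[(f, e)] for e in es0)
                for e in es0:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalise the expected counts for each English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def viterbi_alignment(t, fs, es, epsilon=1.0):
    """Best alignment under Model 1 and its probability P(f, a | e).

    Position 0 stands for NULL. Only works for pairs seen in training,
    since unseen (f, e) pairs are missing from t.
    """
    es0 = [NULL] + es
    alignment = []
    # log of epsilon / (l + 1)^m
    log_p = math.log(epsilon) - len(fs) * math.log(len(es0))
    for f in fs:
        # No distortion term, so each foreign word picks its best source independently.
        best = max(range(len(es0)), key=lambda i: t[(f, es0[i])])
        alignment.append(best)
        log_p += math.log(t[(f, es0[best])])
    return alignment, math.exp(log_p)


if __name__ == "__main__":
    corpus = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    t = train_ibm1(corpus)
    alignment, score = viterbi_alignment(t, *corpus[0])
    print(alignment, score)
```

On the toy corpus, EM should pull "das" toward "the" and "Haus" toward "house", because "the" co-occurs with "das" in two sentence pairs. The score is computed in log space so that long sentences do not underflow.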

Philipp Koehn's book "Statistical Machine Translation" has a chapter on word alignment. Check statmt.org for more information.

Jokester