In the Maximum Entropy Markov Model (MEMM), we model the sequence tagging problem of finding a sequence of tags t1, ..., tn given a sequence of words x1, ..., xn with the following independence assumption:
Chain rule: p(t1, t2, ..., tn | x1, ..., xn) = Π i=1..n p(ti | t1, ..., ti-1, x1, ..., xn)
Independence assumption: p(ti | t1, ..., ti-1, x1, ..., xn) = p(ti | ti-1, x1, ..., xn)
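Combining the chain rule with the independence assumption, the probability of a whole tag sequence becomes a product of local terms p(ti | ti-1, x1, ..., xn). The sketch below illustrates only that factorization. The names `sequence_log_prob`, `local_prob`, and `toy_local_prob`, the start symbol "<s>" standing in for t0, and the toy probabilities are all assumptions made for illustration and are not part of the original problem statement.

```python
import math


def sequence_log_prob(tags, words, local_prob, start_tag="<s>"):
    """Return log p(t1..tn | x1..xn) under the first-order assumption.

    local_prob(prev_tag, words, i) is a hypothetical callable that returns
    a dict mapping each candidate tag to p(tag | prev_tag, x1..xn) at
    position i. start_tag is an assumed placeholder for t0.
    """
    assert len(tags) == len(words)
    total = 0.0
    prev = start_tag
    for i, tag in enumerate(tags):
        dist = local_prob(prev, words, i)
        total += math.log(dist[tag])  # a product of probabilities is a sum of logs
        prev = tag
    return total


def toy_local_prob(prev_tag, words, i):
    """Hand-written toy distribution (illustrative numbers only)."""
    if words[i] == "the":
        return {"DET": 0.9, "NOUN": 0.1}
    if prev_tag == "DET":
        return {"DET": 0.2, "NOUN": 0.8}
    return {"DET": 0.5, "NOUN": 0.5}


log_p = sequence_log_prob(["DET", "NOUN"], ["the", "dog"], toy_local_prob)
print(math.exp(log_p))  # approximately 0.9 * 0.8 = 0.72
```

Each local distribution sees the whole word sequence but only the immediately preceding tag, which is exactly what the independence assumption states.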
In addition, we model the conditional distribution as a log-linear model with a feature extraction function Φ and parameters w.
1. What are the arguments of the feature extraction function Φ?
Φ(_____________________________________________________________________)
2. Write the formula of the log-linear model (using the softmax function) with feature function Φ and parameters w:
p(ti | ti-1, x1, ..., xn) = softmax(__________________________________________)
3. Is decoding necessary in the MEMM approach? Why?