1 # -*- coding: mule-utf-8-unix -*-
6 #+AUTHOR: Kevin Brubeck Unhammer
7 #+EMAIL: K.BrubeckUnhammer at student uva nl
9 #+SEQ_TODO: TOGROK TODO DONE
13 DEADLINE: <2008-06-30 Mon>
14 (But absolute, extended, really-quite-dead-now deadline: August 31...)
15 [[file:src/dmv.py][dmv.py]]
16 [[file:src/io.py][io.py]]
18 [[file:src/dmv.py::DMV%20probabilities][dmv-P_STOP]]
19 *** Remember: The P_STOP formula is upside-down (left-to-right also)
20 *** How is the P_STOP formula different given other values for dir and adj?
21 (Presumably, the P_STOP formula where STOP is True is just the
22 rule-probability of _h_ -> STOP h_ or h_ -> h STOP, but how does
23 adjacency fit in here?)
25 (And P_STOP(-STOP|...) = 1 - P_STOP(STOP|...) )
27 Write the formulas! should be easy?
28 ** TOGROK Some (tagged) sentences are bound to come twice :OPTIMIZE:
29 Eg, first sort and count, so that the corpus
30 [['nn','vbd','det','nn'],
31 ['vbd','nn','det','nn'],
32 ['nn','vbd','det','nn']]
34 [(['nn','vbd','det','nn'],2),
35 (['vbd','nn','det','nn'],1)]
36 and then in each loop through sentences, make sure we handle the
39 Is there much to gain here?
41 [[file:~/Documents/Skole/V08/Probability/dmvccm/src/dmv.py::Initialization%20todo][dmv-inits]]
42 we Do have to go through the corpus, since the probabilities are based
43 on how far away in the sentence arguments are from their heads.
44 *** DONE DMV Initialization frequencies
45 CLOSED: [2008-05-27 Tue 20:04]
47 P_STOP is not well defined by K&M. One possible interpretation given
48 the sentence [det nn vb nn] is
49 - f_STOP( STOP|det, L, adj) +1
50 - f_STOP(-STOP|det, L, adj) +0
51 - f_STOP( STOP|det, L, non_adj) +1
52 - f_STOP(-STOP|det, L, non_adj) +0
53 - f_STOP( STOP|det, R, adj) +0
54 - f_STOP(-STOP|det, R, adj) +1
56 - f_STOP( STOP|nn, L, adj) +0
57 - f_STOP(-STOP|nn, L, adj) +1
58 - f_STOP( STOP|nn, L, non_adj) +1 # since there's at least one to the left
59 - f_STOP(-STOP|nn, L, non_adj) +0
62 Go through the corpus, counting distances between heads and
63 arguments. In [det nn vb nn], we give
64 - f_CHOOSE(nn|det, R) +1/1 + C
65 - f_CHOOSE(vb|det, R) +1/2 + C
66 - f_CHOOSE(nn|det, R) +1/3 + C
67 - If this were the full corpus, P_CHOOSE(nn|det, R) would have
68 (1+1/3+2C) / sum_a f_CHOOSE(a|det, R)
70 The ROOT gets "each argument with equal probability", so in a sentence
71 of three words, 1/3 for each (in [nn vb nn], 'nn' gets 2/3). Basically
72 just a frequency count of the corpus...
73 *** TOGROK DMV Initialization probabilities from initialization frequency
74 *** TOGROK CCM Initialization
75 P_SPLIT used here... how, again?
76 ** TOGROK Adjacency and combining it with inner()
77 ** TOGROK Is the E-step in DMV just inner() on the full sentence?
78 ** TOGROK What exactly is the M-step of DMV?
80 E: «For the initial symbols, I think we're supposed to
81 implement someone-or-other's POS tagger and use the parts of speech as
82 our tags. (K&M say they used parts of speech, anyway, and we can't use
83 the ones from the annotated corpus or else we won't be unsupervised
84 anymore.) I think you're right about how the tree is formed. The only
85 thing I'm not sure about is whether those left and right marks are
86 supposed to be part of the tree. I guess not because in a well-formed
87 tree, every word needs a stop on both sides, so actually all nodes
88 would be left and right marked in the end, which is the same as not
89 using them at all. So I guess they're relevant in the E step of EM,
90 but not the M step.» [[https://mail.google.com/mail/%3Ffs%3D1&tf%3D1&source%3Datom&view%3Dcv&search%3Dall&th%3D119b3e3d7d8f1d1f&shva%3D1&ui%3D1&disablechatbrowsercheck%3D1][gmail]]
92 K: About initialization, when they say they start with an M-step and get
93 a harmonic distribution (where close dependencies have higher
94 probabilities than far-away ones), I guess we could do this either in
95 a function of its own, or as a special case of the IO-algorithm -- I'm
96 not sure how different the harmonizing step will be from the regular
97 M-step. But I'm pretty sure harmonizing means having to go through the
98 whole corpus, since that's the only way we can find out distances for
99 each dependency (without having actual distance-numbers in the rules
100 in the first place). If sentence s is ABCD, rules from A to B should
101 have higher probability than rules from A to D. Assuming we find some
102 average distance between A and B, and A and D, etc. throughout the
103 corpus, we then have to turn those numbers into probabilities that sum
104 to 1; do you have any idea how to do that?
106 E: I think what it means to start with an M step is that we set the
107 probabilities without regard for the initial corpus. They say that
108 they start with an initial guess at "posterior distributions". I bet
109 the word "posterior" is important, but I don't know what it means
111 ** TODO Corpus access
112 ** TOGROK How do we interpret DMV as an inside/outside process?
113 c_s(x : i, j) is "the expected fraction of parses of s" with x from
114 i to j; expectation then uses the probabilities gotten from
115 initialization and previously gained probabilities, but these are of
116 the form P_STOP and P_CHOOSE, how do we translate this to inside
117 outside, which just uses the probabilities of CFG-rules?
119 ** Technical: sentences or rules as the "outer loop"?
121 ** DONE How do we know whether we are 'adjacent' or not?
122 **** One configuration that I'm fairly certain of: right w/CHOOSE
124 \Tree [_b [_b b _c_ ] _d_ ]
125 then the lower tree [_b b _c_ ] is adjacent since, working your way up
126 the tree, no argument has been created to the right "yet"; while the
127 outer tree [_b [_b ... ] _d_ ] is non-adjacent, since there is something in
128 between... Is it thus always adjacent to the right if the distance
129 is 2? (That is, in e(s,t,i) for the adjacent rule: t - s == 2; while
130 in the non_adj rule: t - s == 4)
131 ***** Implementing this:
132 Two different DMVRules? Or just two different prob-values per rule?
135 **** R/L without CHOOSE, the "sealing operations"
136 _h_ -> STOP h_ and h_ -> h STOP
138 What is "adjacency" here? That t - s == 1?
144 - [[file:src/pseudo.py][pseudo.py]]
145 - http://nltk.org/doc/en/structured-programming.html recursive dynamic
146 - http://nltk.org/doc/en/advanced-parsing.html