DMVCCM.org~20080527~

   1 # -*- coding: mule-utf-8-unix -*-
   2 #+STARTUP: overview
   3 #+TAGS:
   4 #+STARTUP: hidestars
   5 #+TITLE: DMV/CCM
   6 #+AUTHOR: Kevin Brubeck Unhammer
   7 #+EMAIL: K.BrubeckUnhammer at student uva nl
   8 #+LANGUAGE: en
   9 #+SEQ_TODO: TOGROK TODO DONE
  10 #+OPTIONS: H:4 toc:3
  11
  12 * dmvccm
  13   DEADLINE: <2008-06-30 Mon>
  14 (But absolute, extended, really-quite-dead-now deadline: August 31...)
  15 [[file:src/dmv.py][dmv.py]]
  16 [[file:src/io.py][io.py]]
  17 ** TODO [#A] P_STOP
  18 [[file:src/dmv.py::DMV%20probabilities][dmv.py]]
  19 ** TOGROK P_CHOOSE
  20 ** TOGROK Initialization
  21 ** TOGROK Adjacency and combining it with inner()
  22 ** TOGROK What exactly is the E-step of DMV? Is the M-step just inner on the full sentence?
  23 *** COMMENT
  24 E: «For the initial symbols, I think we're supposed to
  25 implement someone-or-other's POS tagger and use the parts of speech as
  26 our tags. (K&M say they used parts of speech, anyway, and we can't use
  27 the ones from the annotated corpus or else we won't be unsupervised
  28 anymore.) I think you're right about how the tree is formed. The only
  29 thing I'm not sure about is whether those left and right marks are
  30 supposed to be part of the tree. I guess not because in a well-formed
  31 tree, every word needs a stop on both sides, so actually all nodes
  32 would be left and right marked in the end, which is the same as not
  33 using them at all. So I guess they're relevant in the E step of EM,
  34 but not the M step.» [[https://mail.google.com/mail/%3Ffs%3D1&tf%3D1&source%3Datom&view%3Dcv&search%3Dall&th%3D119b3e3d7d8f1d1f&shva%3D1&ui%3D1&disablechatbrowsercheck%3D1][gmail]]
  35
  36 K: About initialization, when they say they start with an M-step and get
  37 a harmonic distribution (where close dependencies have higher
  38 probabilities than far-away ones), I guess we could do this either in
  39 a function of its own, or as a special case of the IO-algorithm -- I'm
  40 not sure how different the harmonizing step will be from the regular
  41 M-step. But I'm pretty sure harmonizing means having to go through the
  42 whole corpus, since that's the only way we can find out distances for
  43 each dependency (without having actual distance-numbers in the rules
  44 in the first place). If sentence s is ABCD, rules from A to B should
  45 have higher probability than rules from A to D. Assuming we find some
  46 average distance between A and B, and A and D, etc. throughout the
  47 corpus, we then have to turn those numbers into probabilities that sum
  48 to 1; do you have any idea how to do that?
  49
  50 E: I think what it means to start with an M step is that we set the
  51 probabilities without regard for the initial corpus. They say that
  52 they start with an initial guess at "posterior distributions". I bet
  53 the word "posterior" is important, but I don't know what it means
  54 here. Do you?
  55
  56 ** Meet Yoav again about dmvccm
  57    SCHEDULED: <2008-05-26 Mon>
  58 13:30, P3.21.
  59
  60 Questions:
  61 *** Initialization
  62 *** Corpus access?
  63 *** How do we interpret DMV as an inside/outside process?
  64 c_s(x : i, j) is "the expected fraction of parses of s" with x from
  65 i to j; expectation then uses the probabilities gotten from
  66 initialization and previously gained probabilities, but these are of
  67 the form P_STOP and P_CHOOSE, how do we translate this to inside
  68 outside, which just uses the probabilities of CFG-rules?
  69 *** The upside-down P_STOP formula (left-to-right also)?
  70 *** Technical: sentences or rules as the "outer loop"?
  71 *** What are the formulas for P_CHOOSE etc?
  72 Is this the same as the regular E-step (M-step?) summation of
  73 Lari&Young?  (Equation 20)
  74 *** How is the P_STOP formula different given other values for dir and adj?
  75
  76 (Presumably, the P_STOP formula where STOP is True is just the
  77 rule-probability of _h_ -> STOP h_ or h_ -> h STOP, but how does
  78 adjacency fit in here?)
  79
  80 (And P_STOP(-STOP|...) = 1 - P_STOP(STOP|...) )
  81 *** How do we know whether we are 'adjacent' or not?
  82 **** One configuration that I'm fairly certain of: right w/CHOOSE
  83 if we have
  84 \Tree [_b [_b b _c_ ] _d_ ]
  85 then the lower tree [_b b _c_ ] is adjacent since, working your way up
  86 the tree, no argument has been created to the right "yet"; while the
  87 outer tree [_b [_b ... ] _d_ ] is non-adjacent, since there is something in
  88 between... Is it thus always adjacent to the right if the distance
  89 is 2? (That is, in e(s,t,i) for the adjacent rule: t - s == 2; while
  90 in the non_adj rule: t - s == 4)
  91 ***** Implementing this:
  92 Two different DMVRules? Or just two different prob-values per rule?
  93 **** left w/CHOOSE
  94 Same deal here?
  95 **** R/L without CHOOSE, the "sealing operations"
  96 _h_ -> STOP h_ and h_ -> h STOP
  97
  98 What is "adjacency" here? That t - s == 1?
  99
 100
 101
 102
 103
 104 * Python-stuff
 105 - [[file:src/pseudo.py][pseudo.py]]
 106 - http://nltk.org/doc/en/structured-programming.html recursive dynamic
 107 - http://nltk.org/doc/en/advanced-parsing.html
 108
 109