Parsing is the process of taking a string and a grammar and returning a parse tree for that string according to that grammar.

Parsing strategies:

  • Top down
    • Start with the top of the tree and expand the active node with all possible rules of the context-free grammar
  • Bottom-up
    • Start at the bottom: the word /part of speech pairs given in the lexicon

Cocke Kasami Younger (CKY) Parsing

CYK Parsing

  • Bottom-up parsing
  • Systematically fill in a table with solutions to the subproblems (so Dynamic Programming too)
  • First, we need to make sure the CFG is in a specific format: CNF

Chomsky Normal Form (CNF)

CNF

  • CNF grammars are at most binary branching
    • Max. 2 symbols on the right-hand side of any rule (so, not VP →V NP NP)
  • No mixing of non-terminals and terminals (= words!) on the right-hand side
  • No “unit productions”: rules that have one non-terminal on the right-hand side (VP → V)

Any CFG can be rewritten into Chomsky-Normal Form automatically.

From Non-CNF to CNF Form: HOW?

  • We introduce a new dummy non-terminals into the grammar so that A B C D turns into:
    • A X D
    • X B C
  • Unit productions: A B and B word turns into A word

Where symbol X doesn’t occur anywhere else in the grammar.

Note to self: continue from slide 29