1. Overview of Course, Parsing and Statistical Parsing [ppt] [pdf]
Motivation for statistical parsing, recursive phrase structure. Attachment decisions and probabilities. The Penn Treebank. Top-down and bottom-up parsing.
- Introductory background reading: Eugene Charniak. 1997. Statistical techniques for natural language parsing. AI Magazine.
- Jurafsky and Martin, 12.0-12.3, if you don't know much syntax.
- Jurafsky and Martin, 12.4, for treebanks.
- Jurafsky and Martin, 13.0-13.3, for parsing as search.
2. PCFGs and the CKY algorithm [ppt] [pdf]
PCFGs. Grammar transformations. Recursive parsing and memoization. Dynamic programming for parsing: the CKY algorithm.
- Jurafsky and Martin, sections 13.4.0-13.4.1, CKY parsing.
- Jurafsky and Martin, sections 14.0-14.2, PCFGs and probabilistic CKY parsing.
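For orientation before the lecture, here is a minimal sketch of probabilistic (Viterbi) CKY over a grammar in Chomsky normal form. It is an illustration only, not code from the course or the readings: the function names `cky_parse` and `build_tree`, the dictionary-based grammar encoding, and the three-word toy grammar are all assumptions made for this example, and unary and empty rules are not handled.

```python
def cky_parse(words, lexicon, binary_rules):
    """Fill a Viterbi CKY chart for a PCFG in Chomsky normal form.

    lexicon:      word -> {tag: P(tag -> word)}            (hypothetical encoding)
    binary_rules: (parent, left, right) -> P(parent -> left right)
    Returns (best, back): best[(i, j, label)] is the highest probability of
    `label` spanning words[i:j]; back holds backpointers for tree recovery.
    """
    n = len(words)
    best, back = {}, {}
    # Width-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        for tag, prob in lexicon.get(word, {}).items():
            best[i, i + 1, tag] = prob
            back[i, i + 1, tag] = word
    # Wider spans are built bottom-up from two adjacent smaller spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for (parent, left, right), prob in binary_rules.items():
                    score = (prob
                             * best.get((i, k, left), 0.0)
                             * best.get((k, j, right), 0.0))
                    if score > best.get((i, j, parent), 0.0):
                        best[i, j, parent] = score
                        back[i, j, parent] = (k, left, right)
    return best, back


def build_tree(back, i, j, label):
    """Follow backpointers to recover the best tree as nested tuples."""
    entry = back[i, j, label]
    if isinstance(entry, str):  # lexical entry: (tag, word)
        return (label, entry)
    k, left, right = entry
    return (label,
            build_tree(back, i, k, left),
            build_tree(back, k, j, right))


# Toy grammar, invented for this example.
lexicon = {"she": {"NP": 0.5}, "fish": {"NP": 0.5}, "eats": {"V": 1.0}}
binary_rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

words = ["she", "eats", "fish"]
best, back = cky_parse(words, lexicon, binary_rules)
tree = build_tree(back, 0, len(words), "S")
print(best[0, 3, "S"], tree)
```

The three nested loops over span length, start position, and split point give the cubic running time in sentence length; the innermost loop over all binary rules is the simplest choice here, and a practical parser would index rules by their child categories instead.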
3. Generalized CKY parsing and unlexicalized parsing [ppt] [pdf]
Unaries and empties. Parser evaluation. Improving the context-freedom assumptions of grammars: accurate unlexicalized parsing.
- Dan Klein and Christopher D. Manning. 2001. Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank. Proceedings of the 39th Annual Meeting of the ACL, pp. 330-337.
- Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
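As background for the parser evaluation topic, the following is a small sketch of labeled bracket precision, recall, and F1 in the style of PARSEVAL. It rests on simplifying assumptions that belong to this example rather than to the course materials: trees are nested tuples, any node whose only child is a word is treated as a preterminal and left unscored, and the root and punctuation are not given special treatment. The names `labeled_spans` and `parseval` are hypothetical.

```python
from collections import Counter


def labeled_spans(tree, start=0):
    """Collect (label, start, end) brackets for the phrasal nodes of a tree.

    A tree is (label, child, ...) and a preterminal is (tag, word).
    Returns (spans, end), where end is the position after the last word.
    """
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return Counter(), start + 1  # preterminal: consumes one word, not scored
    spans = Counter()
    pos = start
    for child in children:
        child_spans, pos = labeled_spans(child, pos)
        spans += child_spans
    spans[(label, start, pos)] += 1
    return spans, pos


def parseval(gold, guess):
    """Return labeled (precision, recall, F1) of guess against gold."""
    gold_spans, _ = labeled_spans(gold)
    guess_spans, _ = labeled_spans(guess)
    matched = sum((gold_spans & guess_spans).values())  # multiset intersection
    n_guess = sum(guess_spans.values())
    n_gold = sum(gold_spans.values())
    precision = matched / n_guess if n_guess else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# An invented attachment example: the gold tree attaches the PP inside the
# object NP, while the guess attaches it directly to the VP.
gold = ("S", ("NP", "she"),
        ("VP", ("V", "saw"),
         ("NP", ("NP", "stars"),
          ("PP", ("P", "with"), ("NP", "telescopes")))))
guess = ("S", ("NP", "she"),
         ("VP", ("V", "saw"), ("NP", "stars"),
          ("PP", ("P", "with"), ("NP", "telescopes"))))

precision, recall, f1 = parseval(gold, guess)
print(precision, recall, f1)
```

In this example the guess proposes three brackets, all of which appear in the gold tree, but it misses the gold NP spanning "stars with telescopes", so precision is perfect while recall is three out of four.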
4. Search in parsing and lexicalized probabilistic parsing [ppt] [pdf]
Beam parsing. Agenda-based (chart) parsing. A* parsing. Lexicalized probabilistic context-free grammars: The Charniak (1997) model.
- Dan Klein and Christopher D. Manning. 2001. Parsing and Hypergraphs. Proceedings of the 7th International Workshop on Parsing Technologies (IWPT-2001), pp. 123-134.
- Dan Klein and Christopher D. Manning. 2003. A* Parsing: Fast Exact Viterbi Parse Selection. HLT-NAACL 2003.
- Sharon A. Caraballo and Eugene Charniak. 1998. New Figures of Merit for Best-First Probabilistic Chart Parsing. Computational Linguistics 24: 275-298.
- Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. HLT-NAACL 2007.
- Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI 1997).
5. Treebanks and statistical parsing [ppt] [pdf]
Lexicalized parsing: Collins (1997/1999). The status of information in treebanks.
- Michael Collins. 1997. Three Generative, Lexicalised Models for Statistical Parsing. Proceedings of the 35th Annual Meeting of the ACL, Madrid.
- Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics 29(4): 589-637.
- Daniel M. Bikel. 2004. A Distributional Analysis of a Lexicalized Statistical Parsing Model. EMNLP 2004.
6. Multilingual parsing and dependency parsing [ppt] [pdf]
- Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? ACL 2003, pp. 439-446.
- Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head automaton grammars. Proceedings of the 37th Annual Meeting of the ACL, pp. 457-464.
7. Discriminative parsing [ppt] [pdf]
An introduction to discriminative parsing. Features in discriminative parsers, presented using some of Mark Johnson's slides on discriminative reranking.
- Michael Collins. 2004. Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods. In Harry Bunt, John Carroll and Giorgio Satta, editors, New Developments in Parsing Technology, Kluwer.
- Ryan McDonald, Koby Crammer and Fernando Pereira. 2005. Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics.
- Eugene Charniak and Mark Johnson. 2005. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. 43rd Annual Meeting of the Association for Computational Linguistics.