Machine Learning | Posted by Unknown user, 2008. 9. 2. 14:32

Useful background knowledge for Machine Learning

Machine Learning/CRF | Posted by Unknown user, 2008. 8. 29. 09:38

CRF++

Learning:
crf_learn template_file train_file model_file

crf_learn -f 2 template_file train_file model_file    # -f NUM : feature frequency cut-off; keep only features seen at least NUM times
crf_learn -a MIRA template_file train_file model_file # -a MIRA : train with MIRA instead of the default CRF-L2 algorithm
crf_learn -c 1.5 template_file train_file model_file  # -c FLOAT : regularization strength balancing overfitting against underfitting

Test:
crf_test -m model_file test_file

crf_test -n 20 -m model_file test_file  # -n NUM : N-best results
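
For context, here is a minimal sketch of the two input files these commands consume, following the column-based format described in the CRF++ documentation; the particular words, tags, and feature templates are only illustrative.

train_file (one token per line, whitespace-separated columns, last column is the tag to learn; a blank line ends a sentence):

He       PRP  B-NP
reckons  VBZ  B-VP
the      DT   B-NP
deficit  NN   I-NP
.        .    O

template_file (U* lines are unigram feature macros; %x[row,col] picks the token at relative row and absolute column; a single B line adds bigram features over output tags):

U00:%x[0,0]
U01:%x[-1,0]
U02:%x[0,1]
U03:%x[-1,1]/%x[0,1]
B

test_file uses the same column layout; crf_test appends the predicted tag as an extra column in its output.
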
Machine Learning | Posted by Unknown user, 2008. 7. 15. 13:12

Semi-supervised learning -- Mitchell

Machine Learning | Posted by Unknown user, 2008. 7. 8. 01:38

Categories of Machine Learning

Machine learning algorithms are organized into a taxonomy, based on the desired outcome of the algorithm. Common algorithm types include:

  • Supervised learning — in which the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate) the behavior of a function which maps a vector into one of several classes by looking at several input-output examples of the function.
  • Unsupervised learning — in which the algorithm models a set of inputs; labeled examples are not available.
  • Semi-supervised learning — in which the algorithm combines both labeled and unlabeled examples to generate an appropriate function or classifier.
  • Reinforcement learning — in which the algorithm learns a policy of how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.
  • Transduction — similar to supervised learning, but does not explicitly construct a function: instead, tries to predict new outputs based on training inputs, training outputs, and test inputs which are available while training.
  • Learning to learn — in which the algorithm learns its own inductive bias based on previous experience.

The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.
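
As a toy illustration (not part of the original taxonomy text), the first two categories can be contrasted in a few lines of Python: the supervised learner uses the labels, the unsupervised one never sees them. The 1-nearest-neighbour and 2-means choices here are just the simplest possible stand-ins.

# Supervised: predict the label of a new input from labeled examples (1-nearest neighbour).
def nn_predict(labeled_examples, x):
    return min(labeled_examples, key=lambda ex: abs(ex[0] - x))[1]

# Unsupervised: group unlabeled inputs into two clusters (naive 1-D 2-means).
def two_means(points, iters=10):
    c0, c1 = min(points), max(points)                  # initial cluster centers
    for _ in range(iters):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)  # move centers to the cluster means
    return g0, g1

labeled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (9.1, "large")]
print(nn_predict(labeled, 7.5))            # supervised: prints "large"
print(two_means([1.0, 1.2, 8.0, 9.1]))     # unsupervised: prints the two recovered groups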

Machine Learning | Posted by Unknown user, 2008. 4. 15. 22:37

Machine learning video lectures

Machine Learning | Posted by Unknown user, 2008. 4. 15. 09:45

A short course on graphical models

I designed this course while I was an intern at the Intel Berkeley Research Center during the summer of 2003. If you find the slides useful, you are welcome to use them (with proper credit). Please let me know if you find any typos or errors.

A Short Course on Graphical Models

Mark A. Paskin


This course covers the basics of graphical models, which are powerful tools for reasoning under uncertainty in large, complex systems. The course assumes little or no mathematical background beyond set theory, and no background knowledge of Probability Theory. The emphasis is on presenting a set of tools that are useful in a large number of applications, and presenting these tools in a rigorous but intuitive way.

The course has three lectures, each of which can be presented at a high level in 90 minutes or split into two 60-minute sessions for more depth.

Lectures
1. Introduction to Probability Theory: Motivation, probability spaces, axioms of probability, conditional probability, product rule, chain rule, Bayes' rule, random variables, densities, table densities, Gaussians, marginalization and conditioning, inference (the core identities are restated after this list). [1, Ch. 1; 5; 2, Ch. 13; 3, Ch. 13; 9]
2. Structured Representations: Independence, conditional independence, Bayesian networks, the Bayes Ball algorithm, Markov Random Fields, the Hammersley-Clifford Theorem, moralization, Variable Elimination, NP and #P hardness of inference. [3, Ch. 2; 5; 6; 2, Ch. 14]
3. The Junction Tree Algorithms: Junction trees, the Shafer-Shenoy algorithm, its relation to Variable Elimination, the HUGIN algorithm, its relation to Shafer-Shenoy, the Viterbi algorithm, Generalized Distributive Law, triangulation, elimination. [3, Ch. 17; 4]
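
As a quick reference for the identities named in Lecture 1 (standard textbook definitions, not reproduced from the slides):

P(A \mid B) = \frac{P(A \cap B)}{P(B)}                                   % conditional probability
P(A \cap B) = P(A \mid B) \, P(B)                                        % product rule
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})   % chain rule
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}                           % Bayes' rule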

References for further study
[1] D. Bertsekas and J. Tsitsiklis (2002). Introduction to Probability. Athena Scientific, Belmont, Mass. (First chapter online.)
[2] S. Russell and P. Norvig (2003). Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ.
[3] M. I. Jordan (2003). Introduction to Graphical Models. (Forthcoming.)
[4] R. Cowell, A. Dawid, S. Lauritzen, D. Spiegelhalter (1999). Probabilistic Networks and Expert Systems. Springer, New York, NY.
[5] J. Pearl (1997). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA.
[6] R. Shachter (1998). Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams). In Gregory F. Cooper and Serafín Moral (editors), Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, CA.



http://www-anw.cs.umass.edu/~cs691t/
Machine Learning | Posted by Unknown user, 2008. 2. 15. 12:38

Extra Readings

[AFDJ03] An introduction to MCMC for machine learning

by C. Andrieu, N. de Freitas, A. Doucet and M. I. Jordan.

Machine Learning, 2003.
[B98] A Tutorial on Support Vector Machines for Pattern Recognition

by Chris Burges.

Data Mining and Knowledge Discovery, 1998.
[BBBCL07] Robust Reductions from Ranking to Classification

by Nina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John Langford, and Greg Sorkin.

COLT 2007.
[BDHLZ05] Reductions Between Classification Tasks

by Alina Beygelzimer, Varsha Dani, Tom Hayes, John Langford and Bianca Zadrozny.

ICML, 2005.
[BKNS04] Policy search by dynamic programming

by J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng and Jeff Schneider.

NIPS 2004.
[BNJ03] Latent Dirichlet allocation

by Dave Blei, Andrew Ng and Michael Jordan.

JMLR, 2003. (You can ignore Section 5 (Inference and Parameter Estimation)).
[GE03] An Introduction to Variable and Feature Selection

by Isabelle Guyon and Andre Elisseeff.

JMLR 2003.
[GS04] Finding scientific topics

by Tom Griffiths and Mark Steyvers.

PNAS, 2004.
[J04] Graphical models

by Michael I. Jordan.

Statistical Science 2004.
[KKJ03] Exploration in Metric State Spaces

by Sham Kakade, Michael Kearns, and John Langford.

ICML 2003.
[KSD06] Learning Low-Rank Kernel Matrices

by Brian Kulis, Matyas Sustik, Inderjit Dhillon.

ICML 2006.
[L03] Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

by Neil Lawrence.

NIPS 2003.
[L05] Tutorial on Practical Prediction Theory for Classification

by John Langford.

JMLR 2005.
[M03] Simplified PAC-Bayesian Margin Bounds

by David McAllester.

COLT 2003.
[MPKWJW05] Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE

by R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin, and P. White.

ACL 2005.
[N06] Linear algebra review and reference

by Andrew Ng.

Draft tutorial, 2006.
[NG00] PEGASUS: A policy search method for large MDPs and POMDPs

by Andrew Y. Ng and Michael I. Jordan.

UAI 2000.
[NMM06] Semi-supervised Text Classification Using EM

by Kamal Nigam, Andrew McCallum and Tom Mitchell.

In Semi-supervised Learning, 2006.
[PS07] Policy Gradient Methods for Robotics

by Jan Peters and Stefan Schaal.

IROS 2006.
[Q86] Induction of Decision Trees

by J.R. Quinlan.

MLJ, 1986.
[S99] Perceptron, Winnow, and PAC Learning

by R. Servedio.

COLT 1999.
[SB98] Reinforcement Learning: An Introduction

by Rich Sutton and Andrew Barto.

MIT Press, 1998.
[SM06] An Introduction to Conditional Random Fields for Relational Learning

by Charles Sutton and Andrew McCallum.

Book Chapter in Introduction to Statistical Relational Learning, 2006.
[SWHSL06] Spectral methods for dimensionality reduction

by L. Saul, K. Weinberger, J. Ham, F. Sha and D. Lee.

In "Semisupervised learning" 2006.
[T08] Dirichlet Processes

by Yee Whye Teh.

Draft tutorial, 2008.
[TDR07] Bayesian Agglomerative Clustering with Coalescents

by Yee Whye Teh, Hal Daumé III and Daniel Roy.

NIPS 2007.
[WBS06] Distance Metric Learning for Large Margin Nearest Neighbor Classification

by Kilian Weinberger, John Blitzer and Lawrence Saul.

NIPS 2006.
[WSZS07] Graph Laplacian methods for large-scale semidefinite programming, with an application to sensor localization

by Kilian Weinberger, Fei Sha, Qihui Zhu and Lawrence Saul.

NIPS 2007.
[ZGL03] Semi-supervised learning using Gaussian fields and harmonic functions

by Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty.

ICML 2003.
Machine Learning | Posted by Unknown user, 2008. 2. 14. 16:53

Generative and Discriminative Approaches for Graphical Models

CRF and Perceptron approaches

[LafMcCPer01]
Authors: John Lafferty, Andrew McCallum, Fernando Pereira
Title: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
Proceedings: International Conference on Machine Learning (ICML-2001), 2001.
Presenter:Karthik

[LafZhuLiu04]
Authors: John Lafferty, Xiaojin Zhu, Yan Liu
Title: Kernel Conditional Random Fields: Representation and Clique Selection
Proceedings: International Conference on Machine Learning (ICML-2004), 2004.
Presenter:Karthik

[Collins02]
Author: Michael Collins
Title: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.
Proceedings: EMNLP 2002.
Presenter: Ozgur

SVM approaches

[TsoJoaHofAlt05]
Authors: I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun
Title: Large Margin Methods for Structured and Interdependent Output Variables
Journal: Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.
Presenter: Vikas

[TasGueKol04]
Authors: Ben Taskar, Carlos Guestrin and Daphne Koller
Title: Max-Margin Markov Networks.
Proceedings: Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.
Presenter: Irina

[McAllester06]
Author: David McAllester
Title: Generalization Bounds and Consistency for Structured Labeling
In: Predicting Structured Data (to appear), edited by G. BakIr, T. Hofmann, B. Scholkopf, A. Smola, B. Taskar, and S. V. N. Vishwanathan. MIT Press, 2006.
Presenter: David


Boosting Approaches

[AltHofJoh03]
Authors: Yasemin Altun, Thomas Hofmann & Mark Johnson
Title: Discriminative Learning for Label Sequences via Boosting
Proceedings: Advances in Neural Information Processing Systems (NIPS*15), 2003.
Presenter: Ozgur

[TorMurFre05]
Authors: Antonio Torralba, Kevin Murphy and William Freeman
Title: Contextual Models for Object Detection using Boosted Random Fields
Proceedings: Advances in Neural Information Processing Systems (NIPS*17), 2005.
Presenter: Allie

[Collins04]
Author: Michael Collins
Title: Discriminative Reranking for Natural Language Parsing.
Proceedings: International Conference on Machine Learning (ICML-2000), 2000.
Presenter: Irina


Decompositional Approaches

[DauMar05]
Authors: Hal Daume and Daniel Marcu
Title: Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Proceedings: International Conference on Machine Learning (ICML), 2005.
Presenter: Yasemin

[RotYih05]
Authors: D. Roth and W. Yih
Title: Integer Linear Programming Inference for Conditional Random Fields.
Proceedings: International Conference on Machine Learning (ICML), 2005, pp. 737-744.

Presenter: Karthik

[LeCHua05]
Authors: Yann LeCun and Fu Jie Huang
Title: Loss Functions for Discriminative Training of Energy-Based Models
Proceedings: AISTATS, 2005
Presenter: Allie

[WesChaEliSchVap02]
Authors: J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik
Title: Kernel Dependency Estimation
Proceedings: NIPS 2002.

Presenter: Vikas

Semi-Supervised/Unsupervised Learning

[AltMcCBel05]
Authors: Yasemin Altun, David McAllester, Misha Belkin.
Title: Maximum Margin Semi-Supervised Learning for Structured Variables
Proceedings: NIPS 2005.


[BreSch06]
Authors: Ulf Brefeld, Tobias Scheffer.
Title: Semi-Supervised Learning for Structured Output Variables,
Proceedings: ICML 2006.


[XuWilSouSch06]
Authors: Linli Xu, Dana Wilkinson, Finnegan Southey, Dale Schuurmans
Title: Discriminative Unsupervised Learning of Structured Predictors
Proceedings: ICML 2006.