Bayesian Inference - Artificial Intelligence - Lecture Slides

Some concepts of Artificial Intelligence covered are Agents and Problem Solving, Autonomy, Programs, Classical and Modern Planning, First-Order Logic, Resolution Theorem Proving, Search Strategies, and Structure Learning. Main points of this lecture are: Bayesian Inference, Resolution Preliminaries, Generating Maximum Likelihood Hypotheses, Bayesian Learning, Bayes's Theorem, Definition, Ramifications, Probabilistic Queries, and Hypotheses.


Bayesian Inference:

MAP and Max Likelihood

Lecture 29 of 41

Lecture Outline

  • Overview of Bayesian Learning
    • Framework: using probabilistic criteria to generate hypotheses of all kinds
    • Probability: foundations
  • Bayes’s Theorem
    • Definition of conditional (posterior) probability
    • Ramifications of Bayes’s Theorem
      • Answering probabilistic queries
      • MAP hypotheses
  • Generating Maximum A Posteriori (MAP) Hypotheses
  • Generating Maximum Likelihood Hypotheses
  • Next Week: Sections 6.6-6.13, Mitchell; Roth; Pearl and Verma
    • More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
    • Learning over text

Markov Blanket

Constructing Bayesian Networks:

The Chain Rule of Inference
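
For reference under this heading: the chain rule factors any joint distribution into a product of conditionals, and a Bayesian network then conditions each variable only on its parents (the standard statement, stated here as a reminder rather than reproduced from the slide body):

\[
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})
\]
\[
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))
\]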

Automated Reasoning using Probabilistic Models:

Inference Tasks

  • Fusion
    • Methods for combining multiple beliefs
    • Theory more precise than for fuzzy, ANN inference
    • Data and sensor fusion
      • Resolving conflict (vote-taking, winner-take-all, mixture estimation)
      • Paraconsistent reasoning
  • Propagation
    • Modeling process of evidential reasoning by updating beliefs
    • Source of parallelism
    • Natural object-oriented (message-passing) model
      • Communication: asynchronous – dynamic workpool management problem
      • Concurrency: known Petri net dualities
  • Structuring
    • Learning graphical dependencies from scores, constraints
    • Two parameter estimation problems: structure learning, belief revision

Fusion, Propagation, and Structuring

Two Roles for Bayesian Methods

  • Practical Learning Algorithms
    • Naïve Bayes (aka simple Bayes)
    • Bayesian belief network (BBN) structure learning and parameter estimation
    • Combining prior knowledge (prior probabilities) with observed data
      • A way to incorporate background knowledge (BK), aka domain knowledge
      • Requires prior probabilities (e.g., annotated rules)
  • Useful Conceptual Framework
    • Provides “gold standard” for evaluating other learning algorithms
      • Bayes Optimal Classifier (BOC)
      • Stochastic Bayesian learning: Markov chain Monte Carlo (MCMC)
    • Additional insight into Occam’s Razor (MDL)

Choosing Hypotheses

arg maxf  x

x Ω

  • Bayes’s Theorem
  • MAP Hypothesis
    • Generally want most probable hypothesis given the training data
    • Define: arg max over x ∈ Ω of f(x), the value of x in the sample space with the highest f(x)
    • Maximum a posteriori hypothesis, hMAP
  • ML Hypothesis
    • Assume that P(hi) = P(hj) for all pairs i, j (uniform priors, i.e., P_H ~ Uniform)
    • Can further simplify and choose the maximum likelihood hypothesis, hML

argmaxPD |hP  h

P D

PD|hP h argmax

h argmaxP h|D

h H

hH

hH MAP

PD 

P h D

P D

P D|hP h P h|D

 i

h H

hML argmaxPD|h i 
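
A minimal sketch of these two rules in code, assuming a small discrete hypothesis space with explicit priors and a likelihood function; the hypothesis names, priors, and data below are made-up illustrative values (echoing the coin example later in these slides), not taken from the slides themselves.

```python
# Choosing h_MAP and h_ML over a small discrete hypothesis space H.
# P(D) is constant over h, so it is dropped from both argmax computations.

def h_map(hypotheses, prior, likelihood, data):
    """Maximum a posteriori hypothesis: argmax_h P(D | h) * P(h)."""
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior[h])

def h_ml(hypotheses, likelihood, data):
    """Maximum likelihood hypothesis: argmax_h P(D | h) (uniform priors)."""
    return max(hypotheses, key=lambda h: likelihood(data, h))

# Illustrative values: two coin-bias hypotheses, data = observed tosses.
hypotheses = {"fair": 0.5, "biased": 0.6}   # P(Head) under each hypothesis
prior = {"fair": 0.75, "biased": 0.25}      # prior P(h)

def likelihood(data, h):
    """P(D | h) for an i.i.d. sequence of coin tosses."""
    p_head = hypotheses[h]
    prob = 1.0
    for toss in data:
        prob *= p_head if toss == "Head" else (1.0 - p_head)
    return prob

data = ["Head", "Head", "Tail"]
print(h_map(hypotheses, prior, likelihood, data))  # maximizes P(D|h) P(h)
print(h_ml(hypotheses, likelihood, data))          # maximizes P(D|h)
```

With these numbers the biased-coin hypothesis maximizes the likelihood alone, while the fair-coin hypothesis still wins once the prior is factored in, which is exactly the difference between hML and hMAP.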

Basic Formulas for Probabilities

  • Product Rule (Alternative Statement of Bayes’s Theorem)
    • Proof: requires axiomatic set theory, as does Bayes’s Theorem
  • Sum Rule
    • Sketch of proof (immediate from axiomatic set theory)
      • Draw a Venn diagram of two sets denoting events A and B
      • Let A ∨ B denote the event corresponding to A ∪ B
  • Theorem of Total Probability
    • Suppose events A1, A2, …, An are mutually exclusive and exhaustive
      • Mutually exclusive: i ≠ j ⇒ Ai ∩ Aj = ∅
      • Exhaustive: Σi P(Ai) = 1
    • Then
    • Proof: follows from product rule and 3rd Kolmogorov axiom

PB 

P A B

P A|B

P  AB P A PB  P AB

     i

n

i

P B  PB|Ai P A

1

A B
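
As a brief bridge to the hypothesis-space notation used above: if the hypotheses in H are treated as mutually exclusive and exhaustive, the theorem of total probability supplies the denominator of Bayes's theorem.

\[
P(D) = \sum_{h_i \in H} P(D \mid h_i)\,P(h_i)
\qquad\Longrightarrow\qquad
P(h \mid D) = \frac{P(D \mid h)\,P(h)}{\sum_{h_i \in H} P(D \mid h_i)\,P(h_i)}
\]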

Bayesian Learning Example:

Unbiased Coin [1]

  • Coin Flip
    • Sample space: Ω = { Head, Tail }
    • Scenario: given coin is either fair or has a 60% bias in favor of Head
      • h1 ≡ fair coin: P(Head) = 0.5
      • h2 ≡ 60% bias towards Head: P(Head) = 0.6
    • Objective: to decide between default (null) and alternative hypotheses
  • A Priori (aka Prior) Distribution on H
    • P(h1) = 0.75, P(h2) = 0.25
    • Reflects learning agent's prior beliefs regarding H
    • Learning is revision of agent's beliefs
  • Collection of Evidence
    • First piece of evidence: d ≡ a single coin toss, comes up Head
    • Q: What does the agent believe now?
    • A: Compute P(d) = P(d | h1) P(h1) + P(d | h2) P(h2), then the posterior P(hi | d) for each hypothesis (see the worked calculation after this list)
  • Start with Uniform Priors
    • Equal probabilities assigned to each hypothesis
    • Maximum uncertainty (entropy), minimum prior information
  • Evidential Inference
    • Introduce data (evidence) D 1 : belief revision occurs
      • Learning agent revises conditional probability of inconsistent hypotheses to 0
      • Posterior probabilities for remaining h ∈ VS_H,D revised upward
    • Add more data (evidence) D 2 : further belief revision
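
Completing the computation the slide asks for, using the stated priors P(h1) = 0.75, P(h2) = 0.25 and a single observed Head:

\[
P(d) = P(d \mid h_1)\,P(h_1) + P(d \mid h_2)\,P(h_2)
     = (0.5)(0.75) + (0.6)(0.25) = 0.525
\]
\[
P(h_1 \mid d) = \frac{(0.5)(0.75)}{0.525} \approx 0.714,
\qquad
P(h_2 \mid d) = \frac{(0.6)(0.25)}{0.525} \approx 0.286
\]

A single Head thus shifts belief slightly toward the biased coin (from 0.25 to about 0.29), while h1 remains the MAP hypothesis.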

Evolution of Posterior Probabilities

[Figure: the distribution over the hypothesis space as evidence accumulates: the prior P(h), then P(h | D1), then P(h | D1, D2), each plotted over Hypotheses.]

  • Problem Definition
    • Target function: any real-valued function f
    • Training examples ⟨xi, yi⟩, where yi is a noisy training value
      • yi = f(xi) + ei
      • ei is a random variable (noise), i.i.d. ~ Normal(0, σ²), aka Gaussian noise
    • Objective: approximate f as closely as possible
  • Solution
    • Maximum likelihood hypothesis hML
    • Minimizes sum of squared errors (SSE)

Maximum Likelihood:

Learning A Real-Valued Function [1]

[Figure: the target function f(x), noisy training examples, and the maximum likelihood hypothesis hML, plotted against x and y.]

\[
h_{ML} = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^{2}}}
         \, e^{-\frac{1}{2}\left(\frac{y_i - h(x_i)}{\sigma}\right)^{2}}
\]

\[
h_{ML} = \arg\min_{h \in H} \sum_{i=1}^{m} \bigl(y_i - h(x_i)\bigr)^{2}
\]
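
A small sketch of why the Gaussian-noise maximum likelihood hypothesis is the least-squares fit: the log-likelihood differs from the negative SSE only by constants, so the hypothesis that minimizes SSE also maximizes likelihood. The data below are synthetic, and a linear hypothesis class with known noise level is assumed for illustration only.

```python
import math
import random

random.seed(0)
m = 50
sigma = 0.5
xs = [i / m for i in range(m)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, sigma) for x in xs]   # yi = f(xi) + ei

def sse(w, b):
    """Sum of squared errors of hypothesis h(x) = w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

def log_likelihood(w, b):
    """Gaussian log-likelihood of the data under h(x) = w*x + b (known sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - (w * x + b)) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))

# Closed-form least-squares estimates for slope and intercept.
x_bar = sum(xs) / m
y_bar = sum(ys) / m
w_ls = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
b_ls = y_bar - w_ls * x_bar

# The least-squares hypothesis has lower SSE and higher log-likelihood than
# a perturbed hypothesis, illustrating hML = argmin SSE under Gaussian noise.
print("least-squares fit:", round(w_ls, 3), round(b_ls, 3))
print("SSE(hML) =", round(sse(w_ls, b_ls), 3),
      " logL(hML) =", round(log_likelihood(w_ls, b_ls), 3))
print("SSE(perturbed) =", round(sse(w_ls + 0.5, b_ls), 3),
      " logL(perturbed) =", round(log_likelihood(w_ls + 0.5, b_ls), 3))
```

Perturbing the fitted slope raises the SSE and lowers the log-likelihood by the corresponding amount, which is the point of the argmin form above.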

Terminology

  • Introduction to Bayesian Learning
    • Probability foundations
      • Definitions: subjectivist, frequentist, logicist
      • (3) Kolmogorov axioms
  • Bayes’s Theorem
    • Prior probability of an event
    • Joint probability of an event
    • Conditional (posterior) probability of an event
  • Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
    • MAP hypothesis: highest conditional probability given observations (data)
    • ML: highest likelihood of generating the observed data
    • ML estimation (MLE): estimating parameters to find ML hypothesis
  • Bayesian Inference: Computing Conditional Probabilities (CPs) in A Model
  • Bayesian Learning: Searching Model (Hypothesis) Space using CPs

Summary Points

  • Introduction to Bayesian Learning
    • Framework: using probabilistic criteria to search H
    • Probability foundations
      • Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
      • Kolmogorov axioms
  • Bayes’s Theorem
    • Definition of conditional (posterior) probability
    • Product rule
  • Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
    • Bayes’s Rule and MAP
    • Uniform priors: allow use of MLE to generate MAP hypotheses
    • Relation to version spaces, candidate elimination
  • Next Week: 6.6-6.10, Mitchell; Chapter 14-15, Russell and Norvig; Roth
    • More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
    • Learning over text