
CS188 Intro. to AI

Fall, 2000 R. Wilensky

Final Examination

  • This is an open-book, open-notes exam.
  • Write your name, etc., in the space below; answer all questions in the space provided. (Space is provided for your name on the top of each page as well.)
  • You have 3 hours to work on the exam.
  • There are 125 points total.
  • Questions vary in difficulty; do what you know first.
  • Good luck!

NAME:

SID: TA:

(Space below for official use only.)

Problem 1: (20)
Problem 2: (25)
Problem 3: (20)
Problem 4: (30)
Problem 5: (20)
Problem 6: (10)

Total: (125)

Problem 1

(20 points, 2 points each) For each statement below, say if it is true or false, and give a one sentence explanation of your answer.

(a) The sentence “∀x,y Parent(x,y) → Child(y,x)” is satisfiable but not logically valid.

True: it is true if Parent and Child are interpreted as inverse relations, which, of course, they need not be.

(b) Any linearly separable data set can be learned by some single layer perceptron.

True: we showed that single-layer perceptrons can represent (and learn) exactly the linearly separable sets.
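Not part of the exam: a minimal sketch of the perceptron learning rule on an assumed linearly separable data set (the two-input AND function over ±1 inputs), illustrating the convergence claim above.

def train_perceptron(data, epochs=100, lr=1.0):
    """data: list of (inputs, label) pairs with labels in {-1, +1}."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                          # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                            # converged: all points correct
            break
    return w, b

# AND-like function: 1 iff both inputs are 1 (linearly separable).
data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
print(train_perceptron(data))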

(c) Decision tree learning algorithms may be subject to overtraining, but not neural network learning algorithms.

False: we meant overfitting, which can (and does) occur in neural networks as well. (The typo didn’t seem to bother many of you, and “overtraining” is not a bad term for what happens anyway.)

(d) Given the expression (which we corrected during the exam, to include r as an existential variable and to make the ! a 1)

∃g,a,r,p,d Ind(g, Giving) ∧ Agent(a, g) ∧ Recipient(r, g) ∧ Donor(d, g) ∧ Theme(p, g)

we can derive the following expression, assuming that all the constants do not otherwise appear in any other formula:

Ind(G1, Giving) ∧ Agent(A1, G1) ∧ Recipient(R1, G1) ∧ Donor(D1, G1) ∧ Theme(P1, G1)

True: Via Existential instantiation (Skolemization)

(e) Temporal difference learning can be used for deterministic MDPs, but not for non-deterministic tasks.

False: temporal difference learning is a sample-based method that works for non-deterministic MDPs as well; its updates average over the observed stochastic transitions.

Problem 2 (25 points)

Suppose the following are (two-valued) random variables: BootsUp, OS-Ok, HardwareOk, SpilledCoffee. These express the probability that a certain computer will boot (i.e., start up correctly), that its operating system has not been corrupted, that the hardware is functional, and that coffee was spilled on one of the circuit boards, respectively.

(a) (3) Draw a suitable belief network for these variables.

BootsUp has OS-Ok and HardwareOk as parents; HardwareOk has SpilledCoffee as a parent.

(b) (3) Write down the formula that your network expresses for the joint probability distribution of these variables.

P(BootsUp, OS-Ok, HardwareOk, SpilledCoffee) = P(OS-Ok) × P(SpilledCoffee) × P(HardwareOk | SpilledCoffee) × P(BootsUp | HardwareOk, OS-Ok)
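Not part of the exam: a small sketch that evaluates this factorization with assumed (illustrative) CPT values, and checks that the sixteen joint entries sum to 1.

# CPTs (illustrative values only; True means the proposition holds).
P_spill = {True: 0.05, False: 0.95}                      # P(SpilledCoffee)
P_os    = {True: 0.90, False: 0.10}                      # P(OS-Ok)
P_hw    = {True:  {True: 0.05, False: 0.95},             # P(HardwareOk | SpilledCoffee)
           False: {True: 0.99, False: 0.01}}
P_boot  = {(True, True):   {True: 0.99, False: 0.01},    # P(BootsUp | HardwareOk, OS-Ok)
           (True, False):  {True: 0.02, False: 0.98},
           (False, True):  {True: 0.01, False: 0.99},
           (False, False): {True: 0.00, False: 1.00}}

def joint(boots, os_ok, hw_ok, spilled):
    return (P_os[os_ok] * P_spill[spilled]
            * P_hw[spilled][hw_ok] * P_boot[(hw_ok, os_ok)][boots])

bools = (True, False)
print(sum(joint(b, o, h, s)
          for b in bools for o in bools for h in bools for s in bools))  # -> 1.0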

(c) (3) Provide plausible conditional probability tables for the nodes HardwareOk and BootsUp in your network.

SpilledCoffee   P(HardwareOk | SpilledCoffee)
T               .
F               .

HardwareOk   OS-Ok   P(BootsUp | HardwareOk, OS-Ok)
T            T       .
T            F       .
F            T       .
F            F       .

(d) (4)

(i) How many independent probability values are needed to fill out all the tables required by your network? (Explain your answer in a sentence.)

There are 8 values needed for the probability tables (the other 8 being computable by subtracting from 1).

(ii) How many independent probability values would be needed to express the joint if we couldn't make any of the independence assumptions implied by your network structure? (Explain your answer in a sentence.)

We'd have to know 15 values, the 16th being computable by subtracting the sum of the others from 1.
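As a check (a reconstruction, not part of the original key): the full joint over four binary variables has 2^4 = 16 entries, of which 2^4 − 1 = 15 are free, while the factored form needs only 1 (SpilledCoffee) + 1 (OS-Ok) + 2 (HardwareOk | SpilledCoffee) + 4 (BootsUp | HardwareOk, OS-Ok) = 8.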

(e) (2) The conditional probability table for BootsUp is a canonical one. Describe, in English, its general structure.

It is a noisy-AND: BootsUp is (nearly) certain only when both HardwareOk and OS-Ok hold, and each failed parent independently makes booting (nearly) impossible. (See the table above.)
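Not part of the exam: a sketch of one way to fill such a noisy-AND table, where each failed parent independently (and almost surely) blocks booting; all numbers are assumed.

def noisy_and(parents_ok, support=(0.995, 0.99), leak=0.01):
    """P(BootsUp = T): each parent 'passes' with probability support[i]
    when it holds, and only with the small leak probability when it fails;
    booting requires every parent to pass."""
    p = 1.0
    for ok, s in zip(parents_ok, support):
        p *= s if ok else leak
    return p

for hw in (True, False):
    for os in (True, False):
        print(hw, os, round(noisy_and((hw, os)), 4))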

(f) (3) Is spilling coffee conditionally independent of whether the OS is corrupted, given that we know whether or not the machine could boot? Can observing the value of another variable change this relationship?

No, but it will be if we observe the remaining node, HardwareOk.

(g) (7) A lie detector test is known to be 99% reliable when the person is guilty but only 90% reliable when the person is innocent. If a suspect is chosen from a group of suspects of whom only 1% have committed a crime, and the test indicates that the suspect is guilty, what is the probability that the suspect is innocent?

P(Innocent | Test=Guilty) = P(Innocent)P(Test=Guilty | Innocent)/P(Test=Guilty)

P(Innocent) = 0.99; P(Guilty) = 0.01; P(Test=Guilty | Innocent) = 1 − 0.90 = 0.10; P(Test=Guilty | Guilty) = 0.99. So P(Test=Guilty) = 0.10 × 0.99 + 0.99 × 0.01 = 0.1089, and P(Innocent | Test=Guilty) = (0.99 × 0.10) / 0.1089 ≈ 0.91.
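Not part of the exam: the same Bayes computation as a quick check.

p_guilty, p_innocent = 0.01, 0.99
p_t_g_guilty, p_t_g_innocent = 0.99, 1 - 0.90   # error rates from the problem
p_t_g = p_t_g_innocent * p_innocent + p_t_g_guilty * p_guilty
print(p_t_g_innocent * p_innocent / p_t_g)      # ~0.909: probably innocent!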

vii) A catalog of legal junction types can be used to constrain possible line labels, but the catalog will get larger as we enlarge the class of scenes we wish to interpret.

True. As we include cracks, shadows, etc., the catalog can get large.

viii) Generalized cylinders are proposed to describe the shape of objects because it is not possible to approximate some object shapes using polyhedra.

False. It is always possible, just sometimes costly.

ix) The shading of an object provides useful information about that object's shape.

True. This is “shape from shading”.

x) Color does not appear to play a significant role in computer vision because it provides no useful information for segmentation.

False. It is quite useful, as we saw in Blobworld, just hard to do right.

Problem 4 (30 points)

(a) (5) Let’s define the “majority” function of n binary inputs x1, …, xn, each either -1 or 1, to be 1 if more than half of the inputs are 1, and -1 otherwise. Can this function be represented by a single-layer perceptron? Either show that this is impossible or construct such a perceptron (i.e., for a given n, specify what the weights for such a perceptron could be).

Sure: set all the weights to 1 and the threshold to ½, say; the weighted sum is then above the threshold exactly when more inputs are 1 than -1.
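Not part of the exam: a brute-force check that unit weights with threshold ½ compute majority for a small n.

from itertools import product

n = 5
for x in product((-1, 1), repeat=n):
    perceptron = 1 if sum(x) > 0.5 else -1                # weights 1, threshold 1/2
    majority = 1 if sum(v == 1 for v in x) > n / 2 else -1
    assert perceptron == majority
print("all", 2 ** n, "inputs agree")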

(b) (5) The function EQUAL of two inputs, x1 and x2, is defined to be 1 if the inputs are the same (both -1 or both 1) and -1 otherwise. Can this function be represented by a single layer perceptron? Either show that this is impossible or construct such a perceptron.

This is just “not XOR”; XOR isn’t linearly separable, and negating a perceptron’s output merely swaps the two sides of its hyperplane, so EQUAL isn’t linearly separable either.

(c) (5) Draw a decision tree for the majority function, assuming three inputs.

(d) (5) We observed (although we didn't prove) that any function can be approximated by a multilayer feedforward neural network of a few layers in which the hidden units have sigmoid activation functions. Suppose instead we used linear units, i.e., units whose activation function was just a constant multiple of the total input. Doing so would severely restrict the class of functions that such networks could implement. Explain why.

If all the units were linear, the whole network could compute only a linear combination of its inputs (a composition of linear maps is itself linear), which is a quite restricted set of functions.
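Not part of the exam: a two-layer linear “network” collapsing to a single matrix, with arbitrary assumed layer sizes.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))      # first linear layer
W2 = rng.standard_normal((2, 4))      # second linear layer
x = rng.standard_normal(3)
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))   # True: one linear map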

Below is a network of units. The units are represented by large and small ovals, all functionally identical. The large ovals are intended to represent people in different roles, and action types, and the small ones, actions. The light lines with dots at each end represent inhibitory connections between units, and the dark lines excitatory connections; all connections are bidirectional.

For example, the oval labeled “A-Jan” represents Jan in the role of an agent, and the one labeled “A-Lynn” represents Lynn being an agent. These are shown as inhibiting one another. However, the node labeled “A-Lynn” is connected by an excitatory link to a small circle labeled “I1”, which is itself connected to an oval labeled

[Decision tree for Problem 4(c), the three-input majority function, reconstructed from the figure:]

Input 1 = 1?
  Y: Input 2 = 1?
       Y: output 1
       N: Input 3 = 1?  (Y: output 1; N: output -1)
  N: Input 2 = 1?
       Y: Input 3 = 1?  (Y: output 1; N: output -1)
       N: output -1

“crosstalk”. In general, this is a real problem in NNs: It is very hard to create the equivalent of variable bindings in NNs.

(g) (5) In reinforcement learning, algorithms like Q Learning tell us how to successively approximate some function, which, if we knew it, would in turn give us an optimal policy. We have to explore our world in order to estimate this function, and to do so, we have to move around via some policy. Indeed, we may already have learned quite a decent policy, and it would seem reasonable to use the best policy we know of to wander round, as we try to learn a better policy.

However, rather than use the best policy estimate directly, many reinforcement learning algorithms instead do the following: Most of the time, follow the best estimate policy; some small percentage of the time, pick one of the possible actions at random.

Picking an action at random is likely to give an inferior result for that particular action, especially when our policy is already pretty good. Why is doing so nevertheless a good idea?

Many reinforcement learning algorithms, e.g., Q Learning, require that we visit each state (indeed, each state–action pair) infinitely often in order to be sure they converge on the optimal policy. Picking an action at random, even a small percentage of the time, ensures this is the case.
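Not part of the exam: the standard ε-greedy rule the paragraph describes, as a minimal sketch.

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Q: dict mapping (state, action) -> estimated value."""
    if random.random() < epsilon:                # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))   # else exploit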

Problem 5 (20 points)

(a) (5) Each of the following examples illustrates a kind of natural language ambiguity. In each case, name a type of ambiguity involved, explaining the particular instance of ambiguity in the sentence that must be resolved. Be as specific as possible; e.g., state whether a lexical ambiguity is syntactic or semantic (or both).

i. “Squad helps dog bite victim” (actual newspaper headline)

Syntactic ambiguity: [np squad] [v helps] [np dog bite victim] versus [np squad] [v helps] [np dog] [vp [v bite] [n victim]]

(Also, “bite” is lexically syntactically ambiguous: a noun in the first interpretation, a verb in the second.)

ii. “I had a ball.”

Semantic lexical ambiguity of “ball” (e.g., a formal dance versus a round plaything).

iii. “John told the waiter he didn’t have any money.”

Pronoun reference ambiguity: “he” might be the waiter or John.

iv. “The judgment of the court has been questioned.”

Role ambiguity: the court can be the agent of the judgment, or its theme (if, e.g., the public no longer trusts the court).

v. “Jan doesn’t want to marry someone”.

Quantifier scope ambiguity: it’s not the case that there exists someone whom Jan wants to marry, versus, there exists someone whom Jan doesn’t want to marry: ¬∃p Want(Marry(Jan, p)) versus ∃p ¬Want(Marry(Jan, p)).

(b) (5) Consider the following context-free grammar:

S → NP VP
VP → V NP
NP → D NP
NP → A NP
NP → N
NP → Jan
D → the, a
A → little, young, good
N → boy, girl, boys, girls
V → likes, like

Indicate (by circling Y or N) which of the following sentences are admitted by this grammar:

i. Jan likes a little boy. Y
ii. Likes a young boy a little girl. N
iii. Boys like girls. Y
iv. Little a young boy likes the a girls. Y
v. A Jan likes a boy. Y
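Not part of the exam: a naive top-down recognizer for this grammar, as one way to check the Y/N answers above; the encoding of the rules is this sketch's own.

RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["D", "NP"], ["A", "NP"], ["N"], ["Jan"]],
}
LEXICON = {
    "D": {"the", "a"},
    "A": {"little", "young", "good"},
    "N": {"boy", "girl", "boys", "girls"},
    "V": {"likes", "like"},
}

def spans(symbol, words, i):
    # Yield every j such that `symbol` derives exactly words[i:j].
    if symbol in LEXICON:                       # preterminal: match one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in RULES[symbol]:
        if rhs == ["Jan"]:                      # NP -> Jan is a terminal rule
            if i < len(words) and words[i] == "Jan":
                yield i + 1
            continue
        frontier = {i}                          # expand rhs left to right
        for sym in rhs:
            frontier = {j for k in frontier for j in spans(sym, words, k)}
        yield from frontier

def admits(sentence):
    words = [w if w == "Jan" else w.lower()
             for w in sentence.rstrip(".").split()]
    return len(words) in spans("S", words, 0)

for s in ["Jan likes a little boy.", "Likes a young boy a little girl.",
          "Boys like girls.", "Little a young boy likes the a girls.",
          "A Jan likes a boy."]:
    print("Y" if admits(s) else "N", s)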

(c) (5) Show a parse tree for the sentence “Jan likes good little girls.”

S
  NP: Jan
  VP
    V: likes
    NP
      A: good
      NP
        A: little
        NP
          N: girls

(b) (5) Suppose it were true that At(Loc1, S1). We would like to use resolution to prove from this fact and the successor-state axiom above that we can move to Loc2. What difficulty would we have in using resolution given the form of this axiom?

Fortunately, we don't need all the information in this axiom to prove what we want, but just that part that says “If we are at a location in a state, then moving to a location results in a state in which we are at that location”. Write down this part of the above axiom in clausal form.

Equality makes resolution inapplicable. The necessary part is just

∀x,y,s At(x,s) → At(y, Do(Go(x,y), s))

which, in clausal form is

{¬ At(x,s), At(y,Do(Go(x,y),s))}
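For concreteness (a reconstruction, not part of the original key), the proof then takes two resolution steps: resolving At(Loc1, S1) against {¬At(x,s), At(y, Do(Go(x,y), s))} with unifier {x/Loc1, s/S1} yields At(y, Do(Go(Loc1, y), S1)), and resolving that against the negated goal ¬At(Loc2, Do(Go(Loc1, Loc2), S1)) with unifier {y/Loc2} yields the empty clause.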