Forward Checking - Introduction to Artificial Intelligence - Solved Exams, Exams of Artificial Intelligence

Main points of this past exam are: Forward Checking, Crawler’S Escape, Crawler Snuck, Crawler’S Body, Negative Distance, Moves Backwards, Successor Function, Shoulder Position, Graph Search, Search Problem

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shalin_p01ic

CS 188 Introduction to Artificial Intelligence, Spring 2009: Midterm Solutions

  1. (16 points) True/False

For the following questions, a correct answer is worth 2 points, no answer is worth 1 point, and an incorrect answer is worth 0 points. Circle true or false to indicate your answer.

a) (true or false) If $g(s)$ and $h(s)$ are two admissible A* heuristics, then their average $f(s) = \frac{1}{2} g(s) + \frac{1}{2} h(s)$ must also be admissible.

True. Let $h^*(s)$ be the true distance from $s$. We know that $g(s) \le h^*(s)$ and $h(s) \le h^*(s)$, thus $f(s) = \frac{1}{2} g(s) + \frac{1}{2} h(s) \le \frac{1}{2} h^*(s) + \frac{1}{2} h^*(s) = h^*(s)$.

b) (true or false) For a search problem, the path returned by uniform cost search may change if we add a positive constant $C$ to every step cost.

True. Consider that there are two paths from the start state (S) to the goal (G), S → A → G and S → G. cost(S, A) = 1, cost(A, G) = 1, and cost(S, G) = 3. So the optimal path is through A. Now, if we add 2 to each of the costs, the optimal path is directly from S to G. Since uniform cost search finds the optimal path, its path will change.

c) (true or false) The running time of an efficient solver for tree-structured constraint satisfaction problems is linear in the number of variables.

True. The running time of the algorithm for tree-structured CSPs is $O(n \cdot d^2)$, where $n$ is the number of variables and $d$ is the maximum size of any variable’s domain.

d) (true or false) If $h_1(s)$ is a consistent heuristic and $h_2(s)$ is an admissible heuristic, then $\min(h_1(s), h_2(s))$ must be consistent.

False. For instance, if $h_2(s)$ is admissible but inconsistent, and $h_1(s)$ dominates $h_2(s)$, then $\min(h_1(s), h_2(s)) = h_2(s)$, which is inconsistent.

e) (true or false) The amount of memory required to run minimax with alpha-beta pruning is $O(b^d)$ for branching factor $b$ and depth limit $d$.

True and False (everyone wins). The memory required is only $O(bd)$, so we accepted False. However, by definition an algorithm that is $O(bd)$ is also $O(b^d)$, because $O$ denotes upper bounds that may or may not be tight, so technically this statement is True (but not very useful).

f) (true or false) In a Markov decision process with discount $\gamma = 1$, the difference in values for two adjacent states is bounded by the reward between them: $|V(s) - V(s')| \le \max_a R(s, a, s')$.

False. Let $V(s') = 0$ and $R(s, a, s') = 0$ for all $a$, but there is an action $a'$ which takes $s$ to the terminal state $T$ and gives a reward of 100. Thus $V(s) \ge 100$, but the inequality above says that $|V(s) - 0| \le 0$.

g) (true or false) Value iteration and policy iteration must always converge to the same policy.

True and False (everyone wins). Both algorithms are guaranteed to converge to the optimal policy, so we accepted True. If there are multiple policies that are optimal (meaning they yield the same maximal values for every state), then the two algorithms might return different optimal policies. Both value iteration and policy iteration will always lead to the same optimal values.

h) (true or false) In a Bayes’ net, if $A \perp B$, then $A \perp B \mid C$ for some variable $C$ other than $A$ or $B$.

False. Consider the Bayes’ net A → C ← B. Clearly, $A \perp B$, but $A \not\perp B \mid C$.
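The counterexample in part b) can be checked with a few lines of code. The following is a minimal sketch of uniform cost search, not the course's reference implementation; the function names `uniform_cost_search` and `add_constant` and the dictionary encoding of the graph are assumptions made for illustration.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) for a cheapest path from start to goal, or None."""
    frontier = [(0, [start])]  # priority queue ordered by path cost
    closed = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        state = path[-1]
        if state == goal:
            return cost, path
        if state in closed:
            continue
        closed.add(state)
        for successor, step_cost in graph.get(state, {}).items():
            heapq.heappush(frontier, (cost + step_cost, path + [successor]))
    return None

def add_constant(graph, c):
    """Return a copy of the graph with c added to every step cost."""
    return {s: {t: w + c for t, w in edges.items()} for s, edges in graph.items()}

# Graph from part b): cost(S, A) = 1, cost(A, G) = 1, cost(S, G) = 3.
graph = {"S": {"A": 1, "G": 3}, "A": {"G": 1}}
original = uniform_cost_search(graph, "S", "G")
shifted = uniform_cost_search(add_constant(graph, 2), "S", "G")
print(original, shifted)
```

Adding the constant penalizes paths with more steps, which is why the two-step path through A loses to the direct path once every step costs 2 more.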

  2. (15 points) Search: Crawler’s Escape

Whilst Pacman was Q-learning, Crawler snuck into mediumClassic and stole all the dots. Now, it’s trying to escape as quickly as possible. At each time step, Crawler can either move its shoulder or its elbow one position up or down. Both joints s and e have five total positions each (1 through 5) and both begin in position 3. Upon changing arm positions from (s, e) to (s′, e′), Crawler’s body moves some distance r(s, e, s′, e′), where |r(s, e, s′, e′)| ≤ 2 meters (negative distance means the crawler moves backwards). Crawler must travel 10 meters to reach the exit.

[Figure: Crawler, 10 meters from the exit. Example action: increase elbow position, which moves the body by r(3, 3, 3, 4).]

In this problem, you will design a search problem for which the optimal solution will allow Crawler to escape in as few time steps as possible.

(a) (3 pt) Define a state space, start state and goal test to represent this problem.

A state is a 3-tuple consisting of distance to exit, shoulder position, and elbow position. The start state is (10, 3, 3). Goal test: distance to exit is less than or equal to zero.

(b) (3 pt) Define the successor function for the start state by listing its (successor state, action, step cost) triples. Use the actions s+ and s− for increasing and decreasing the shoulder position and e+ and e− for the elbow.

{ ((10 − r(3, 3, 4, 3), 4, 3), s+, 1),
  ((10 − r(3, 3, 2, 3), 2, 3), s−, 1),
  ((10 − r(3, 3, 3, 4), 3, 4), e+, 1),
  ((10 − r(3, 3, 3, 2), 3, 2), e−, 1) }
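The state space and successor function from parts (a) and (b) can be sketched in code. The exam does not give values for r, so `toy_r` below is a hypothetical stand-in used only to make the sketch runnable; `successors` and `is_goal` are likewise illustrative names, and the ASCII action labels `s-` and `e-` stand for s− and e−.

```python
def successors(state, r):
    """List (successor state, action, step cost) triples for a Crawler state.

    state is (distance_to_exit, shoulder, elbow). r(s, e, s2, e2) is the
    displacement function from the problem statement; its values are not given.
    """
    dist, s, e = state
    moves = [("s+", s + 1, e), ("s-", s - 1, e), ("e+", s, e + 1), ("e-", s, e - 1)]
    result = []
    for action, s2, e2 in moves:
        if 1 <= s2 <= 5 and 1 <= e2 <= 5:  # each joint has positions 1 through 5
            result.append(((dist - r(s, e, s2, e2), s2, e2), action, 1))
    return result

def is_goal(state):
    """Goal test: distance to exit is less than or equal to zero."""
    return state[0] <= 0

def toy_r(s, e, s2, e2):
    """Hypothetical displacement (assumption): +1 m per joint step up, -1 m per step down."""
    return (s2 - s) + (e2 - e)

start = (10, 3, 3)
start_successors = successors(start, toy_r)
corner_successors = successors((10, 5, 1), toy_r)
print(start_successors)
```

Every step costs 1 because the objective is the number of time steps, not the distance travelled.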

  3. (20 points) CSPs: Constraining Graph Structure

Hired as a consultant, you built a Bayes’ net model of Cheeseboard Pizza customers. Last night, a corporate spy from Zachary’s Pizza deleted the directions from all your arcs. You still have the undirected graph of your model and a list of dependencies. You now have to recover the direction of the arrows to build a graph that allows for these dependencies. Note: $X \not\perp Y$ means X is not independent of Y.

[Figure: undirected graph in which the center node C is connected to A, B, F and G.]

Distribution properties (dependencies):

$A \not\perp G$

$B \not\perp F$

$B \not\perp G \mid C$

a) (1 pt) Given the first constraint only, circle all the topologies that are allowed for the triple (A, C, G).

[Figure: four candidate topologies for the triple (A, C, G).]

b) (3 pt) Formulate direction-recovery for this graph as a CSP with explicit binary constraints only. The variables and values are provided for you. The variable $E_A$ stands for the direction of the arc between A and the center C, where out means the arrow points outward (toward A), and in means the arrow points inward (toward C).

Variables: $E_A$, $E_B$, $E_F$, $E_G$

Values: out, in

Constraints: An explicit constraint lists all legal tuples of values for a tuple of variables.

$(E_A, E_G) \in \{(\text{in}, \text{out}), (\text{out}, \text{out}), (\text{out}, \text{in})\}$

$(E_B, E_F) \in \{(\text{in}, \text{out}), (\text{out}, \text{out}), (\text{out}, \text{in})\}$

$(E_B, E_G) \in \{(\text{in}, \text{in})\}$

c) (1 pt) Draw the constraint graph for this CSP.

$E_A - E_G - E_B - E_F$

d) (2 pt) After selecting $E_A = \text{in}$, cross out all values eliminated from $E_B$, $E_F$ and $E_G$ by forward checking.

| Variable | Remaining values | Crossed out |
| $E_A$ | in | |
| $E_B$ | out, in | |
| $E_F$ | out, in | |
| $E_G$ | out | in |

Forward checking removes the values of variables that directly contradict the assignment $E_A = \text{in}$. Only $E_G$ shares a constraint with $E_A$, so only its domain is pruned.

e) (2 pt) Cross out all values eliminated by arc consistency applied before any backtracking search.

| Variable | Remaining values | Crossed out |
| $E_A$ | out | in |
| $E_B$ | in | out |
| $E_F$ | out | in |
| $E_G$ | in | out |

We first remove $E_B = \text{out}$ because it is incompatible with any value for $E_G$. Likewise, we remove $E_G = \text{out}$. Now, we can remove $E_A = \text{in}$ based on $E_G$ and $E_F = \text{in}$ based on $E_B$.

f) (1 pt) Solve this CSP, then add the correct directions to the arcs of the graph at the top of the page.

$E_A = \text{out}$, $E_B = \text{in}$, $E_F = \text{out}$, and $E_G = \text{in}$.
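Parts (d) through (f) can be reproduced with a small sketch of forward checking and arc consistency (AC-3) over the explicit constraints of part (b). The variable names `EA`, `EB`, `EF`, `EG` mirror the exam; the helper names and the dictionary encoding are assumptions made for this illustration.

```python
from collections import deque

# Explicit binary constraints from part (b): allowed (x, y) value pairs.
NO_COMMON_EFFECT = {("in", "out"), ("out", "out"), ("out", "in")}
CONSTRAINTS = {
    ("EA", "EG"): NO_COMMON_EFFECT,
    ("EB", "EF"): NO_COMMON_EFFECT,
    ("EB", "EG"): {("in", "in")},
}

def allowed(x, vx, y, vy):
    """True if assigning x=vx and y=vy violates no constraint between x and y."""
    if (x, y) in CONSTRAINTS:
        return (vx, vy) in CONSTRAINTS[(x, y)]
    if (y, x) in CONSTRAINTS:
        return (vy, vx) in CONSTRAINTS[(y, x)]
    return True

def neighbors(x):
    """Variables that share a constraint with x."""
    return [b if a == x else a for (a, b) in CONSTRAINTS if x in (a, b)]

def forward_check(domains, var, value):
    """Assign var=value and prune only the domains of var's neighbors."""
    new = {v: set(d) for v, d in domains.items()}
    new[var] = {value}
    for n in neighbors(var):
        new[n] = {vn for vn in new[n] if allowed(var, value, n, vn)}
    return new

def ac3(domains):
    """Enforce arc consistency and return pruned copies of the domains."""
    new = {v: set(d) for v, d in domains.items()}
    queue = deque((x, y) for x in new for y in neighbors(x))
    while queue:
        x, y = queue.popleft()
        supported = {vx for vx in new[x]
                     if any(allowed(x, vx, y, vy) for vy in new[y])}
        if supported != new[x]:
            new[x] = supported
            # x lost a value, so arcs pointing at x must be rechecked.
            queue.extend((z, x) for z in neighbors(x) if z != y)
    return new

domains = {v: {"in", "out"} for v in ("EA", "EB", "EF", "EG")}
after_fc = forward_check(domains, "EA", "in")
after_ac = ac3(domains)
print(after_fc, after_ac)
```

In this instance arc consistency alone leaves a single value per variable, so no backtracking search is needed to solve the CSP.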


Now consider a chain-structured graph over variables $X_1, \ldots, X_n$. Edges $E_1, \ldots, E_{n-1}$ connect adjacent variables, but their directions are again unknown.

[Figure: chain $X_1 - X_2 - \cdots - X_n$ with edges $E_1, E_2, \ldots, E_{n-1}$.]

g) (4 pt) Using only binary constraints and two-valued variables, formulate a CSP that is satisfied by only and all networks that can represent a distribution where $X_1 \not\perp X_n$. Describe what your variables mean in terms of direction of the arrows in the network.

Variables:

Variables $E_1, \ldots, E_{n-1}$ correspond to the directions of these edges and take values {left, right}.

Constraints:

$(E_i, E_{i+1}) \ne (\text{right}, \text{left})$, for $i \in \{1, \ldots, n-2\}$.

h) (6 pt) Using only unary, binary and ternary (3 variable) constraints and two-valued variables, formulate a CSP that is satisfied by only and all networks that enforce $X_1 \perp X_n$. Describe what your variables mean in terms of direction of the arrows in the network. Hint: You will have to introduce additional variables.

Variables:

Variables $E_1, \ldots, E_{n-1}$ correspond to the directions of these edges and take values {left, right}. $I_1, \ldots, I_{n-2}$ can take values {T, F}. $O_1, \ldots, O_{n-2}$ can take values {T, F}.

Constraints:

Intuitively, we want $I_i$ to be T if $X_i \to X_{i+1} \leftarrow X_{i+2}$, i.e. the triple $X_i, X_{i+1}, X_{i+2}$ is inactive. Also, we want $O_i$ to be T if either $I_i$ or $O_{i-1}$ is T, i.e. if the current triple is inactive or if any of the prior triples was inactive. Finally, we want $O_{n-2}$ to be T, which would imply that there is some inactive triple.

$(I_i, E_i, E_{i+1}) \in \{(T, \text{right}, \text{left}), (F, \text{right}, \text{right}), (F, \text{left}, \text{right}), (F, \text{left}, \text{left})\}$ for $i \in \{1, \ldots, n-2\}$.

$(I_1, O_1) \in \{(T, T), (F, F)\}$

$(O_{i-1}, I_i, O_i) \in \{(F, F, F), (F, T, T), (T, F, T), (T, T, T)\}$ for $i \in \{2, \ldots, n-2\}$

$O_{n-2} = T$
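The two chain formulations can be compared by brute force. The sketch below assumes a chain with n = 5 variables (four edges) and uses the fact that the constraints in part (h) determine every I and O variable uniquely from the edge directions; the function names are hypothetical.

```python
from itertools import product

def satisfies_g(edges):
    """Part (g): no adjacent pair of edges may be (right, left)."""
    return all(pair != ("right", "left") for pair in zip(edges, edges[1:]))

def satisfies_h(edges):
    """Part (h): derive the forced I and O values, then test the last O."""
    # I_i is T exactly when (E_i, E_{i+1}) = (right, left), an inactive triple.
    inactive = [pair == ("right", "left") for pair in zip(edges, edges[1:])]
    # O_1 = I_1, and O_i = O_{i-1} or I_i for the later positions.
    seen = []
    for i, flag in enumerate(inactive):
        seen.append(flag if i == 0 else (seen[-1] or flag))
    return seen[-1]  # the unary constraint O_{n-2} = T

n = 5  # assumed chain length for this check
assignments = list(product(["left", "right"], repeat=n - 1))
count_g = sum(satisfies_g(e) for e in assignments)
count_h = sum(satisfies_h(e) for e in assignments)
complementary = all(satisfies_g(e) != satisfies_h(e) for e in assignments)
print(count_g, count_h, complementary)
```

Every orientation of the chain satisfies exactly one of the two CSPs: either no triple is a common effect and the path from $X_1$ to $X_n$ is active, or some triple blocks it.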


  4. (21 points) MDPs: Robot Soccer

A soccer robot A is on a fast break toward the goal, starting in position 1. From positions 1 through 3, it can either shoot (S) or dribble the ball forward (D); from 4 it can only shoot. If it shoots, it either scores a goal (state G) or misses (state M ). If it dribbles, it either advances a square or loses the ball, ending up in M.

[Figure: soccer field showing A, D, the goal, and positions 1 through 4.]

In this MDP, the states are 1, 2, 3, 4, G and M, where G and M are terminal states. The transition model depends on the parameter $y$, which is the probability of dribbling success. Assume a discount of $\gamma = 1$.

$T(k, S, G) = \frac{k}{6}$ and $T(k, S, M) = 1 - \frac{k}{6}$ for $k \in \{1, 2, 3, 4\}$

$T(k, D, k+1) = y$ and $T(k, D, M) = 1 - y$ for $k \in \{1, 2, 3\}$

$R(k, S, G) = 1$ for $k \in \{1, 2, 3, 4\}$, and rewards are 0 for all other transitions

(a) (2 pt) What is $V^\pi(1)$ for the policy $\pi$ that always shoots?

$V^\pi(1) = T(1, S, G) R(1, S, G) + T(1, S, M) R(1, S, M) = \frac{1}{6}$

(b) (2 pt) What is $Q^*(3, D)$ in terms of $y$?

$Q^*(3, D) = T(3, D, 4) (R(3, D, 4) + V^*(4)) + T(3, D, M) R(3, D, M)$

$= T(3, D, 4) V^*(4)$

$= T(3, D, 4) Q^*(4, S)$

$= T(3, D, 4) (T(4, S, G) R(4, S, G) + T(4, S, M) R(4, S, M))$

$= T(3, D, 4) T(4, S, G) R(4, S, G)$

$= y \cdot \frac{4}{6} \cdot 1 = \frac{2}{3} y$

(c) (2 pt) Using $y = \frac{3}{4}$, complete the first two iterations of value iteration.

| $i$ | $V_i^*(1)$ | $V_i^*(2)$ | $V_i^*(3)$ | $V_i^*(4)$ |
| 0 | 0 | 0 | 0 | 0 |
| 1 | 1/6 | 1/3 | 1/2 | 2/3 |
| 2 | 1/4 | 3/8 | 1/2 | 2/3 |

(d) (2 pt) After how many iterations will value iteration compute the optimal values for all states?

After 3 iterations, the values will have converged when $y = \frac{3}{4}$. Above, only $V^*(1)$ has not yet converged. We note that for $y > \frac{3}{4}$, a fourth iteration would be required because a fast break has up to four transitions.

(e) (2 pt) For what range of values of $y$ is $Q^*(3, S) \ge Q^*(3, D)$?

$Q^*(3, S) \ge Q^*(3, D)$

$T(3, S, G) \cdot 1 \ge T(3, D, 4) \cdot T(4, S, G) \cdot 1$

$\frac{3}{6} \ge y \cdot \frac{4}{6}$

$\frac{3}{4} \ge y \ge 0$
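The value iteration in parts (c) and (d) can be reproduced exactly with rational arithmetic. This is a minimal sketch specialized to this MDP, assuming terminal states G and M have value 0; `value_iteration` is an illustrative name.

```python
from fractions import Fraction

def value_iteration(y, iterations):
    """Return [V_0, V_1, ...] as dicts over states 1..4, with discount 1."""
    values = {k: Fraction(0) for k in (1, 2, 3, 4)}
    history = [dict(values)]
    for _ in range(iterations):
        new = {}
        for k in (1, 2, 3, 4):
            # Shooting scores with probability k/6 for reward 1, then the game ends.
            q_shoot = Fraction(k, 6)
            # Dribbling pays no reward and reaches k+1 with probability y.
            # State 4 has no dribble action, so 0 is used as a harmless placeholder.
            q_dribble = y * values[k + 1] if k < 4 else Fraction(0)
            new[k] = max(q_shoot, q_dribble)
        values = new
        history.append(dict(values))
    return history

history = value_iteration(Fraction(3, 4), 4)
for i, table in enumerate(history):
    print(i, [str(table[k]) for k in (1, 2, 3, 4)])
```

With $y = \frac{3}{4}$ the tables stop changing after the third iteration, which matches the answer to part (d).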

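The dribble success probability $y$ in fact depends on the presence or absence of a defending robot, D. A has no way of detecting whether D is present, but does know some statistical properties of its environment. D is present $\frac{2}{3}$ of the time. When D is absent, $y = \frac{3}{4}$. When D is present, $y = \frac{1}{4}$.

(f) (4 pt) What is the posterior probability that D is present, given that A Dribbles twice successfully from 1 to 3, then Shoots from state 3 and scores?

We can use Bayes’ rule, where $D$ is a random variable denoting the presence of D, and $e$ is the evidence that A dribbled twice and scored.

$P(d \mid e) = \frac{P(e \mid d) \cdot P(d)}{P(e)}$

$P(e) = P(e \mid d) \cdot P(d) + P(e \mid \neg d) \cdot P(\neg d)$

$P(e \mid d) = \frac{1}{4} \cdot \frac{1}{4} \cdot \frac{3}{6} = \frac{1}{32}$

$P(e \mid \neg d) = \frac{3}{4} \cdot \frac{3}{4} \cdot \frac{3}{6} = \frac{9}{32}$

$P(e) = \frac{1}{32} \cdot \frac{2}{3} + \frac{9}{32} \cdot \frac{1}{3} = \frac{11}{96}$

$P(d \mid e) = \frac{\frac{1}{32} \cdot \frac{2}{3}}{\frac{11}{96}} = \frac{2}{11}$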

(g) (3 pt) What transition model should A use in order to correctly compute its maximum expected reward when it doesn’t know whether or not D is present?

To maximize expected total reward, the agent should model the situation as accurately as possible. We accepted two answers for $T(k, D, k+1)$.

One answer is to claim that the agent should use its marginal belief that each dribble will be successful, summing over the two possibilities of D’s presence or absence. In this case, $T(k, D, k+1) = \frac{2}{3} \cdot \frac{1}{4} + \frac{1}{3} \cdot \frac{3}{4} = \frac{5}{12}$, and $T(k, D, M) = \frac{7}{12}$.

Another acceptable answer would be to update the beliefs of the agent after every successful dribble. Hence, while $T(1, D, 2) = \frac{5}{12}$ as above, reaching state 2 implies a successful previous dribble, and so another successful dribble is more likely. In particular, let $X_1$, $X_2$ and $X_3$ indicate successful first, second and third dribbles, while $D$ is the presence of the defender. Since $X_1 \perp X_2 \mid D$, we have $P(x_2 \mid x_1) = \sum_d P(d \mid x_1) P(x_2 \mid d)$. Computing similarly to part (f), $P(d \mid x_1) = \frac{2}{5}$, so $T(2, D, 3) = \frac{2}{5} \cdot \frac{1}{4} + \frac{3}{5} \cdot \frac{3}{4} = \frac{11}{20}$.

We can compute $T(3, D, 4)$ similarly. $X_3$ is conditionally independent of both $X_1$ and $X_2$ given $D$, so $P(x_3 \mid x_1, x_2) = \sum_d P(d \mid x_1, x_2) P(x_3 \mid d)$. Using our result from (f), $P(d \mid x_1, x_2) = \frac{2}{11}$, and so $T(3, D, 4) = \frac{2}{11} \cdot \frac{1}{4} + \frac{9}{11} \cdot \frac{3}{4} = \frac{29}{44}$.

Since a dribble either succeeds or fails, $T(k, D, M) = 1 - T(k, D, k+1)$ for all $k$. Shooting probabilities are unchanged by the presence of the defender.

$T(k, S, G) = \frac{k}{6}$ and $T(k, S, M) = 1 - \frac{k}{6}$ for $k \in \{1, 2, 3, 4\}$

$T(k, D, k+1) = \frac{5}{12}$ and $T(k, D, M) = \frac{7}{12}$ for $k \in \{1, 2, 3\}$

OR

$T(k, S, G) = \frac{k}{6}$ and $T(k, S, M) = 1 - \frac{k}{6}$ for $k \in \{1, 2, 3, 4\}$

$T(1, D, 2) = \frac{5}{12}$, $T(2, D, 3) = \frac{11}{20}$, $T(3, D, 4) = \frac{29}{44}$

$T(1, D, M) = \frac{7}{12}$, $T(2, D, M) = \frac{9}{20}$, $T(3, D, M) = \frac{15}{44}$
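The belief updates in parts (f) and (g) follow one pattern, which the sketch below works through with exact fractions. The names `posterior_present` and `dribble_success` are hypothetical. The shot in part (f) is left out of the evidence because its probability does not depend on D, so it cancels in Bayes' rule.

```python
from fractions import Fraction

PRIOR_PRESENT = Fraction(2, 3)  # P(defender present)
# Dribble success probability given whether the defender is present.
Y = {True: Fraction(1, 4), False: Fraction(3, 4)}

def posterior_present(successful_dribbles):
    """P(defender present | that many successful dribbles), by Bayes' rule."""
    weight_present = PRIOR_PRESENT * Y[True] ** successful_dribbles
    weight_absent = (1 - PRIOR_PRESENT) * Y[False] ** successful_dribbles
    return weight_present / (weight_present + weight_absent)

def dribble_success(previous_successes):
    """P(next dribble succeeds | previous successes), summing over D."""
    p = posterior_present(previous_successes)
    return p * Y[True] + (1 - p) * Y[False]

posterior_f = posterior_present(2)                       # part (f)
marginal = dribble_success(0)                            # first answer to (g)
updated = [dribble_success(k) for k in range(3)]         # second answer to (g)
print(posterior_f, marginal, updated)
```

Each successful dribble shifts belief toward the defender being absent, which is why the updated success probabilities increase from state 1 to state 3.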

  5. (18 points) Bayes’ Nets: The Mind of Pacman

Pacman doesn’t just eat dots. He also enjoys pears, apples, carrots, nuts, eggs and toast. Each morning, he chooses to eat some subset of these foods. His preferences are modeled by a Bayes’ net with this structure.

[Figure: Bayes’ net over the variables P, A, C, N, E and T.]

a) (2 pt) Factor the probability that Pacman chooses to eat only apples and nuts in terms of conditional probabilities from this Bayes’ net.

$P(\neg p, a, \neg c, n, \neg e, \neg t) = P(a) \cdot P(\neg p \mid a) \cdot P(\neg c \mid a) \cdot P(n \mid \neg p, \neg c) \cdot P(\neg t \mid \neg c) \cdot P(\neg e \mid n, \neg t)$

b) (4 pt) For each of the following properties, circle whether they are true, false or unknown of the distribution P (P, A, C, N, E, T ) for this Bayes’ net.

Without the conditional probability tables, we do not know if any two variables are dependent. From the network topology, we can only conclude true or unknown for the questions below.

$N \perp T$ (true, false, unknown): unknown. $T - C - N$ is an active path.

$P \perp E \mid N$ (true, false, unknown): unknown. $P - A - C - T - E$ is an active path.

$P \perp N \mid A, C$ (true, false, unknown): unknown. P and N are directly connected.

$E \perp A \mid C, N$ (true, false, unknown): true. All three paths from E to A are inactive.

c) (2 pt) If the arrow from T to E were reversed, list two conditional independence properties that would be true under the new graph that were not guaranteed under the original graph.

The triple $C - T - E$ is no longer active when T is unknown, therefore $P \perp E \mid N$ and $A \perp E \mid N$ and $C \perp E \mid N$.

The triple $N - E - T$ is no longer active when E is known, therefore $T \perp N \mid C, E$ and $T \perp P \mid C, E$ and $T \perp A \mid C, E$.

There are further independence assumptions based on these 6 that condition on more variables, such as $A \perp E \mid N, P$.

d) (2 pt) You discover that $P(a \mid \neg c, \neg t) > P(a \mid \neg c)$. Suggest how you might change the network in response.

This inequality implies that $A \not\perp T \mid C$ by definition, but in the network above, A and T are conditionally independent given C. So, the network must be changed to allow for this dependence. Adding an arc from A to T would fix the problem. Note: adding an arc from T to A creates a cycle, which is not allowed in a Bayes’ net. Other acceptable solutions exist, like reversing the direction of the arc between C and T so that it points from T to C.
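The independence claims in parts (b) through (d) can be checked with a d-separation test. The sketch below uses the ancestral moral graph method, which is one standard way to test d-separation and not necessarily the one used in the course. The arc structure is an assumption read off from the factorization in part (a): A → P, A → C, P → N, C → N, C → T, N → E, T → E.

```python
from itertools import combinations

def d_separated(parents, x, y, given):
    """True if x and y are d-separated given the set `given` in the DAG."""
    # 1. Keep only x, y, the evidence, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # 2. Moralize: link each node to its parents and link co-parents together.
    adjacent = {n: set() for n in keep}
    for child in keep:
        for p in parents[child]:
            adjacent[child].add(p)
            adjacent[p].add(child)
        for p, q in combinations(parents[child], 2):
            adjacent[p].add(q)
            adjacent[q].add(p)
    # 3. Delete the evidence nodes and test whether x can still reach y.
    reached, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node in reached or node in given:
            continue
        reached.add(node)
        stack.extend(adjacent[node])
    return y not in reached

# Assumed structure, taken from the factorization in part (a).
ORIGINAL = {"A": [], "P": ["A"], "C": ["A"], "N": ["P", "C"],
            "T": ["C"], "E": ["N", "T"]}
# Part (c): the arc between T and E is reversed, so E becomes a parent of T.
REVERSED = dict(ORIGINAL, T=["C", "E"], E=["N"])

part_b = [
    d_separated(ORIGINAL, "N", "T", set()),
    d_separated(ORIGINAL, "P", "E", {"N"}),
    d_separated(ORIGINAL, "P", "N", {"A", "C"}),
    d_separated(ORIGINAL, "E", "A", {"C", "N"}),
]
print(part_b)
```

A result of `False` corresponds to "unknown" in part (b): the graph does not guarantee independence, but particular probability tables could still produce it.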


e) (4 pt) You now want to compute the distribution P (A) using variable elimination. List the factors that remain before and after eliminating the variable N.

Before: The initial factors are just the conditional probability tables of the Bayes’ net.

$P(A)$, $P(P \mid A)$, $P(C \mid A)$, $P(T \mid C)$, $P(N \mid P, C)$, $P(E \mid T, N)$

After: First, all factors that include the variable N are joined together, yielding $P(E, N \mid P, C, T)$. Next, the variable N is summed out of this new factor, yielding $P(E \mid P, C, T)$. The remaining factors include this new factor and the unused original factors.

$P(A)$, $P(P \mid A)$, $P(C \mid A)$, $P(T \mid C)$, $P(E \mid P, C, T)$

Referring to the final factor as $m(E, P, C, T)$ (like in the textbook) was also accepted.

f ) (2 pt) Pacman’s new diet allows only fruit (P and A) to be eaten, but Pacman only follows the diet occasionally. Add the new variable D (for whether he follows the diet) to the network below by adding arcs. Briefly justify your answer.

[Figure: two candidate answers adding the node D to the network over P, A, C, N, E and T, shown on the left and on the right.]

The best answer is the one on the left, which allows us to express changes to all food preferences based on the diet. Other answers were accepted with appropriate justification, such as the network on the right, which specifies that the diet specifically disallows C, N, E and T. Connecting D only to P and/or A is problematic: for instance, observing P and A would make D independent of C, but the diet should still affect whether carrots (C) are allowed even when A and P are known.

g) (2 pt) Given the Bayes’ net and the factors below, fill in the table for $P(D \mid \neg a)$ or state that there is not enough information.

From the tables $P(D)$ and $P(A \mid D)$, the entries of the factor $P(D \mid A = \neg a)$ can be computed from Bayes’ rule (which always holds): no additional information about the properties of the distributions is required. The trick is to observe that while $P(\neg a)$ is not explicitly given, it can be computed from $P(D)$ and $P(A \mid D)$ using the product rule: $\forall a, d : P(a, d) = P(d) P(a \mid d)$. Computing $P(\neg a)$ in this way is equivalent to applying the normalization trick.

| $D$ | $P(D)$ |
| $d$ | 0.4 |
| $\neg d$ | 0.6 |

| $D$ | $A$ | $P(A \mid D)$ |
| $d$ | $a$ | 0.8 |
| $d$ | $\neg a$ | 0.2 |
| $\neg d$ | $a$ | 0.5 |
| $\neg d$ | $\neg a$ | 0.5 |

| $D$ | $A$ | $P(D \mid A = \neg a)$ |
| $d$ | $\neg a$ | $\frac{P(\neg a \mid d) \cdot P(d)}{P(\neg a \mid \neg d) \cdot P(\neg d) + P(\neg a \mid d) \cdot P(d)} = \frac{0.2 \cdot 0.4}{0.5 \cdot 0.6 + 0.2 \cdot 0.4} = \frac{4}{19}$ |
| $\neg d$ | $\neg a$ | $\frac{P(\neg a \mid \neg d) \cdot P(\neg d)}{P(\neg a \mid \neg d) \cdot P(\neg d) + P(\neg a \mid d) \cdot P(d)} = \frac{0.5 \cdot 0.6}{0.5 \cdot 0.6 + 0.2 \cdot 0.4} = \frac{15}{19}$ |
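The normalization trick from part (g) takes only a few lines. The sketch below assumes the table entries P(d) = 0.4, P(¬a | d) = 0.2 and P(¬a | ¬d) = 0.5, which are the numbers appearing in the worked expression for the answer; the dictionary keys `d` and `not_d` are illustrative names.

```python
from fractions import Fraction

# Assumed table entries, taken from the worked expression in the answer.
p_d = {"d": Fraction(4, 10), "not_d": Fraction(6, 10)}            # P(D)
p_not_a_given = {"d": Fraction(2, 10), "not_d": Fraction(5, 10)}  # P(not a | D)

# Product rule: P(not a, D) = P(D) * P(not a | D) for each value of D.
joint = {d: p_d[d] * p_not_a_given[d] for d in p_d}

# Normalization: P(not a) is the sum of the joint entries.
p_not_a = sum(joint.values())
posterior = {d: joint[d] / p_not_a for d in joint}
print(p_not_a, posterior)
```

Dividing the joint entries by their sum is exactly the application of Bayes' rule described above, so the two posterior entries sum to 1.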