






You are designing a menu for a special event. There are several choices, each represented as a variable: (A)ppetizer, (B)everage, main (C)ourse, and (D)essert. The domains of the variables are as follows:
A: (v)eggies, (e)scargot
B: (w)ater, (s)oda, (m)ilk
C: (f)ish, (b)eef, (p)asta
D: (a)pple pie, (i)ce cream, (ch)eese
Because all of your guests get the same menu, it must obey the following dietary constraints:
(i) Vegetarian options: The appetizer must be veggies or the main course must be pasta or fish (or both).
(ii) Total budget: If you serve the escargot, you cannot afford any beverage other than water.
(iii) Calcium requirement: You must serve at least one of milk, ice cream, or cheese.
(a) (3 points) Draw the constraint graph over the variables A, B, C, and D.
Answer: The constraint graph has nodes A, B, C, and D, with an edge between A and C (constraint i), between A and B (constraint ii), and between B and D (constraint iii).
(b) (2 points) Imagine we first assign A=e. Cross out eliminated values to show the domains of the variables after forward checking.
A [ e ]   B [ w s m ]   C [ f b p ]   D [ a i ch ]
Answer: The values s, m, and b should be crossed off. “s” and “m” are eliminated due to being incompatible with “e” based on constraint (ii). “b” is eliminated due to constraint (i).
(c) (3 points) Again imagine we first assign A=e. Cross out eliminated values to show the domains of the variables after arc consistency has been enforced.
A [ e ]   B [ w s m ]   C [ f b p ]   D [ a i ch ]
Answer: The values s, m, b, and a should be eliminated. The first three are crossed off for the reasons above, and “a” is eliminated because, once B’s domain has been reduced to { w }, there is no value for B that is compatible with D=a under constraint (iii).
(d) (1 point) Give a solution for this CSP or state that none exists. Answer: Multiple solutions exist. One is A=e, B=w, C=f, and D=i.
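The constraints are small enough to verify by brute force. Below is a minimal sketch (an illustration, not part of the original exam) that enumerates every complete assignment and keeps those satisfying constraints (i), (ii), and (iii); it confirms that A=e, B=w, C=f, D=i is one of several solutions.

from itertools import product

domains = {
    "A": ["v", "e"],         # appetizer: veggies, escargot
    "B": ["w", "s", "m"],    # beverage: water, soda, milk
    "C": ["f", "b", "p"],    # main course: fish, beef, pasta
    "D": ["a", "i", "ch"],   # dessert: apple pie, ice cream, cheese
}

def satisfies(A, B, C, D):
    vegetarian = (A == "v") or (C in ("p", "f"))     # constraint (i)
    budget = (A != "e") or (B == "w")                # constraint (ii)
    calcium = (B == "m") or (D in ("i", "ch"))       # constraint (iii)
    return vegetarian and budget and calcium

solutions = [combo for combo in product(*domains.values()) if satisfies(*combo)]
print(("e", "w", "f", "i") in solutions)   # True: the solution given in (d)
print(len(solutions), "solutions in total")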
(e) (2 points) For general CSPs, will enforcing arc consistency after an assignment always prune at least as many domain values as forward checking? Briefly explain why or why not.
Answer: Two answers are possible:
Yes. The first step of enforcing arc consistency (making the assigned variable's neighbors consistent with it) is equivalent to forward checking, so arc consistency removes every value that forward checking does.
No. Although forward checking's pruning is a subset of arc consistency's overall, if we compare the two on a single assignment, arc consistency may already have eliminated, in an earlier step, some of the values that forward checking would eliminate now. Thus, enforcing arc consistency will never leave more domain values than forward checking, but on a given step forward checking might prune more values than arc consistency does, simply because arc consistency already pruned those values earlier.
(c) (3 points) In each node, write U_A(s), the utility of that state for player A, assuming that B is a balancer. Answer: Displayed above.
(d) (3 points) Write pseudocode for the functions which compute the U_A(s) values of game states in the general case of multi-turn games where B is a balancer. Assume you have access to the following functions: successors(s) gives the possible next states, isTerminal(s) checks whether a state is a terminal state, and terminalValue(s) returns A’s utility for a terminal state. Careful: As in minimax, be sure that both functions compute and return player A’s utilities for states – B’s utility can always be computed from A’s utility.
Answer: Below. Note that balanceValue(s) must still return the utility from the maximizer’s (player A’s) perspective.
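One possible sketch of the two mutually recursive functions, in Python (an illustration, not the exam's reference solution). It assumes that A and B alternate moves and, consistently with the pruning condition in part (i), that the balancer B chooses the successor whose utility for A is closest to zero; it uses only the helpers named in the question.

def maxValue(s):
    # Player A's turn: A maximizes its own utility U_A.
    if isTerminal(s):
        return terminalValue(s)
    return max(balanceValue(s2) for s2 in successors(s))

def balanceValue(s):
    # Player B's turn: B is a balancer, so it prefers the successor whose
    # utility for A is closest to zero. We still return A's utility for
    # that successor (not its absolute value), as the question requires.
    if isTerminal(s):
        return terminalValue(s)
    return min((maxValue(s2) for s2 in successors(s)), key=abs)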
(h) (2 points) Consider pruning children of a B node in this scenario. On the tree on the bottom of the previous page, cross off any nodes which can be pruned, again assuming left-to-right ordering.
Answer: Answers above.
(i) (2 points) Again consider pruning children of a B node s. Let α be the best option for an A node higher in the tree, just as in alpha-beta pruning, and let v be the U_A value of the best action B has found so far from s. Give a general condition under which balanceValue(s) can return without examining any more of its children.
Answer: |v| < α. Once B has found a child whose value satisfies |v| < α, the value it ultimately returns from s has magnitude at most |v|, so it is strictly less than α and the maximizing ancestor will never prefer this branch.
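To illustrate this condition, here is a hypothetical variant of balanceValue (the name balanceValueWithPruning and the alpha argument are illustrative additions, building on the sketch above and on the same balancer assumption) that returns early once pruning is possible.

def balanceValueWithPruning(s, alpha):
    # alpha: the best value a maximizing ancestor of s can already guarantee.
    if isTerminal(s):
        return terminalValue(s)
    v = None
    for s2 in successors(s):
        child = maxValue(s2)                 # A's utility of this successor
        if v is None or abs(child) < abs(v):
            v = child                        # most balanced option found so far
        if abs(v) < alpha:
            # Whatever B finally returns has magnitude at most |v| < alpha,
            # so the maximizing ancestor will never prefer this branch: prune.
            return v
    return v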
(a) (2 points) If for all i, r_i = 1, p_i = 1, and there is a discount γ = 0.5, what is the value V^stay(1) of being in city 1 under the policy that always chooses stay? Your answer should be a real number.
Answer: For all cities (states) i = 1, ..., N, the value of the always-stay policy satisfies
V^stay(i) = r_i + γ V^stay(i)
(remember, this is the Bellman equation for a fixed policy). Plugging in values, we get V^stay(i) = 1 + 0.5 · V^stay(i), so V^stay(i) = 2; in particular, V^stay(1) = 2.
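As a quick numerical check (a sketch, not part of the exam), iterating the fixed-policy Bellman backup converges to the same value:

# Fixed-point iteration for V^stay = r + gamma * V^stay with r = 1, gamma = 0.5.
r, gamma = 1.0, 0.5
v = 0.0
for _ in range(50):
    v = r + gamma * v
print(v)  # approximately 2.0; the exact fixed point is r / (1 - gamma) = 2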
V*(i) = max{ r_i + γ V*(i)   [stay],
             p_i γ V*(i−1) + (1 − p_i) γ V*(i)   [left],
             p_i γ V*(i+1) + (1 − p_i) γ V*(i)   [right] }
Since p_i = 1, this drastically simplifies:
V*(i) = max{ r_i + γ V*(i)   [stay],   γ V*(i−1)   [left],   γ V*(i+1)   [right] }
From this, we see that V*(i) is the same for all i, so the max is always obtained with the stay action.
(c) (2 points) If the r_i's and p_i's are known positive numbers and there is almost no discount, i.e. γ ≈ 1, describe the optimal policy. You may define it formally or in words, e.g. “always go east,” but your answer
(Footnote 1: For i = 1, omit the left action; for i = N, omit the right action.)
After (1, S, 4, 1), we update Q(1, S) ← 0.5[4 + 1 · 2] + 0.5(2) = 4.
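This is the standard Q-learning sample update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a′ Q(s′, a′)]. A small sketch reproducing the numbers above (α = 0.5, γ = 1, old Q(1, S) = 2, and max_a′ Q(1, a′) = 2 are read off the worked update):

def q_update(q_old, reward, max_q_next, alpha=0.5, gamma=1.0):
    # One Q-learning sample update: blend the old estimate with the new sample.
    sample = reward + gamma * max_q_next
    return (1 - alpha) * q_old + alpha * sample

# Transition (s=1, a=S, r=4, s'=1), with old Q(1, S) = 2 and max_a' Q(1, a') = 2.
print(q_update(q_old=2.0, reward=4.0, max_q_next=2.0))  # prints 4.0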
Circle true or false; skipping here is worth 1 point per question.
(g) (2 points) (True/False) Q-learning will only learn the optimal q-values if actions are eventually selected according to the optimal policy.
Answer: False. As long as the policy used explores all the states and actions (even a random policy will work), Q-learning will find the optimal q-values.
(h) (2 points) (True/False) In a deterministic MDP (i.e. one in which each state/action pair leads to a single deterministic next state), the Q-learning update with a learning rate of α = 1 will correctly learn the optimal q-values.
Answer: True. Remember that the learning rate is only there because we are trying to approximate an expectation over next states with a single sample. In a deterministic MDP where s′ is the single state that always follows when we take action a in state s, we have Q(s, a) = R(s, a, s′) + γ max_a′ Q(s′, a′), which is exactly the update we make when α = 1.
You are playing a simplified game of Wheel of Fortune. The objective is to correctly guess a three letter word. Let X, Y, and Z represent the first, second, and third letters of the word, respectively. There are only 8 possible words: X can take on the values ‘c’ or ‘l’, Y can be ‘a’ or ‘o’, and Z can be ‘b’ or ‘t’.
Before you guess the word, two of the three letters will be revealed to you. In the first round of the game, you choose one of X, Y or Z to be revealed. In the second round, you choose one of the remaining two letters to be revealed. In the third round, you guess the word. If you guess correctly, you win. The utility of winning is 1, while the utility of losing is 0.
You watch the game a lot and determine that the eight possible words occur with the probabilities shown on the right. Your goal is to act in such a way as to maximize your chances of winning (and thereby your expected utility).
(a) (3 points) What is the distribution P(Y, Z)? Your answer should be in the form of a table. Answer:
P(X=c, Y=a) = 0.2    P(X=c, Y=o) = 0.4    P(X=l, Y=a) = 0.2    P(X=l, Y=o) = 0.2
(b) (2 points) Are the second and third letters (Y and Z) independent? Show a specific computation that supports your claim.
Answer: No, since P(X=c) = 0.6 and P(Y=a) = 0.4, but P(X=c, Y=a) = 0.2, which is not P(X=c) P(Y=a) = 0.24. (Other counterexamples exist too.)
(c) (2 points) Are the second and third letters (Y and Z) independent if you know the value of the first letter (X)? Show a specific computation that supports your claim.
Answer: Yes. P(Y=a, Z=b | X=c) = P(X=c, Y=a, Z=b) / P(X=c) = 1/6. P(Y=a | X=c) = (0.1 + 0.1)/0.6 = 1/3 and P(Z=b | X=c) = (0.1 + 0.2)/0.6 = 1/2. Thus, P(Y=a, Z=b | X=c) = 1/6 = P(Y=a | X=c) P(Z=b | X=c). To be certain, you would also have to check all value pairs (not required for full credit). Alternatively, you can show that P(Y | X, Z) = P(Y | X).
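As a sketch of the conditional-independence check in (c): the code below uses only the X=c slice of the joint, with the four entries implied by the computations in the answer (P(c,a,b)=0.1, P(c,a,t)=0.1, P(c,o,b)=0.2, P(c,o,t)=0.2); treat those numbers as an assumption for illustration.

# X=c slice of the joint P(X, Y, Z), with values implied by the answer above
# (assumed for illustration; the exam's full probability table is not shown).
joint_c = {
    ("a", "b"): 0.1,   # P(X=c, Y=a, Z=b)
    ("a", "t"): 0.1,   # P(X=c, Y=a, Z=t)
    ("o", "b"): 0.2,   # P(X=c, Y=o, Z=b)
    ("o", "t"): 0.2,   # P(X=c, Y=o, Z=t)
}

p_x_c = sum(joint_c.values())                              # P(X=c) = 0.6
p_yz = {yz: p / p_x_c for yz, p in joint_c.items()}        # P(Y, Z | X=c)
p_y = {y: sum(p for (yy, _), p in p_yz.items() if yy == y) for y in ("a", "o")}
p_z = {z: sum(p for (_, zz), p in p_yz.items() if zz == z) for z in ("b", "t")}

# Conditional independence given X=c: P(Y, Z | X=c) = P(Y | X=c) * P(Z | X=c).
for (y, z), p in p_yz.items():
    assert abs(p - p_y[y] * p_z[z]) < 1e-9
print("Y and Z are conditionally independent given X=c for these numbers.")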