CS 343 Artificial Intelligence
Homework 4: Machine Learning and Natural Language
Due: May 7, 2009
Consider a concept learning problem described using the following features:
size : big, medium, small
color : red, blue, green
shape : square, circle, triangle
For each of the following two different training sets, show the consistent hypothesis learned, if any, by the maximally specific conjunctive learning algorithm. If no consistent conjunctive hypothesis is learnable, clearly state this and explain why.
Training Set 1:
< small, red, circle > : positive
< small, red, triangle > : positive
< small, green, circle > : negative
< big, red, triangle > : negative
Training Set 2:
< big, blue, circle > : positive
< medium, blue, triangle > : positive
< small, blue, square > : positive
< small, blue, circle > : negative
< big, green, square > : negative
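As an aid for checking hand-worked answers, here is a minimal sketch of a maximally specific conjunctive learner. It assumes the usual formulation: start from the first positive example, generalize a feature to a wildcard whenever a later positive example disagrees, then verify that no negative example is covered. The function names and the "?" wildcard are illustrative choices, not notation from the course, and the sketch assumes at least one positive example.

```python
# Illustrative sketch; helper names and the "?" wildcard are assumptions.

def covers(hypothesis, instance):
    """True if every constraint is a wildcard or matches the instance."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))


def most_specific_conjunction(examples):
    """Return the most specific conjunction covering all positives,
    or None if that conjunction also covers a negative example.
    Assumes the training set contains at least one positive example."""
    positives = [x for x, is_positive in examples if is_positive]
    negatives = [x for x, is_positive in examples if not is_positive]

    hypothesis = list(positives[0])
    for instance in positives[1:]:
        # Keep a constraint only where the positives still agree.
        hypothesis = [h if h == v else "?" for h, v in zip(hypothesis, instance)]

    # Any consistent conjunction must be at least this general, so if a
    # negative is covered here, no consistent conjunction exists.
    if any(covers(hypothesis, x) for x in negatives):
        return None
    return tuple(hypothesis)


TRAINING_SET_1 = [
    (("small", "red", "circle"), True),
    (("small", "red", "triangle"), True),
    (("small", "green", "circle"), False),
    (("big", "red", "triangle"), False),
]
TRAINING_SET_2 = [
    (("big", "blue", "circle"), True),
    (("medium", "blue", "triangle"), True),
    (("small", "blue", "square"), True),
    (("small", "blue", "circle"), False),
    (("big", "green", "square"), False),
]

print(most_specific_conjunction(TRAINING_SET_1))
print(most_specific_conjunction(TRAINING_SET_2))
```

A return value of None signals only that the generalized hypothesis covers some negative example; the written answer should still explain which example causes the conflict.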
What is the total number of possible binary functions over this instance space?
What is the total number of semantically distinct conjunctive hypotheses over this instance space (assuming negation is not allowed)?
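One way to sanity-check both counts is by brute force. The sketch below enumerates the instance space, then enumerates every syntactic conjunction (each feature is either a specific value or a wildcard) and counts distinct sets of covered instances. Counting the hypothesis that covers no instance as one additional conjunctive hypothesis is an assumption about the intended convention; the "?" wildcard and variable names are likewise illustrative.

```python
from itertools import product

# Feature domains as listed in the problem statement.
DOMAINS = {
    "size": ("big", "medium", "small"),
    "color": ("red", "blue", "green"),
    "shape": ("square", "circle", "triangle"),
}

# Every instance is one value per feature.
instances = list(product(*DOMAINS.values()))

# A binary function assigns positive or negative to each instance.
num_binary_functions = 2 ** len(instances)


def extension(hypothesis):
    """Set of instances covered by a conjunction ("?" matches anything)."""
    return frozenset(
        x for x in instances
        if all(h == "?" or h == v for h, v in zip(hypothesis, x))
    )


# Syntactic conjunctions: each feature is a specific value or "?".
syntactic = product(*[values + ("?",) for values in DOMAINS.values()])
extensions = {extension(h) for h in syntactic}

# Assumption: the empty hypothesis (covers nothing) is counted once.
extensions.add(frozenset())
num_conjunctive = len(extensions)

print(len(instances), num_binary_functions, num_conjunctive)
```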
Using the same features as in the previous problem, show the decision tree learned by the ID3 algorithm from the training set:
< big, blue, circle > : positive
< medium, green, triangle > : positive
< big, blue, square > : positive
< small, blue, circle > : negative
< big, green, square > : negative
Explicitly show the gain of each feature at each choice point. If there is a tie in gain, prefer the feature first in the list: (size, color, shape).
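The gain computation at a choice point can be sketched as follows, as a way to check hand calculations. This assumes the standard entropy-based information gain used by ID3; the function names are illustrative. Ties are broken by list order because Python's `max` returns the first maximal element, though floating-point rounding could in principle separate values that are mathematically tied.

```python
from collections import Counter
from math import log2

FEATURES = ("size", "color", "shape")

# Training set from the problem statement: (instance, is_positive).
DATA = [
    (("big", "blue", "circle"), True),
    (("medium", "green", "triangle"), True),
    (("big", "blue", "square"), True),
    (("small", "blue", "circle"), False),
    (("big", "green", "square"), False),
]


def entropy(examples):
    """Entropy in bits of the class labels in a non-empty example list."""
    total = len(examples)
    counts = Counter(label for _, label in examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def gain(examples, feature_index):
    """Information gain from splitting the examples on one feature."""
    subsets = {}
    for instance, label in examples:
        subsets.setdefault(instance[feature_index], []).append((instance, label))
    remainder = sum(
        len(subset) / len(examples) * entropy(subset)
        for subset in subsets.values()
    )
    return entropy(examples) - remainder


def best_feature(examples, candidates):
    """Index of the highest-gain feature; earlier candidates win ties."""
    return max(candidates, key=lambda i: gain(examples, i))


for i, name in enumerate(FEATURES):
    print(f"gain({name}) = {gain(DATA, i):.3f}")
print("root:", FEATURES[best_feature(DATA, [0, 1, 2])])
```

The same functions can be applied to the subset of examples reaching each child node, with the already-used feature removed from the candidate list.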
Below are the relevant formulae:
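Assuming the standard ID3 definitions are the ones intended: for a set S of examples in which a fraction p_+ is positive and a fraction p_- is negative, and a feature F that splits S into subsets S_v (one per value v of F), with the convention that 0 log 0 = 0:

```latex
\[
\mathrm{Entropy}(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}
\]
\[
\mathrm{Gain}(S, F) = \mathrm{Entropy}(S)
  - \sum_{v \in \mathrm{Values}(F)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\]
```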
Given the following context-free grammar, draw trees for all parses
produced for the sentence: "Guard tests like gold." Under each tree, briefly
paraphrase in English the meaning of each parse as best you can and circle the
most likely interpretation.
S -> NP VP, S -> VP,
NP -> Adj NP, NP -> N, NP -> NP PP,
VP -> V, VP -> V NP, VP -> VP PP
PP -> Prep NP,
Prep -> like,
N -> guard, N -> gold, N -> tests,
V -> guard, V -> tests, V -> like,
Adj -> gold, Adj -> guard
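To check that no parse has been missed, the grammar above can be run through a small exhaustive parser. The sketch below is a simple memoized span-based enumerator, not any particular algorithm from the course; it relies on the grammar having only unary and binary rules with no unary cycles. The sentence is lowercased and the final period dropped, which is an assumption about tokenization.

```python
from functools import lru_cache

# Grammar and lexicon copied from the problem statement.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("Adj", "NP")), ("NP", ("N",)), ("NP", ("NP", "PP")),
    ("VP", ("V",)), ("VP", ("V", "NP")), ("VP", ("VP", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "guard": ("N", "V", "Adj"),
    "tests": ("N", "V"),
    "like": ("Prep", "V"),
    "gold": ("N", "Adj"),
}


def all_parses(words, start="S"):
    """Return every parse tree of the word sequence rooted at `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(symbol, i, j):
        trees = []
        # Lexical case: a single word tagged with this symbol.
        if j - i == 1 and symbol in LEXICON.get(words[i], ()):
            trees.append((symbol, words[i]))
        for lhs, rhs in RULES:
            if lhs != symbol:
                continue
            if len(rhs) == 1:
                # Unary rule: same span, different symbol (no cycles here).
                trees.extend((symbol, t) for t in build(rhs[0], i, j))
            else:
                # Binary rule: try every split point inside the span.
                left, right = rhs
                for k in range(i + 1, j):
                    for lt in build(left, i, k):
                        for rt in build(right, k, j):
                            trees.append((symbol, lt, rt))
        return tuple(trees)

    return build(start, 0, len(words))


def bracket(tree):
    """Render a tree as a bracketed string."""
    label, *children = tree
    parts = [c if isinstance(c, str) else bracket(c) for c in children]
    return "(" + " ".join([label] + parts) + ")"


trees = all_parses("guard tests like gold".split())
for tree in trees:
    print(bracket(tree))
```

The parser only enumerates structures; paraphrasing each parse and judging which reading is most likely still has to be done by hand.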
Using the simple PCFG for the air-travel domain given on slide 70
of the slide packet on NLP, show all of the parse trees for the sentence "I
prefer the flight through Houston on NWA" and compute the probability of each. Also compute
the probability that the grammar generates this sentence, P("I prefer the flight
through Houston on NWA"). Finally, compute the labeled precision, recall, and F1 evaluation scores
for each incorrect parse tree when compared to the correct parse.
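The evaluation scores can be sketched as below, assuming the usual PARSEVAL-style definitions: a constituent is a (label, start, end) triple, precision is the fraction of the candidate's constituents that appear in the correct parse, recall is the fraction of the correct parse's constituents that appear in the candidate, and F1 is their harmonic mean. The constituents in the example are hypothetical and are not derived from the slide grammar. Whether part-of-speech (preterminal) nodes are counted is a convention to confirm against the course material.

```python
def labeled_prf(gold, predicted):
    """Labeled precision, recall, and F1 over (label, start, end) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical constituents, for illustration only.
gold_constituents = {
    ("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5), ("PP", 3, 5),
}
candidate_constituents = {
    ("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("VP", 1, 3),
    ("NP", 2, 3), ("PP", 3, 5),
}

p, r, f1 = labeled_prf(gold_constituents, candidate_constituents)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1:.3f}")
```

For the probabilities, the probability of a single tree under a PCFG is the product of the probabilities of the rules used in it, and the sentence probability is the sum over all of its parse trees.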