
CS 229 Homework Hotline

CS229 Problem Set #1 Solutions (excerpt)

The $-\frac{\lambda}{2}\theta^T\theta$ here is a regularization term (with regularization parameter $\lambda$), which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$. Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta$$

where $z \in \mathbb{R}^m$ is defined by

$$z_i = w^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right)$$

and the Hessian is given by

$$H = X^T D X - \lambda I$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with

$$D_{ii} = -w^{(i)} \, h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right).$$

For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we compute the weights

$$w^{(i)} = \exp\left( -\frac{\| x - x^{(i)} \|^2}{2\tau^2} \right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$. The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau.
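A minimal sketch of what lwlr.m might look like, transcribing the gradient and Hessian above directly into Newton-Raphson updates. The fixed 20-iteration cap and the 0.5 decision threshold are illustrative assumptions, not part of the problem statement:

function y = lwlr(X_train, y_train, x, tau)
% Sketch: locally weighted logistic regression via Newton's method.
% Assumes y_train has labels in {0,1}, X_train is m x n, x is n x 1.
lambda = 0.0001;
[m, n] = size(X_train);
theta = zeros(n, 1);

% Weights w(i) = exp(-||x - x(i)||^2 / (2*tau^2)) from the formula above
dx = X_train - repmat(x', m, 1);
w = exp(-sum(dx.^2, 2) / (2 * tau^2));

for iter = 1:20                               % fixed iteration cap (assumption)
    h = 1 ./ (1 + exp(-X_train * theta));     % h_theta(x(i)) for every training point
    grad = X_train' * (w .* (y_train - h)) - lambda * theta;
    D = diag(-w .* h .* (1 - h));
    H = X_train' * D * X_train - lambda * eye(n);
    theta = theta - H \ grad;                 % Newton-Raphson step
end

y = double(1 / (1 + exp(-x' * theta)) > 0.5); % predicted class of the query point

Because the regularized $\ell(\theta)$ is strictly concave ($H$ is negative definite), the plain Newton step needs no learning rate; a convergence check on the norm of the update could replace the fixed iteration count.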

CS229 Problem Set #4 Solutions (excerpt)

CS 229, Public Course Problem Set #4 Solutions: Unsupervised Learning and Reinforcement Learning

1. EM for supervised learning

In class we applied EM to the unsupervised learning setting. In particular, we represented $p(x)$ by marginalizing over a latent random variable

$$p(x) = \sum_z p(x, z) = \sum_z p(x \mid z) \, p(z).$$

However, EM can also be applied to the supervised learning setting, and in this problem we discuss a "mixture of linear regressors" model; this is an instance of what is often called the Hierarchical Mixture of Experts model. We want to represent $p(y \mid x)$, $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, and we do so by again introducing a discrete latent random variable

$$p(y \mid x) = \sum_z p(y, z \mid x) = \sum_z p(y \mid x, z) \, p(z \mid x).$$

For simplicity we'll assume that $z$ is binary valued, that $p(y \mid x, z)$ is a Gaussian density, and that $p(z \mid x)$ is given by a logistic regression model. More formally,

$$p(z \mid x; \phi) = g(\phi^T x)^z (1 - g(\phi^T x))^{1 - z}$$

$$p(y \mid x, z = i; \theta_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - \theta_i^T x)^2}{2\sigma^2} \right), \quad i = 0, 1$$

where $\sigma$ is a known parameter and $\phi, \theta_0, \theta_1 \in \mathbb{R}^n$ are parameters of the model (here we use the subscript on $\theta$ to denote two different parameter vectors, not to index a particular entry in these vectors).

Intuitively, the process behind the model can be thought of as follows. Given a data point $x$, we first determine whether the data point belongs to one of two hidden classes, $z = 0$ or $z = 1$, using a logistic regression model. We then determine $y$ as a linear function of $x$ (different linear functions for different values of $z$) plus Gaussian noise, as in the standard linear regression model. For example, the following data set could be well-represented by the model, but not by standard linear regression. [Figure omitted from this preview.]

(a) Suppose $x$, $y$, and $z$ are all observed, so that we obtain a training set $\{ (x^{(1)}, y^{(1)}, z^{(1)}), \ldots, (x^{(m)}, y^{(m)}, z^{(m)}) \}$. Write the log-likelihood of the parameters, and derive the maximum likelihood estimates for $\phi$, $\theta_0$, and $\theta_1$. Note that because $p(z \mid x)$ is a logistic regression model, there will not exist a closed-form estimate of $\phi$. In this case, derive the gradient and the Hessian of the likelihood with respect to $\phi$; in practice, these quantities can be used to numerically compute the ML estimate.

Answer: The log-likelihood is given by

$$\ell(\phi, \theta_0, \theta_1) = \log \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}, z^{(i)}; \theta_0, \theta_1) \, p(z^{(i)} \mid x^{(i)}; \phi)$$

$$= \sum_{i: z^{(i)} = 0} \log\left( (1 - g(\phi^T x^{(i)})) \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta_0^T x^{(i)})^2}{2\sigma^2} \right) \right) + \sum_{i: z^{(i)} = 1} \log\left( g(\phi^T x^{(i)}) \, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta_1^T x^{(i)})^2}{2\sigma^2} \right) \right)$$

Differentiating with respect to...
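The preview cuts off mid-derivation. As a hedged sketch of where the standard calculation leads (not the official solution text): the $\theta_j$ terms decouple into per-class least-squares problems, and the $\phi$ terms give the usual logistic regression gradient and Hessian. Here $X_j$ and $\vec{y}_j$ are hypothetical names for the design matrix and target vector built from the examples with $z^{(i)} = j$:

% Hedged sketch, assuming X_j stacks the x^(i) with z^(i) = j as rows
% and y_j collects the corresponding targets y^(i).
\theta_j = (X_j^T X_j)^{-1} X_j^T \vec{y}_j, \qquad j \in \{0, 1\}

\nabla_\phi \ell = \sum_{i=1}^m \left( z^{(i)} - g(\phi^T x^{(i)}) \right) x^{(i)}, \qquad
\nabla_\phi^2 \ell = -\sum_{i=1}^m g(\phi^T x^{(i)}) \left( 1 - g(\phi^T x^{(i)}) \right) x^{(i)} x^{(i)T}

These match the ordinary least-squares normal equations and the standard concave logistic log-likelihood, which is why Newton's method converges reliably for $\phi$.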
