Deep Learning Quiz

Suppose batch gradient descent in a deep network is taking excessively long to find a value of the parameters that achieves a small value for the cost function J(W[1], b[1], ..., W[L], b[L]). Which of the following techniques could help find parameter values that attain a small value for J? (Check all that apply)

If searching among a large number of hyperparameters, you should try values in a grid rather than random values, so that you can carry out the search more systematically and not rely on chance. True or False?

Note: Try random values rather than a grid search, because you don't know in advance which hyperparameters matter more than others.
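
Note: The point about random sampling can be sketched as follows. This is a minimal illustration and not part of the quiz: the hyperparameter names, the ranges, and the budget of 9 trials are all assumptions chosen for the example. With the same budget, a 3x3 grid tries only 3 distinct learning rates, while random sampling typically tries a different one in every trial.

```python
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter setting at random."""
    # Learning rate sampled uniformly on a log scale between 1e-4 and 1e-1
    # (assumed range, for illustration only).
    log_lr = rng.uniform(-4, -1)
    return {
        "learning_rate": 10 ** log_lr,
        "hidden_units": rng.randint(50, 200),
    }


rng = random.Random(0)
random_trials = [sample_config(rng) for _ in range(9)]

# A 3x3 grid also costs 9 trials but reuses the same 3 learning rates.
grid_trials = [
    {"learning_rate": lr, "hidden_units": h}
    for lr in (1e-4, 1e-3, 1e-2)
    for h in (50, 100, 200)
]

distinct_random = len({t["learning_rate"] for t in random_trials})
distinct_grid = len({t["learning_rate"] for t in grid_trials})
print("distinct learning rates (random):", distinct_random)
print("distinct learning rates (grid):", distinct_grid)
```

If the learning rate turns out to be the hyperparameter that matters most, the random trials have explored it far more densely than the grid for the same cost.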

Every hyperparameter, if set poorly, can have a huge negative impact on training, and so all hyperparameters are about equally important to tune well. True or False?

During hyperparameter search, whether you try to babysit one model (“Panda” strategy) or train a lot of models in parallel (“Caviar”) is largely determined by:

Which of these statements about deep learning programming frameworks are true? (Check all that apply)

In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations. True or False?
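
Note: Beam search scores a candidate by summing the log-probabilities of its tokens. Each term is negative, so every extra token lowers the score. The sketch below illustrates this with made-up numbers: the per-token probabilities and the exponent alpha = 0.7 are assumptions for the example, not values from the quiz.

```python
import math


def raw_score(token_probs):
    """Sum of token log-probabilities (the unnormalized objective)."""
    return sum(math.log(p) for p in token_probs)


def normalized_score(token_probs, alpha=0.7):
    """Raw score divided by length**alpha (alpha is an assumed heuristic)."""
    return raw_score(token_probs) / (len(token_probs) ** alpha)


# Hypothetical per-token probabilities for two candidate translations.
short_candidate = [0.5, 0.5]   # 2 tokens, each fairly uncertain
long_candidate = [0.7] * 6     # 6 tokens, each more confident

for name, cand in (("short", short_candidate), ("long", long_candidate)):
    print(name, round(raw_score(cand), 3), round(normalized_score(cand), 3))

# Which candidate wins under each scoring rule.
raw_winner = max((short_candidate, long_candidate), key=raw_score)
norm_winner = max((short_candidate, long_candidate), key=normalized_score)
```

With these numbers the unnormalized score prefers the 2-token candidate even though each of its tokens is less likely, while the length-normalized score prefers the 6-token candidate.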

Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000-dimensional, so as to capture the full range of variation and meaning in those words. True or False?
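
Note: It may help to separate the two vectors involved. The one-hot vector for a word has one entry per vocabulary word, whereas the embedding is a row of a learned matrix whose width is chosen independently and is usually far smaller. In the sketch below, the embedding size of 50, the random initialization, and the word index are assumptions for illustration.

```python
import random

vocab_size = 10000
embedding_dim = 50  # assumed size; chosen independently of vocab_size

rng = random.Random(0)
# Embedding matrix E: one dense row of embedding_dim numbers per word.
# Random values stand in for learned weights here.
E = [[rng.gauss(0.0, 0.01) for _ in range(embedding_dim)]
     for _ in range(vocab_size)]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


word_index = 1234                    # hypothetical position in the vocabulary
o = one_hot(word_index, vocab_size)  # sparse, vocabulary-sized
e = E[word_index]                    # dense embedding for the same word
print(len(o), len(e))
```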

What is t-SNE?

To which of these tasks would you apply a many-to-one RNN architecture?
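
Note: A many-to-one RNN reads an entire input sequence and produces a single output at the end, as in classifying a whole sequence (sentiment classification of a review is a common example). The following is a minimal sketch under assumed sizes and random weights. The function name, dimensions, and sigmoid output are illustrative choices, not taken from the quiz.

```python
import math
import random


def many_to_one_rnn(inputs, w_x, w_a, w_y):
    """Read the whole sequence, then emit one output from the last hidden state."""
    hidden = [0.0] * len(w_a)
    for x_t in inputs:
        # Standard tanh recurrence: a_t = tanh(W_x x_t + W_a a_{t-1}).
        hidden = [
            math.tanh(
                sum(w_x[i][j] * x_t[j] for j in range(len(x_t)))
                + sum(w_a[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(w_a))
        ]
    # Only the final hidden state feeds the single sigmoid output.
    logit = sum(w_y[i] * hidden[i] for i in range(len(hidden)))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
n_in, n_hidden = 3, 4  # assumed sizes
w_x = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w_a = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_y = [rng.uniform(-1, 1) for _ in range(n_hidden)]

# Five time steps in, one score out.
sequence = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(5)]
score = many_to_one_rnn(sequence, w_x, w_a, w_y)
print(score)
```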
