Discuss, Learn and be Happy: Question Discussion

Value iteration is guaranteed to converge if the discount factor satisfies 0 < γ < 1
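
The statement above can be checked numerically on a small example. The following is a minimal sketch, assuming a tabular MDP with known dynamics; the two-state MDP, the function name `value_iteration`, and the array layout are illustrative assumptions and are not part of the question. With 0 < γ < 1 the Bellman optimality backup is a γ-contraction in the max norm, so the loop should stop after a finite number of sweeps.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under action a.
    R: array of shape (S, A), immediate reward for taking action a in state s.
    Returns the value estimate and the number of sweeps performed.
    """
    V = np.zeros(R.shape[0])
    for i in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        # Stop once the largest per-state change falls below the tolerance.
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, i + 1
        V = V_new
    return V, max_iters

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 1.0],  # state 0: stay -> 0, switch -> 1
    [2.0, 0.0],  # state 1: stay -> 2, switch -> 0
])

V, n_sweeps = value_iteration(P, R, gamma=0.9)
print(V, n_sweeps)
```

For this toy MDP the fixed point can be worked out by hand: staying in state 1 forever gives V(1) = 2 / (1 - 0.9) = 20, and switching from state 0 gives V(0) = 1 + 0.9 * 20 = 19.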

For Q-learning to converge, which of the following options need to take place?

a. Curriculum learning is only required when attempting to train the model to perform well on long trajectories

13. You’re a new data scientist hired by Netflix. Your first assignment is to improve the following app: a user is presented with a screen containing a film poster. The user can then either watch the film (a reward of +1) or choose “next screen” (a reward of -1), which presents an additional film. For each film, you have a fixed-length vector that represents it. You also have a fixed number of poster types for each film (the same number for all films). Your algorithm needs to select both the film and the poster to show at each time step. To train your model, you are provided with a previously collected dataset of 100,000 user sessions. Which of the following DRL algorithms should you use?

16. Which of the following statements is correct with regard to experience replay (multiple answers may apply):

In contextual bandits, the reward produced by each arm is dependent on the context
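
To make the statement above concrete, here is a minimal sketch, assuming a linear reward model in which each arm has its own weight vector. The names `true_theta` and `expected_reward` and the linear form are illustrative assumptions, not taken from the question. The point is only that the arm with the highest expected reward changes as the context changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, dim = 3, 4

# Hypothetical per-arm weight vectors; the expected reward of an arm is
# the dot product of its weights with the observed context.
true_theta = rng.normal(size=(n_arms, dim))

def expected_reward(context, arm):
    """Expected reward of `arm` given `context` under the assumed linear model."""
    return float(true_theta[arm] @ context)

# Sample contexts and record which arm is best for each one.
best_arms = set()
for _ in range(200):
    context = rng.normal(size=dim)
    rewards = [expected_reward(context, a) for a in range(n_arms)]
    best_arms.add(int(np.argmax(rewards)))

print(sorted(best_arms))
```

In a context-free multi-armed bandit the same arm would be optimal on every round; here the set of best arms contains more than one arm because the reward depends on the context.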

a. Both the REINFORCE with a baseline and Double-DQN algorithms are similar in the sense that both use unbiased estimators

1. [Imitation Learning] Which of the following statements is correct regarding DAgger with coaching (multiple answers may apply):

4. [Model-Based Learning] Which of the following statements is not true regarding local dynamics models?

b) One of the main challenges in meta-learning is determining which past experiences/datasets are most relevant to the current state
