Discuss, Learn and be Happy: Question Discussion

[Advanced exploration methods] Which of the following statements is not true? Answer based on the works studied in the course (multiple answers may apply):
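
For context, one classic family of advanced exploration methods adds an intrinsic bonus that decays as a state is revisited. The sketch below is a generic count-based illustration only; the function name `shaped_reward` and the `beta` coefficient are assumptions, not necessarily one of the works from the course:

```python
from collections import defaultdict
import math

# Hypothetical count-based exploration bonus: the intrinsic reward decays
# as a state is visited more often, pushing the agent toward novel states.
counts = defaultdict(int)

def shaped_reward(state, extrinsic_reward, beta=0.1):
    counts[state] += 1  # state must be hashable (e.g., a tuple)
    return extrinsic_reward + beta / math.sqrt(counts[state])

print(shaped_reward((0, 0), 0.0))  # first visit: largest bonus
print(shaped_reward((0, 0), 0.0))  # bonus shrinks on revisits
```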

[MCTS] Assume you are using Monte Carlo Tree Search to develop a policy for a two-player game, and that whenever a player takes an action, there is a 30% chance that a different action is executed at random. Which of the following additions to MCTS may improve the algorithm’s performance (multiple answers may apply):
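
For intuition, the sketch below uses a flat Monte Carlo search (a simplification of full MCTS) on a toy single-pile Nim game, modeling the 30% "action slip" inside the simulations for both players. `SLIP`, the Nim rules, and all function names are illustrative assumptions:

```python
import random

SLIP = 0.3         # assumed: probability the executed action is random
MOVES = (1, 2, 3)  # toy single-pile Nim: remove 1-3 stones, taking the last wins

def legal(stones):
    return [m for m in MOVES if m <= stones]

def maybe_slip(stones, intended):
    """Model the question's 30% chance of a random action being executed."""
    return random.choice(legal(stones)) if random.random() < SLIP else intended

def playout(stones, to_move, me):
    """Random playout that applies the slip model to BOTH players."""
    while True:
        stones -= maybe_slip(stones, random.choice(legal(stones)))
        if stones == 0:  # the player to move emptied the pile and wins
            return 1.0 if to_move == me else 0.0
        to_move = 1 - to_move

def choose_action(stones, me=0, sims=2000):
    """Flat Monte Carlo search: score each root action under the slip model."""
    best, best_score = None, -1.0
    for a in legal(stones):
        total = 0.0
        for _ in range(sims):
            left = stones - maybe_slip(stones, a)
            total += 1.0 if left == 0 else playout(left, 1 - me, me)
        if total / sims > best_score:
            best, best_score = a, total / sims
    return best

print(choose_action(7))
```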

The Decision Transformer employs a Transformer-decoder architecture
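
A minimal PyTorch sketch of that idea: a GPT-style causal (decoder-only) stack over interleaved (return-to-go, state, action) tokens, predicting actions from the state tokens. Layer sizes and names are illustrative, not the paper's implementation, and continuous actions are assumed:

```python
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    """Sketch only: causal self-attention over R, s, a token triples."""
    def __init__(self, state_dim, act_dim, d_model=64, n_layers=2, n_heads=4, max_len=60):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        # An encoder stack with a causal mask behaves as a decoder-only model.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B,T,1), states: (B,T,state_dim), actions: (B,T,act_dim)
        B, T, _ = states.shape
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_act(actions)], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos(torch.arange(3 * T))
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.blocks(tokens, mask=mask)  # attend to the past only
        return self.head(h[:, 1::3])        # predict actions from state tokens

model = TinyDecisionTransformer(state_dim=4, act_dim=2)
out = model(torch.randn(2, 5, 1), torch.randn(2, 5, 4), torch.randn(2, 5, 2))  # (2, 5, 2)
```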

13. [Experience replay] In which of the following scenarios is experience replay not likely to be effective/required when using DQN (multiple answers may apply):
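
As a reference point, a minimal uniform replay buffer; uniform sampling from a large buffer is what breaks the temporal correlation between consecutive transitions and lets each one be reused. Class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of uniform experience replay as used with DQN."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # old transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buf, batch_size)  # uniform, decorrelated
        return list(zip(*batch))  # (states, actions, rewards, next_states, dones)
```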

a) One of the challenges in using autoencoders to model the dynamics of an environment is that we cannot be certain the model is capturing the details that actually matter for the task
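
A tiny sketch of the concern: a pixel-space autoencoder trained with MSE weights every pixel equally, so a small but task-critical object (a few pixels, like the ball in Pong) contributes almost nothing to the loss. The architecture and sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PixelAE(nn.Module):
    """Toy pixel autoencoder; the bottleneck must discard something,
    and the loss gives it no reason to keep what matters for control."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84, 32))
        self.dec = nn.Sequential(nn.Linear(32, 84 * 84), nn.Unflatten(1, (84, 84)))

    def forward(self, x):
        return self.dec(self.enc(x))

frame = torch.rand(1, 84, 84)
recon = PixelAE()(frame)
loss = nn.functional.mse_loss(recon, frame)  # no notion of "important" pixels
```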

16. [Imitation learning] Which of the following statements are true regarding DAgger (multiple answers may apply):
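
For reference, a minimal sketch of the DAgger loop: states are collected by rolling out the current learner policy, labeled by the expert, aggregated, and the policy is retrained on the whole dataset. All callables here (`env_reset`, `env_step`, `expert`, `train`) are assumed user-supplied placeholders:

```python
import numpy as np

def dagger(env_reset, env_step, expert, train, policy, iters=5, horizon=200):
    """Assumed interfaces: env_reset() -> state; env_step(action) -> (state, done);
    expert(state) -> action; train(X, Y) -> new policy callable."""
    X, Y = [], []
    for _ in range(iters):
        s = env_reset()
        for _ in range(horizon):
            X.append(s)
            Y.append(expert(s))            # expert relabels the visited state
            s, done = env_step(policy(s))  # but the learner's action drives the rollout
            if done:
                break
        policy = train(np.array(X), np.array(Y))  # fit on the aggregated dataset
    return policy
```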

b) In experience replay, selecting trajectories with high TD-errors is likely to improve the performance of our DRL agent
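
This is the idea behind prioritized experience replay. A proportional-prioritization sketch, following the standard recipe rather than any specific course implementation (the `alpha`/`beta` hyperparameters and class layout are illustrative):

```python
import numpy as np

class PrioritizedBuffer:
    """Sketch: sample transitions in proportion to |TD error|^alpha,
    with importance-sampling weights to correct the induced bias."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.data, self.prios = [], np.zeros(capacity, dtype=np.float32)
        self.capacity, self.alpha, self.pos = capacity, alpha, 0

    def push(self, transition, td_error=1.0):
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.prios[self.pos] = (abs(td_error) + 1e-5) ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.prios[:len(self.data)]
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)  # IS correction
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights
```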

a) When training an agent to play Atari games, it might be beneficial to use a stochastic policy at train time and a deterministic policy at test time
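
A minimal epsilon-greedy illustration of that pattern (function and parameter names assumed): explore while training, act greedily at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(q_values, training, eps=0.1):
    """Stochastic (epsilon-greedy) during training, deterministic at test."""
    if training and rng.random() < eps:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action

q = [0.1, 0.5, 0.2]
print(act(q, training=True))   # occasionally random
print(act(q, training=False))  # always argmax -> 1
```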

b) When training an agent to play a two-player game against other agents trained by other people (e.g., chess, rock-paper-scissors), it might be useful to use a stochastic policy both at train and test time
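
For intuition, in rock-paper-scissors the only unexploitable (Nash) strategy is a mixed one, so the policy must stay stochastic even at test time. A tiny sketch (names assumed):

```python
import numpy as np

rng = np.random.default_rng()
policy = np.array([1/3, 1/3, 1/3])  # mixed strategy over (rock, paper, scissors)

def play():
    # Sample rather than argmax: any deterministic choice is exploitable
    # by an opponent who best-responds to it.
    return int(rng.choice(3, p=policy))
```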

1. [Meta-learning] Which of the following statements are true regarding the Simple Neural Attentive Meta-Learner (SNAIL) architecture (multiple answers may apply):
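
A rough PyTorch sketch of SNAIL's two ingredients: dilated causal temporal convolutions that aggregate experience over time, followed by a causally masked attention block that can retrieve specific past entries. This simplifies the paper's dense TC blocks; all sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class CausalConv(nn.Module):
    """Dilated causal 1-D convolution: left-pad so position t sees only <= t."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = dilation  # (kernel_size - 1) * dilation, kernel size 2
        self.conv = nn.Conv1d(ch, ch, kernel_size=2, dilation=dilation)

    def forward(self, x):  # x: (B, C, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class TinySNAIL(nn.Module):
    def __init__(self, ch=32, max_len=64):
        super().__init__()
        self.tc = nn.ModuleList([CausalConv(ch, d) for d in (1, 2, 4, 8)])
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.register_buffer("mask",
            torch.triu(torch.ones(max_len, max_len, dtype=torch.bool), 1))

    def forward(self, x):  # x: (B, T, C) sequence of experience embeddings
        h = x.transpose(1, 2)
        for block in self.tc:
            h = torch.relu(block(h)) + h  # residual dilated TC stack
        h = h.transpose(1, 2)
        T = h.size(1)
        out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])  # causal attention
        return out + h

y = TinySNAIL()(torch.randn(2, 16, 32))  # (2, 16, 32)
```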
