DRL

b) AlphaGoZero employs a neural network to guide its decision-making process, selecting the next branches in the tree based on probabilities assigned by the network.

4. [Large state & action spaces] Which of the following statements is correct regarding the Value Iteration Network (VIN) algorithm:

a) Imitation learning is more likely to be effective (compared to other forms of RL) when the transition matrix and/or the rewards are stochastic

In DAgger with coaching, the algorithm initially relies on its own policy, then gradually incorporates the expert policy over time.

7. [Imitation learning] Which of the following IS NOT an advantage of behavior cloning compared to reinforcement learning (multiple answers may apply)?

b) One of the more challenging use-cases for the Decision Transformer is when the return-to-go becomes negative (particularly when still far from the end of the trajectory).

10. [General] In which of the following setups is SARSA likely to outperform Q-learning (multiple answers may apply)?

a) The use of contextual bandits by the action-elimination network (AEN) is important, because it enables the model to better infer the utility (i.e., “value”) of multiple actions at once.

b) Decoupling the action elimination (by the AEN) from the action selection (by the DRL agent) reduces the risk of the model exploring only a small subset of actions.

13. [Meta-learning] Which of the following statements best characterizes the Model Agnostic Meta-Learning (MAML) algorithm:
