Discuss, Learn and be Happy: Question Discussion


b) AlphaGoZero employs a neural network to guide its decision making process, selecting the next branches in the tree based on probabilities assigned by the network
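For anyone discussing this option, here is a minimal sketch of how network probabilities can steer branch selection in the tree search. It follows the PUCT-style rule described for AlphaGo Zero (score = Q + U, where U grows with the network prior and shrinks with the visit count). The function name `puct_select` and the value `c_puct=1.5` are assumptions for illustration, not taken from the question.

```python
import math

def puct_select(priors, visit_counts, q_values, c_puct=1.5):
    """Return the index of the child with the highest Q + U score.

    priors:       network probabilities P(s, a) for each child
    visit_counts: N(s, a) for each child
    q_values:     mean action values Q(s, a) for each child
    c_puct:       exploration constant (illustrative value, an assumption)
    """
    total_visits = sum(visit_counts)
    best_index, best_score = 0, -math.inf
    for i, (p, n, q) in enumerate(zip(priors, visit_counts, q_values)):
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        score = q + u
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With equal Q-values, a child that has already been visited many times loses out to a less-visited child even if its prior is higher, which is how the search balances the network's suggestion against exploration.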


4. [Large state & action spaces] Which of the following statements is correct regarding the Value Iteration Network (VIN) algorithm:


a) Imitation learning is more likely to be effective (compared to other forms of RL) when the transition matrix and/or the rewards are stochastic


In DAgger with coaching, the algorithm will initially rely on its own policy, then will gradually incorporate the expert policy over time


7. [Imitation learning] Which of the following IS NOT an advantage of behavior cloning compared to reinforcement learning (multiple answers may apply)?


b) One of the more challenging use-cases for the Decision Transformer is when the return-to-go becomes negative (particularly when still far from the end of the trajectory).
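As background for this option, here is a small sketch of how the return-to-go is usually computed and tracked for a Decision Transformer: during training it is the undiscounted sum of future rewards, and at test time the target return is decremented by each reward received. The helper names `returns_to_go` and `update_target` are hypothetical, chosen only for this illustration.

```python
def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg

def update_target(target_return, reward):
    """At test time, subtract the reward just received from the target."""
    return target_return - reward
```

For example, starting from a target of 1.0 and receiving rewards 1.0 and then 0.5 leaves a conditioning value of -0.5, which shows how the return-to-go can go negative before the trajectory ends.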


10. [General] In which of the following setups is Sarsa likely to outperform Q-learning (multiple answers may apply)?
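For reference when comparing the two algorithms, the sketch below shows the one place where their tabular updates differ: the bootstrap target. Sarsa uses the value of the action the behavior policy actually takes next (on-policy), while Q-learning uses the greedy maximum (off-policy). The function names and argument layout are assumptions made for illustration.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, gamma, q_next):
    """Off-policy target: bootstrap from the greedy action."""
    return reward + gamma * max(q_next)
```

If the next state has action values `[1.0, -10.0]` and an exploratory step picks action 1, the Sarsa target reflects that costly exploratory move, whereas the Q-learning target ignores it.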


a) The use of contextual bandits by the action-elimination network (AEN) is important, because it enables the model to better infer the utility (i.e., “value”) of multiple actions at once.


b) Decoupling the action elimination (by the AEN) from the action selection (by the DRL agent) reduces the risk of the model exploring only a small subset of actions.


13. [Meta-learning] Which of the following statements best characterizes the Model Agnostic Meta-Learning (MAML) algorithm:
