[Advanced exploration methods] Which of the following statements is not true? Answer based on the works studied in the course (multiple answers may apply):
[MCTS] Assume you are using Monte Carlo Tree Search (MCTS) to develop a policy for a two-player game. In addition, whenever a player takes an action, there is a 30% chance that a different action is chosen at random instead. Which of the following additions to MCTS may improve the algorithm’s performance (multiple answers may apply):
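For illustration only, the action noise described in this question could be simulated during MCTS rollouts roughly as follows (a minimal sketch; `env`, `legal_actions`, `step`, and the `noise_prob` parameter are hypothetical names, not part of the course material):

```python
import random

def noisy_step(env, chosen_action, noise_prob=0.3):
    """With probability `noise_prob` (30% in the question), the executed action
    is replaced by a uniformly random legal action before being applied."""
    if random.random() < noise_prob:
        action = random.choice(env.legal_actions())  # hypothetical env API
    else:
        action = chosen_action
    return env.step(action)  # hypothetical env API returning the next state
```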
13. [Experience replay] In which of the following scenarios is experience replay unlikely to be effective or required when using DQN (multiple answers may apply):
a) One of the challenges in using autoencoders to model the dynamics of an environment is that we cannot be certain that the important details are the ones being modeled
b) When training an agent to play a two-player game against agents trained by other people (e.g., chess, rock-paper-scissors), it might be useful to use a stochastic policy both at training and at test time
1. [Meta-learning] Which of the following statements are true regarding the Simple Neural Attentive Meta-Learner (SNAIL) architecture (multiple answers may apply):