Discuss, Learn and be Happy: Question Discussion

[Advanced exploration methods] Which of the following statements is not true? Answer based on the works studied in the course (multiple answers may apply):
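
For context, one classic family of advanced exploration methods adds an intrinsic bonus that decays as a state is revisited. The sketch below is a generic count-based illustration only; the function name `shaped_reward` and the `beta` coefficient are assumptions, not necessarily one of the works from the course:

```python
from collections import defaultdict
import math

# Hypothetical count-based exploration bonus: the intrinsic reward decays
# as a state is visited more often, pushing the agent toward novel states.
counts = defaultdict(int)

def shaped_reward(state, extrinsic_reward, beta=0.1):
    counts[state] += 1  # state must be hashable (e.g., a tuple)
    return extrinsic_reward + beta / math.sqrt(counts[state])

print(shaped_reward((0, 0), 0.0))  # first visit: largest bonus
print(shaped_reward((0, 0), 0.0))  # bonus shrinks on revisits
```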

[MCTS] Assume you are using Monte Carlo Tree Search to develop a policy for a two-player game, and that whenever a player takes an action, there is a 30% chance that a different action is executed at random. Which of the following additions to MCTS may improve the algorithm’s performance (multiple answers may apply):
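
For intuition, the sketch below uses a flat Monte Carlo search (a simplification of full MCTS) on a toy single-pile Nim game, modeling the 30% "action slip" inside the simulations for both players. `SLIP`, the Nim rules, and all function names are illustrative assumptions:

```python
import random

SLIP = 0.3         # assumed: probability the executed action is random
MOVES = (1, 2, 3)  # toy single-pile Nim: remove 1-3 stones, taking the last wins

def legal(stones):
    return [m for m in MOVES if m <= stones]

def maybe_slip(stones, intended):
    """Model the question's 30% chance of a random action being executed."""
    return random.choice(legal(stones)) if random.random() < SLIP else intended

def playout(stones, to_move, me):
    """Random playout that applies the slip model to BOTH players."""
    while True:
        stones -= maybe_slip(stones, random.choice(legal(stones)))
        if stones == 0:  # the player to move emptied the pile and wins
            return 1.0 if to_move == me else 0.0
        to_move = 1 - to_move

def choose_action(stones, me=0, sims=2000):
    """Flat Monte Carlo search: score each root action under the slip model."""
    best, best_score = None, -1.0
    for a in legal(stones):
        total = 0.0
        for _ in range(sims):
            left = stones - maybe_slip(stones, a)
            total += 1.0 if left == 0 else playout(left, 1 - me, me)
        if total / sims > best_score:
            best, best_score = a, total / sims
    return best

print(choose_action(7))
```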

The Decision Transformer employs a Transformer-decoder architecture
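
A minimal PyTorch sketch of that idea: a GPT-style causal (decoder-only) stack over interleaved (return-to-go, state, action) tokens, predicting actions from the state tokens. Layer sizes and names are illustrative, not the paper's implementation, and continuous actions are assumed:

```python
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    """Sketch only: causal self-attention over R, s, a token triples."""
    def __init__(self, state_dim, act_dim, d_model=64, n_layers=2, n_heads=4, max_len=60):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        # An encoder stack with a causal mask behaves as a decoder-only model.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B,T,1), states: (B,T,state_dim), actions: (B,T,act_dim)
        B, T, _ = states.shape
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_act(actions)], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos(torch.arange(3 * T))
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.blocks(tokens, mask=mask)  # attend to the past only
        return self.head(h[:, 1::3])        # predict actions from state tokens

model = TinyDecisionTransformer(state_dim=4, act_dim=2)
out = model(torch.randn(2, 5, 1), torch.randn(2, 5, 4), torch.randn(2, 5, 2))  # (2, 5, 2)
```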

13. [Experience replay] In which of the following scenarios is experience replay not likely to be effective/required when using DQN (multiple answers may apply):
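
As a reference point, a minimal uniform replay buffer; uniform sampling from a large buffer is what breaks the temporal correlation between consecutive transitions and lets each one be reused. Class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Sketch of uniform experience replay as used with DQN."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)  # old transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buf, batch_size)  # uniform, decorrelated
        return list(zip(*batch))  # (states, actions, rewards, next_states, dones)
```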

a) One of the challenges in using autoencoders to model the dynamics of an environment is that we cannot be certain the model is capturing the details that actually matter for the task
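
A tiny sketch of the concern: a pixel-space autoencoder trained with MSE weights every pixel equally, so a small but task-critical object (a few pixels, like the ball in Pong) contributes almost nothing to the loss. The architecture and sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PixelAE(nn.Module):
    """Toy pixel autoencoder; the bottleneck must discard something,
    and the loss gives it no reason to keep what matters for control."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84, 32))
        self.dec = nn.Sequential(nn.Linear(32, 84 * 84), nn.Unflatten(1, (84, 84)))

    def forward(self, x):
        return self.dec(self.enc(x))

frame = torch.rand(1, 84, 84)
recon = PixelAE()(frame)
loss = nn.functional.mse_loss(recon, frame)  # no notion of "important" pixels
```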

16. [Imitation learning] Which of the following statements are true regarding DAgger (multiple answers may apply):
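
For reference, a minimal sketch of the DAgger loop: states are collected by rolling out the current learner policy, labeled by the expert, aggregated, and the policy is retrained on the whole dataset. All callables here (`env_reset`, `env_step`, `expert`, `train`) are assumed user-supplied placeholders:

```python
import numpy as np

def dagger(env_reset, env_step, expert, train, policy, iters=5, horizon=200):
    """Assumed interfaces: env_reset() -> state; env_step(action) -> (state, done);
    expert(state) -> action; train(X, Y) -> new policy callable."""
    X, Y = [], []
    for _ in range(iters):
        s = env_reset()
        for _ in range(horizon):
            X.append(s)
            Y.append(expert(s))            # expert relabels the visited state
            s, done = env_step(policy(s))  # but the learner's action drives the rollout
            if done:
                break
        policy = train(np.array(X), np.array(Y))  # fit on the aggregated dataset
    return policy
```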

b) In experience replay, selecting trajectories with high TD-errors is likely to improve the performance of our DRL agent
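
This is the idea behind prioritized experience replay. A proportional-prioritization sketch, following the standard recipe rather than any specific course implementation (the `alpha`/`beta` hyperparameters and class layout are illustrative):

```python
import numpy as np

class PrioritizedBuffer:
    """Sketch: sample transitions in proportion to |TD error|^alpha,
    with importance-sampling weights to correct the induced bias."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.data, self.prios = [], np.zeros(capacity, dtype=np.float32)
        self.capacity, self.alpha, self.pos = capacity, alpha, 0

    def push(self, transition, td_error=1.0):
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.prios[self.pos] = (abs(td_error) + 1e-5) ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.prios[:len(self.data)]
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)  # IS correction
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights
```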

a) When training an agent to play Atari games, it might be beneficial to use a stochastic policy at train time and a deterministic policy at test time
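
A minimal epsilon-greedy illustration of that pattern (function and parameter names assumed): explore while training, act greedily at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(q_values, training, eps=0.1):
    """Stochastic (epsilon-greedy) during training, deterministic at test."""
    if training and rng.random() < eps:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action

q = [0.1, 0.5, 0.2]
print(act(q, training=True))   # occasionally random
print(act(q, training=False))  # always argmax -> 1
```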

b) When training an agent to play a two-player game against other agents trained by other people (e.g., chess, rock-paper-scissors), it might be useful to use a stochastic policy both at train and test time
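
For intuition, in rock-paper-scissors the only unexploitable (Nash) strategy is a mixed one, so the policy must stay stochastic even at test time. A tiny sketch (names assumed):

```python
import numpy as np

rng = np.random.default_rng()
policy = np.array([1/3, 1/3, 1/3])  # mixed strategy over (rock, paper, scissors)

def play():
    # Sample rather than argmax: any deterministic choice is exploitable
    # by an opponent who best-responds to it.
    return int(rng.choice(3, p=policy))
```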

1. [Meta-learning] Which of the following statements are true regarding the Simple Neural Attentive Meta-Learner (SNAIL) architecture (multiple answers may apply):
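
A rough PyTorch sketch of SNAIL's two ingredients: dilated causal temporal convolutions that aggregate experience over time, followed by a causally masked attention block that can retrieve specific past entries. This simplifies the paper's dense TC blocks; all sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class CausalConv(nn.Module):
    """Dilated causal 1-D convolution: left-pad so position t sees only <= t."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = dilation  # (kernel_size - 1) * dilation, kernel size 2
        self.conv = nn.Conv1d(ch, ch, kernel_size=2, dilation=dilation)

    def forward(self, x):  # x: (B, C, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class TinySNAIL(nn.Module):
    def __init__(self, ch=32, max_len=64):
        super().__init__()
        self.tc = nn.ModuleList([CausalConv(ch, d) for d in (1, 2, 4, 8)])
        self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
        self.register_buffer("mask",
            torch.triu(torch.ones(max_len, max_len, dtype=torch.bool), 1))

    def forward(self, x):  # x: (B, T, C) sequence of experience embeddings
        h = x.transpose(1, 2)
        for block in self.tc:
            h = torch.relu(block(h)) + h  # residual dilated TC stack
        h = h.transpose(1, 2)
        T = h.size(1)
        out, _ = self.attn(h, h, h, attn_mask=self.mask[:T, :T])  # causal attention
        return out + h

y = TinySNAIL()(torch.randn(2, 16, 32))  # (2, 16, 32)
```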
