a) When training an agent to play Atari games, it might be beneficial to use a stochastic policy at training time and a deterministic policy at test time.
* Question added on: 28-02-2025
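
To make the statement concrete, here is a minimal sketch of one common way to get a stochastic policy during training and a deterministic one at test time: epsilon-greedy action selection over estimated action values. The function name `select_action`, the example Q-values, and the epsilon settings are illustrative assumptions, not part of the question.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability `epsilon` a uniformly random action is chosen
    (exploration); otherwise the highest-valued action is chosen.
    Hypothetical helper, for illustration only.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Made-up action values for a single state.
q_values = [0.1, 0.7, 0.3]

# Training time: epsilon > 0 makes the policy stochastic (it explores).
train_action = select_action(q_values, epsilon=0.1)

# Test time: epsilon = 0 makes the policy deterministic (always greedy).
test_action = select_action(q_values, epsilon=0.0)
```

With `epsilon=0.0` the random branch is never taken, so the same state always yields the same action. During training, a nonzero epsilon lets the agent occasionally try actions it currently rates lower.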