Discuss, Learn and be Happy: Question Discussion

1. In reinforcement learning, which component is responsible for updating the policy?

2. Which of the following is a major challenge in reinforcement learning compared to supervised learning?

3. Why do DQNs use experience replay?
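
For context on the mechanism this question refers to, here is a minimal sketch of an experience replay buffer. The class name `ReplayBuffer`, its method names, and the transition layout `(state, action, reward, next_state, done)` are assumptions made for illustration and are not taken from any particular DQN implementation. The sketch stores past transitions in a fixed-size buffer and draws training batches uniformly at random, rather than in the order the transitions were collected.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of past transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one environment step as a tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, the agent would typically call `push` after every environment step and `sample` once the buffer holds at least one batch of transitions.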

4. Policy gradients are particularly useful in

5. Which of the following statements about DAgger is correct?

6. Which problem does imitation learning aim to solve?

7. What is the main goal of multi-arm bandit algorithms?

8. Which exploration strategy assigns probabilities to actions based on their likelihood of being optimal?

9. In Monte Carlo Tree Search (MCTS), what does the Upper Confidence Bound (UCB) formula help with?
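
For reference, here is a minimal sketch of the UCB1 score as it is commonly applied to child selection in MCTS (the UCT variant). The function names `ucb1_score` and `select_child`, the `(mean_value, visits)` child representation, and the default constant `c = sqrt(2)` are illustrative assumptions; real implementations often tune `c` or use a different variant of the formula.

```python
import math


def ucb1_score(mean_value, parent_visits, child_visits, c=math.sqrt(2)):
    """UCB1: average value plus a bonus that shrinks as a child is visited."""
    if child_visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + bonus


def select_child(children, parent_visits):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of hypothetical (mean_value, visits) pairs.
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1_score(children[i][0], parent_visits, children[i][1]),
    )
```

The first term is the child's average return so far; the second term grows with the parent's visit count and shrinks with the child's own visit count.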

10. Which key techniques does AlphaGo use?
