b) Experience replay is generally less effective in stochastic environments than in deterministic ones, because the same state-action pair can produce different outcomes, increasing the variance of sampled transitions.
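As a concrete illustration, here is a minimal uniform-sampling replay buffer sketch; the capacity, batch size, and transition format are illustrative assumptions, not part of the statement above.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform-sampling replay buffer (sketch, not a full implementation)."""

    def __init__(self, capacity=10_000):  # capacity is an illustrative choice
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # In a stochastic environment the same (state, action) pair can yield
        # different (reward, next_state) samples, so minibatches drawn from
        # this buffer carry higher-variance learning targets.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling over stored transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```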
b) In the approach “Curiosity-driven Exploration by Self-supervised Prediction” (Pathak et al., 2017), the curiosity-driven agent would behave identically to a random exploration strategy in environments where all states are equally novel or equally predictable, since the intrinsic bonus would then be uniform across transitions.
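For reference, a simplified sketch of the ICM-style intrinsic reward from that paper: curiosity is the prediction error of a forward model in a learned feature space. The network sizes, feature dimension, and `eta` scaling below are illustrative assumptions; the point is that if this error is the same for every transition, the bonus adds a constant everywhere and supplies no directed exploration signal.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next-state feature phi(s') from phi(s) and the action."""

    def __init__(self, feat_dim=32, n_actions=4):  # dims are illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 64),
            nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, phi_s, action_onehot):
        return self.net(torch.cat([phi_s, action_onehot], dim=-1))

def intrinsic_reward(model, phi_s, action_onehot, phi_next, eta=0.5):
    # r_i = (eta / 2) * ||phi(s') - f(phi(s), a)||^2, as in the ICM paper.
    # If this error is identical across all states, the bonus is constant
    # and no longer distinguishes novel from familiar transitions.
    pred = model(phi_s, action_onehot)
    return 0.5 * eta * (pred - phi_next).pow(2).sum(dim=-1)
```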
b) The jointly-learned state-action embedding approach enables more efficient exploration by leveraging similarities between actions in the embedding space.
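A hypothetical sketch of a jointly-learned state-action embedding follows; the state encoder, action embedding table, and cosine-similarity helper are illustrative assumptions rather than the specific method the statement refers to. The idea it demonstrates is that actions close to each other in the shared embedding space can share exploration information (e.g., visit counts or value estimates).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateActionEmbedding(nn.Module):
    """Maps states and discrete actions into one shared embedding space."""

    def __init__(self, state_dim=8, n_actions=4, embed_dim=16):  # dims are illustrative
        super().__init__()
        self.state_enc = nn.Linear(state_dim, embed_dim)
        self.action_emb = nn.Embedding(n_actions, embed_dim)

    def forward(self, state, action):
        # Joint embedding: combine the encoded state with the learned
        # action embedding in the shared space.
        return self.state_enc(state) + self.action_emb(action)

def action_similarity(model, a1, a2):
    # Cosine similarity between two action embeddings; high similarity
    # suggests the actions can generalize exploration data to each other.
    e1 = model.action_emb(torch.tensor([a1]))
    e2 = model.action_emb(torch.tensor([a2]))
    return F.cosine_similarity(e1, e2).item()
```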