On the sample complexity of reinforcement learning by