Reinforcement Learning in the Cournot Duopoly by Sandro Bahn