To maximize the total reward it receives over time.
To minimize the total reward it receives over time.
To take actions randomly.
To take actions based on complete information.
All courses were automatically generated using OpenAI's GPT-3. Your feedback helps us improve as we cannot manually review every course. Thank you!