8

The following paper combines recurrent neural nets for vision with methods from reinforcement learning research:

https://proceedings.neurips.cc/pape...

Apparently an agent learned to catch a ball 85% of the time, without being explicitly told to track the ball. The RL algorithm rewarded the agent *only* for successfully catching the ball. The system itself used this reward signal to set its *own* policy/goal, which was used to guide it toward the goal of tracking the ball itself--all on its own.

Behold, the very infancy of the paperclip maximizer problem.

Comments
Add Comment