Efficient Multi-Task Reinforcement Learning Via Selective Behavior Sharing
Grace Zhang, Ayush Jain, Injune Hwang, Shao-Hua Sun, Joseph Lim

We propose a novel MTRL method, Q-switch Mixture of Policies (QMP), which learns to selectively share exploratory behavior between tasks by using a mixture of policies, chosen according to estimated discounted returns, to gather training data. A related approach, Cross-Task Policy Guidance (CTPG), instead trains a guide policy for each task that selects the behavior policy interacting with the environment from among all tasks' control policies, generating better training trajectories.
In related work, a knowledge-transfer-based multi-task deep reinforcement learning framework (KTM-DRL) for continuous control enables a single DRL agent to achieve… Other work studies the benefit of sharing representations among tasks to enable the effective use of deep neural networks in multi-task reinforcement learning, extending the well-known finite-time bounds of approximate value iteration to the multi-task setting.
Abstract: Multi-task reinforcement learning (MTRL) holds potential for building general-purpose agents, enabling them to generalize across a variety of tasks. However, uniformly sharing behaviors can hurt when tasks' optimal behaviors conflict; a more flexible MTRL framework is needed, where an agent selectively learns to share behaviors from different tasks only when the optimal task behaviors coincide and avoids sharing when they conflict. QMP is a multi-task reinforcement learning approach that shares behaviors between tasks using a mixture of policies for off-policy data collection, and we show that using the Q-function as a switch for this mixture is guaranteed to improve sample efficiency. We empirically demonstrate how behavior sharing improves sample efficiency and final performance on manipulation and navigation MTRL tasks and is even complementary to parameter sharing.
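The Q-switch mixture described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the toy linear policies, Q-functions, and all function names here are invented for exposition (in QMP these would be networks trained off-policy, e.g. with SAC). The key idea it shows is that every task's policy proposes an action, and the current task's own Q-function acts as the switch that picks which proposal to execute when collecting data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 tasks, 4-dimensional states, 2-dimensional actions.
NUM_TASKS, STATE_DIM, ACTION_DIM = 3, 4, 2

# Stand-ins for each task's learned policy and Q-function: fixed random
# linear maps, purely for illustration.
policy_weights = [rng.standard_normal((STATE_DIM, ACTION_DIM)) for _ in range(NUM_TASKS)]
q_weights = [rng.standard_normal(STATE_DIM + ACTION_DIM) for _ in range(NUM_TASKS)]

def policy_action(task, state):
    """Deterministic action proposal from `task`'s (toy linear) policy."""
    return state @ policy_weights[task]

def q_value(task, state, action):
    """Estimated discounted return of `action` in `state` under `task`'s Q-function."""
    return float(np.concatenate([state, action]) @ q_weights[task])

def qmp_behavior_action(current_task, state):
    """Q-switch mixture of policies: every task's policy proposes an action,
    and the current task's Q-function selects the proposal with the highest
    estimated return to use for data collection."""
    proposals = [policy_action(t, state) for t in range(NUM_TASKS)]
    scores = [q_value(current_task, state, a) for a in proposals]
    best = int(np.argmax(scores))
    return best, proposals[best]

state = rng.standard_normal(STATE_DIM)
chosen_policy, action = qmp_behavior_action(current_task=0, state=state)
```

Because the switch only ever replaces the current task's own proposal with one its Q-function scores at least as highly, data collection is never (by the Q-function's own estimate) worse than acting with the task policy alone, which is the intuition behind the sample-efficiency guarantee stated above.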