Explainable Reinforcement Learning (XRL) techniques provide insight into the decision-making process of a Deep Reinforcement Learning (DRL) model and can be used to identify vulnerabilities and weaknesses within a policy prior to deployment. However, these same techniques expose critical policy information and associated vulnerabilities that adversaries can exploit to develop more effective and efficient adversarial attacks against the trained policy. This paper introduces the ARLIN (Assured Reinforcement Learning Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN’s effectiveness, we provide explainability visualizations and vulnerability analysis for a publicly available DRL model and demonstrate how ARLIN’s outputs can be exploited to reduce the model’s overall performance while limiting detectability. The open-source code repository is available for download at https://github.com/mitre/arlin.
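As a rough illustration of the kind of analysis the paper describes, one simple way to surface candidate critical points in a trained policy is to flag states where the policy's action distribution has low entropy, i.e., where it is highly committed to a single action. The sketch below does not use ARLIN's actual API; it assumes a stable-baselines3 PPO policy and a gymnasium environment, and the model file name and entropy threshold are illustrative assumptions.

    # A minimal sketch (not ARLIN's API): flag low-entropy states of a
    # trained policy as candidate critical points for adversarial analysis.
    # Assumes stable-baselines3 and gymnasium; model file is hypothetical.
    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("CartPole-v1")
    model = PPO.load("ppo_cartpole.zip")  # hypothetical pretrained policy

    critical_states = []
    obs, _ = env.reset(seed=0)
    for _ in range(1000):
        # Action probabilities the policy assigns to the current observation.
        obs_tensor, _ = model.policy.obs_to_tensor(obs)
        dist = model.policy.get_distribution(obs_tensor)
        probs = dist.distribution.probs.detach().cpu().numpy().squeeze()
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        # Low entropy: the policy strongly favors one action, so a targeted
        # perturbation here can be disproportionately damaging while staying
        # hard to detect -- the kind of critical point ARLIN surfaces.
        if entropy < 0.1:
            critical_states.append(np.asarray(obs).copy())
        action, _ = model.predict(obs, deterministic=True)
        obs, _, terminated, truncated, _ = env.step(int(action))
        if terminated or truncated:
            obs, _ = env.reset()
    print(f"Flagged {len(critical_states)} candidate critical states")

An adversary with this information could concentrate perturbations on the flagged states rather than attacking every timestep, which is precisely the dual-use risk of explainability outputs that the paper highlights.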
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Alexander Tapley and Joseph Weissman
"Exploiting explainability for reinforcement learning model assurance", Proc. SPIE 13054, Assurance and Security for AI-enabled Systems, 130540G (7 June 2024); https://doi.org/10.1117/12.3013090