KEYWORDS: Deep learning, Information assurance, Visualization, Failure analysis, Visual analytics, Machine learning, Visual process modeling, Performance modeling, Data modeling
Explainable Reinforcement Learning (XRL) techniques provide insight into the decision-making process of a Deep Reinforcement Learning (DRL) model and can be used to identify vulnerabilities and weaknesses within a policy prior to deployment. However, these same techniques expose critical policy information and associated vulnerabilities that adversaries can exploit to mount more effective and efficient adversarial attacks against the trained policy. This paper introduces the ARLIN (Assured Reinforcement Learning Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN’s effectiveness, we provide explainability visualizations and a vulnerability analysis for a publicly available DRL model, and we demonstrate how the outputs generated by ARLIN can be exploited to reduce the model’s overall performance while limiting detectability. The open-source code repository is available for download at https://github.com/mitre/arlin.
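The style of analysis the abstract describes, mining a trained policy's internals for "critical points" an adversary could target, can be sketched in miniature as follows. This is an illustrative assumption, not ARLIN's actual API: the synthetic latent activations, the value estimates, and the `kmeans` helper are all stand-ins for data that would really be collected from policy rollouts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for latent activations collected from policy rollouts:
# three synthetic modes in a 2-D latent space (hypothetical data).
latents = np.vstack([
    rng.normal([0, 0], 0.3, size=(50, 2)),
    rng.normal([3, 3], 0.3, size=(50, 2)),
    rng.normal([0, 3], 0.3, size=(50, 2)),
])
# Stand-in critic value estimates: the second mode is low-value,
# i.e. states where the policy expects poor returns.
values = np.concatenate([
    rng.normal(1.0, 0.05, 50),
    rng.normal(0.1, 0.05, 50),
    rng.normal(0.9, 0.05, 50),
])

def kmeans(x, k, iters=50, seed=0):
    """Minimal k-means: returns (centroids, labels)."""
    r = np.random.default_rng(seed)
    centroids = x[r.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(
            ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels

_, labels = kmeans(latents, k=3)

# Rank latent clusters by mean value estimate; the lowest-value cluster
# marks states where the policy is least confident of success -- candidate
# targets for a low-detectability adversarial attack.
cluster_values = [values[labels == j].mean() for j in range(3)]
critical_cluster = int(np.argmin(cluster_values))
critical_states = latents[labels == critical_cluster]
print(f"critical cluster: {critical_cluster}, "
      f"mean value: {cluster_values[critical_cluster]:.2f}, "
      f"states: {len(critical_states)}")
```

An adversary who knows which latent regions correspond to low-value states can concentrate perturbations there, degrading performance with far fewer interventions than a uniform attack, which is exactly the exposure the paper cautions that XRL outputs create.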