Exploiting explainability for reinforcement learning model assurance
Alexander Tapley, Joseph Weissman
7 June 2024 (Presentation + Paper)
Abstract
Explainable Reinforcement Learning (XRL) techniques provide insight into the decision-making process of a Deep Reinforcement Learning (DRL) model and can identify vulnerabilities and weaknesses within a policy prior to deployment. However, the same techniques expose critical policy information, and the vulnerabilities they reveal can be exploited by adversaries to develop more effective and efficient adversarial attacks against the trained policy. This paper introduces the ARLIN (Assured Reinforcement Learning Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and a vulnerability analysis for a publicly available DRL model, and we demonstrate how the outputs generated by ARLIN can be exploited to reduce the model's overall performance while limiting the attack's detectability. The open-source code repository is available for download at https://github.com/mitre/arlin.
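The core idea, attacking only at "critical" states surfaced by explainability analysis so that few timesteps are perturbed, can be sketched generically. The following is a minimal illustration, not ARLIN's API: `action_preferences` is a hypothetical stand-in for a trained policy's per-action scores (e.g. Q-values), and a state is treated as critical when the gap between the best and second-best action score is large, since forcing a bad action there degrades reward the most per perturbation.

```python
# Hedged sketch (hypothetical names, not ARLIN's API): use an
# explainability-style "critical state" test to make an adversarial
# attack selective, touching few timesteps for low detectability.

def action_preferences(state):
    # Toy stand-in for a trained policy's per-action scores
    # (e.g. Q-values or logits over 4 actions).
    return [((state * (a + 3)) % 7) / 7.0 for a in range(4)]

def is_critical(prefs, gap_threshold=0.25):
    # Critical state: large gap between best and second-best action,
    # i.e. the policy's choice at this state matters most.
    ranked = sorted(prefs, reverse=True)
    return (ranked[0] - ranked[1]) >= gap_threshold

def select_action(state, attack=False, gap_threshold=0.25):
    prefs = action_preferences(state)
    if attack and is_critical(prefs, gap_threshold):
        # Attack only at critical states: force the worst-ranked action.
        return min(range(len(prefs)), key=lambda a: prefs[a])
    # Otherwise behave normally (greedy action), staying undetected.
    return max(range(len(prefs)), key=lambda a: prefs[a])

if __name__ == "__main__":
    attacked = sum(
        1 for s in range(20) if is_critical(action_preferences(s))
    )
    print(f"attacked {attacked} of 20 states")
```

Because the attack fires only where the score gap is wide, most trajectories look unchanged while the worst-case states are sabotaged, which mirrors the paper's claim of reduced performance with limited detectability.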
Conference Presentation
(2024) Published by SPIE.
Alexander Tapley and Joseph Weissman "Exploiting explainability for reinforcement learning model assurance", Proc. SPIE 13054, Assurance and Security for AI-enabled Systems, 130540G (7 June 2024); https://doi.org/10.1117/12.3013090
KEYWORDS
Visualization, Failure analysis, Machine learning, Visual analytics, Performance modeling, Visual process modeling, Data modeling