Exploiting explainability for reinforcement learning model assurance
Alexander Tapley, Joseph Weissman
7 June 2024 (Presentation + Paper)
Abstract
Explainable Reinforcement Learning (XRL) techniques provide insight into the decision-making process of a Deep Reinforcement Learning (DRL) model and can identify vulnerabilities and weaknesses within a policy prior to deployment. However, the same techniques expose critical policy information, and the vulnerabilities they reveal can be exploited by adversaries to develop more effective and efficient adversarial attacks against the trained policy. This paper introduces the ARLIN (Assured Reinforcement Learning Model Interrogation) Toolkit, an open-source Python library that identifies potential vulnerabilities and critical points within trained DRL models through detailed, human-interpretable explainability outputs. To illustrate ARLIN's effectiveness, we provide explainability visualizations and a vulnerability analysis for a publicly available DRL model, and we demonstrate how the outputs generated by ARLIN can be exploited to reduce the model's overall performance while limiting the attack's detectability. The open-source code repository is available for download at https://github.com/mitre/arlin.
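The core idea, attacking only at "critical" states surfaced by explainability analysis so that few timesteps are perturbed, can be sketched generically. The following is a minimal illustration, not ARLIN's API: `action_preferences` is a hypothetical stand-in for a trained policy's per-action scores (e.g. Q-values), and a state is treated as critical when the gap between the best and second-best action score is large, since forcing a bad action there degrades reward the most per perturbation.

```python
# Hedged sketch (hypothetical names, not ARLIN's API): use an
# explainability-style "critical state" test to make an adversarial
# attack selective, touching few timesteps for low detectability.

def action_preferences(state):
    # Toy stand-in for a trained policy's per-action scores
    # (e.g. Q-values or logits over 4 actions).
    return [((state * (a + 3)) % 7) / 7.0 for a in range(4)]

def is_critical(prefs, gap_threshold=0.25):
    # Critical state: large gap between best and second-best action,
    # i.e. the policy's choice at this state matters most.
    ranked = sorted(prefs, reverse=True)
    return (ranked[0] - ranked[1]) >= gap_threshold

def select_action(state, attack=False, gap_threshold=0.25):
    prefs = action_preferences(state)
    if attack and is_critical(prefs, gap_threshold):
        # Attack only at critical states: force the worst-ranked action.
        return min(range(len(prefs)), key=lambda a: prefs[a])
    # Otherwise behave normally (greedy action), staying undetected.
    return max(range(len(prefs)), key=lambda a: prefs[a])

if __name__ == "__main__":
    attacked = sum(
        1 for s in range(20) if is_critical(action_preferences(s))
    )
    print(f"attacked {attacked} of 20 states")
```

Because the attack fires only where the score gap is wide, most trajectories look unchanged while the worst-case states are sabotaged, which mirrors the paper's claim of reduced performance with limited detectability.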
Conference Presentation
(2024) Published by SPIE.
Alexander Tapley and Joseph Weissman "Exploiting explainability for reinforcement learning model assurance", Proc. SPIE 13054, Assurance and Security for AI-enabled Systems, 130540G (7 June 2024); https://doi.org/10.1117/12.3013090
KEYWORDS
Visualization, Failure analysis, Machine learning, Visual analytics, Performance modeling, Visual process modeling, Data modeling