With the rise of adversarial AI, government researchers are looking for ways to automatically inspect artificial intelligence and machine learning systems to see if they’ve been tampered with.
Adversarial AI attacks that insert information or images into machine-learning training data seek to trick the system into incorrectly classifying what has been presented. If a system is being trained to recognize traffic signs, for example, it would learn from hundreds of labeled images of stop signs and speed limit signs. An adversary could insert into the training database a few images of stop signs with yellow sticky notes attached that were labeled as 35 mph speed limit signs. An autonomous driving system trained on that data that then sees a stop sign with a sticky note on it would be triggered to interpret that image as a speed limit sign and drive right through the stop sign.
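The poisoning step in that example can be sketched in a few lines of Python. This is only a toy illustration, not code from the TrojAI program: the function names `add_trigger` and `poison_dataset`, the label strings, the representation of images as small grids of numbers and the bright corner patch standing in for the sticky note are all assumptions made for the sketch.

```python
import random

# Hypothetical labels and trigger value, chosen only for illustration.
STOP = "stop"
SPEED_35 = "speed_limit_35"
TRIGGER_VALUE = 1.0  # stands in for the "yellow sticky note" pixels


def add_trigger(image, size=2):
    """Return a copy of `image` (a list of pixel rows) with a
    size x size trigger patch stamped in the bottom-right corner."""
    patched = [row[:] for row in image]
    for r in range(-size, 0):
        for c in range(-size, 0):
            patched[r][c] = TRIGGER_VALUE
    return patched


def poison_dataset(dataset, source_label, target_label, fraction, seed=0):
    """Stamp the trigger on a fraction of the `source_label` samples and
    relabel them as `target_label`. Returns the poisoned dataset and the
    set of indices that were altered; the input dataset is not modified."""
    rng = random.Random(seed)
    source_idx = [i for i, (_, label) in enumerate(dataset) if label == source_label]
    n_poison = round(len(source_idx) * fraction)
    chosen = set(rng.sample(source_idx, n_poison))

    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((add_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned, chosen


def blank_image(side=8):
    """A placeholder all-zero image; real training images would go here."""
    return [[0.0] * side for _ in range(side)]


if __name__ == "__main__":
    clean = [(blank_image(), STOP) for _ in range(100)]
    clean += [(blank_image(), SPEED_35) for _ in range(100)]
    poisoned, chosen = poison_dataset(clean, STOP, SPEED_35, fraction=0.05)
    print(f"{len(chosen)} of {len(clean)} samples were altered")
```

Only a small share of the stop-sign samples is changed, which is why such tampering is hard to catch by inspecting the training data alone. A model trained on the altered set may still perform normally on clean images while learning to associate the patch with the wrong label.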
The Army Research Office and the Intelligence Advanced Research Projects Activity are investigating techniques to spot and stop these Trojans in AI systems. Given the impossibility of cleaning and securing the entire training data pipeline, the broad agency announcement for the TrojAI program seeks software that can automatically inspect an AI and predict whether it contains a Trojan.
Initially, selected performers will be working as a team with AI systems that classify small images, but the two-year program may expand to systems that classify audio and text or perform other tasks such as question-answering or game playing. As the program continues, the difficulty of identifying Trojans will be increased by changing aspects of the challenge such as the amount of test data, the rarity of Trojans, the variety of neural-network architectures and the variability of the Trojan triggers.
Performers will have access to the AI source code, architecture and compiled binary, and possibly a small number of examples of valid data. The program requires continuous software development, with development teams delivering containerized software that detects which AIs have been subject to a Trojan attack that causes misclassification. The software’s source code and documentation will be posted to an open source site such as GitHub to permit free and effective use by the public. […]