Once upon a time (bear with me if you’ve heard this one), there was a company that made a significant advance in artificial intelligence. Armed with their incredibly sophisticated new system, they started to put it to ever-wider uses, asking it to optimize their business for everything from the lofty to the mundane.

And one day, the CEO wanted to grab a paperclip to hold some papers together, and found there weren’t any in the tray by the printer. “Alice!” he cried (for Alice was the name of his machine learning lead), “Can you tell the damned AI to make sure we don’t run out of paperclips again?” Alice said “Sure,” and assigned the task to Bob the Intern, who proceeded to follow all of the rules of machine learning he had been taught at school.

[Image: Walt Disney and his studio produced the classic illustration of the intern in action, in Fantasia (1940).]

Finding the Missing Paperclips

He accessed the office management database to find out when each printer’s paperclip store had been refilled, and discovered that the ML system had already been helpfully instrumented to do everything from placing purchase orders to filing instructions to have paperclips delivered to a particular printer. So he instructed it to build a model of when paperclip orders would happen, and to ensure that the number of paperclips available was always maximized. Since he didn’t have the appropriate credentials to initiate purchase orders himself (he was, after all, only an Intern), Bob asked Alice for credentials. She asked the CEO, “Are you really sure this is what you want our advanced machine learning system to be spending its time on?” and, when he gruffly said “Yes, dammit!”, suggested he use his own credentials for the purchase orders, then. In retrospect, both Alice and the CEO should have been a little more careful about trusting this code.
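(As an aside for the technically inclined: the sketch below is one hedged guess at the shape of the task Bob set up. Every identifier in it, from refill_log to clips_to_order, is invented for illustration; the story’s system is, of course, far more capable than this toy.)

```python
# A purely hypothetical sketch of the task Bob might have defined;
# every identifier here is invented for illustration.

from statistics import mean

def forecast_usage_per_day(refill_log: list[tuple[int, int]]) -> float:
    """Crude demand model: paperclips consumed per day, averaged over
    past refill cycles. Each log entry is (day_refilled, clips_added)."""
    days_between = [b[0] - a[0] for a, b in zip(refill_log, refill_log[1:])]
    clips_used = [entry[1] for entry in refill_log[:-1]]
    return mean(c / d for c, d in zip(clips_used, days_between))

def clips_to_order(stock: int, refill_log: list[tuple[int, int]]) -> int:
    # The fatal instruction, translated literally: the number of
    # paperclips available should always be maximized. Note what is
    # missing: any notion of "enough."
    target = forecast_usage_per_day(refill_log) * 365  # a year's worth... why not more?
    return max(0, round(target) - stock)

# Refilled with 100 clips on day 0, day 10, and day 20:
log = [(0, 100), (10, 100), (20, 100)]
print(clips_to_order(stock=50, refill_log=log))  # 3600 clips on order
```

Even in toy form, the key property is visible: the target is pegged to “as many as we can justify,” and nothing ever says stop.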

The Paperclip Maximizer

You see, the Paperclip Maximizer was a fairly sophisticated AI: it could train itself on the office supplies database and, thanks to its very flexible development environment, could automatically seek out any other signal available to it in pursuit of its stated goal. But for all its sophistication, it understood only the simple objective that had been programmed into it: it must, at all costs, maximize the number of paperclips.
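Stripped of the narrative, the bug is easy to write down. Here is a minimal sketch, with every name invented for illustration: a toy world in which the reward function can see paperclips and nothing else, and a greedy agent that does exactly what such a reward implies.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    paperclips: int  # the only quantity the objective can "see"
    budget: float    # exists in the world, but invisible to the reward

def reward(state: State) -> float:
    # The entire objective, exactly as programmed: maximize the
    # number of paperclips. No cost term, no budget constraint,
    # no penalty for side effects.
    return state.paperclips

def order_paperclips(state: State) -> State:
    return replace(state, paperclips=state.paperclips + 1000,
                   budget=state.budget - 50.0)

def do_nothing(state: State) -> State:
    return state

def step(state: State) -> State:
    # Greedy policy: take whichever action leads to the higher-reward
    # successor state. Ordering always wins, at any budget.
    return max((act(state) for act in (order_paperclips, do_nothing)),
               key=reward)

state = State(paperclips=0, budget=100.0)
for _ in range(5):
    state = step(state)
print(state)  # State(paperclips=5000, budget=-150.0)
```

Everything the programmers cared about but never encoded (here, the budget; in the story, everything else) is simply invisible to the optimizer.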

What could possibly go wrong? […]