Personal assistants like Apple’s Siri accomplish tasks through natural language commands.
Copyright by venturebeat.com
However, their underlying components often rely on supervised algorithms requiring large amounts of hand-annotated training data. In an attempt to reduce the time and effort taken to collect this data, researchers at Apple developed a framework that leverages user engagement signals to automatically create data-augmenting labels. They report that when incorporated using strategies like multi-task learning and validation with an external knowledge base, the annotated data significantly improve accuracy in a production system.
“We believe this is the first use of user engagement signals to help generate training data for a sequence labeling task on a large scale, and can be applied in practical settings to speed up new feature deployment when little human-annotated data is available,” wrote the researchers in a preprint paper . “Moreover … user engagement signals can help us to identify where the digital assistant needs improvement by learning from its own mistakes.”
The researchers used a range of heuristics to identify behaviors indicating either positive or negative engagement. A few included tapping on content to engage with it further (a positive response), listening to a song for a long duration (another positive response), or interrupting content provided by an intelligent assistant and manually selecting different content (a negative response). Those signals were selectively harvested in a “privacy-preserving manner” to automatically produce ground truth annotations, and they were subsequently combined with coarse-grained labels provided by human annotators.
In order to incorporate the coarse-grained labels and the inferred fine-grained labels into an model, the paper’s coauthors devised a multi-task learning framework that treats coarse-grained and fine-grained entity labeling as two tasks. Additionally, they incorporated an external knowledge base validator consisting of entities and their relations. Given the prediction “something” as a music title and “the Beatles” as a music artist for the query “Play something by the Beatles,” the validator would perform a lookup for the top label alternatives and send them to a component that’d re-rank the predictions and return the best alternative.
The researchers conducted two separate test sets to evaluate the tasks performed by the multi-task model, which they compiled by randomly sampling from the production system and hand-annotating with ground truth labels. They say that across 21 model runs, adding 260,000 training examples “consistently” reduced the coarse-grained entity error rate on a prediction task compared with the baseline for all amounts of human-annotated data. Moreover, they report that adding weakly supervised fine-grained data had a larger impact when there was a relatively small amount of human-annotated data (5,000 examples). Lastly, they report that for examples where any of the top model hypotheses passed the knowledge base validator, the fine-grained entity error rate dropped by around 50%. […]