Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction

Published in Appl. Sci. 2020, 10, 5758

Recommended citation: Injy Sarhan and Marco R. Spruit. "Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction." Appl. Sci. 2020, 10, 5758. https://www.mdpi.com/2076-3417/10/17/5758

Various tasks in natural language processing (NLP) suffer from a lack of labelled training data, which deep neural networks are hungry for. In this paper, we rely on features learned while generating relation triples in the open information extraction (OIE) task. First, we study how transferable these features are from one OIE domain to another, for example from a news domain to a bio-medical domain. Second, we analyze their transferability to a semantically related NLP task, namely relation extraction (RE). We thereby contribute to answering the question: can OIE help us achieve adequate NLP performance without labelled data? In both experiments, inductive transfer learning that relied on only a very small amount of target data achieved promising results comparable to traditional learning. When transferring to the OIE bio-medical domain, we achieved an F-measure of 78.0%, only 1% lower than traditional learning. Transferring to RE with the inductive approach scored an F-measure of 67.2%, which was 3.8% lower than training and testing on the same task. Our analysis thus shows that OIE can act as a reliable source task.
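To make the inductive transfer setup concrete, below is a minimal sketch of the idea described in the abstract: pretrain a triple-extraction model on a source OIE corpus, then adapt it to a target domain or task using only a small labelled set. The BiLSTM tagger architecture and all names (`TripleTagger`, `fine_tune`, the frozen-encoder strategy) are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of inductive transfer learning for OIE-style sequence labelling.
# Assumption: the source model tags each token with its role in a relation
# triple; transfer freezes the source-trained encoder and trains a fresh
# task head on the small target dataset.
import torch
import torch.nn as nn

class TripleTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=256, num_tags=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_tags)  # per-token tag scores

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))
        return self.head(h)

def fine_tune(model, target_batches, num_tags, epochs=5, lr=1e-4):
    """Inductive transfer: freeze the source-trained embeddings and
    encoder, then train a new head on a small amount of target data."""
    for p in model.embed.parameters():
        p.requires_grad = False
    for p in model.encoder.parameters():
        p.requires_grad = False
    model.head = nn.Linear(model.head.in_features, num_tags)  # new task head
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, tags in target_batches:
            opt.zero_grad()
            logits = model(token_ids)                     # (B, T, num_tags)
            loss = loss_fn(logits.view(-1, num_tags), tags.view(-1))
            loss.backward()
            opt.step()
    return model

# Usage sketch: pretrain on source-domain OIE data (not shown), then adapt,
# e.g. to bio-medical OIE or to an RE tag set with a different num_tags.
# model = TripleTagger(vocab_size=30000)
# ... train on source OIE corpus ...
# model = fine_tune(model, small_target_loader, num_tags=5)
```

Freezing the encoder is one common way to realize inductive transfer with little target data; unfreezing the top layers is an alternative when slightly more target data is available.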

Download paper here.