Open IE Research

What is Open Information Extraction (OpenIE)?

OpenIE is the Natural Language Processing task of extracting machine-readable information from free text (e.g. books and news). Concretely, most OpenIE systems generate <subject, relation, object> tuples. For example, “Barack Obama was President of the United States.” becomes <Barack Obama, was President of, the United States> or <Barack Obama, was, President of the United States>. OpenIE is useful for knowledge-base construction, fact-checking, and other downstream NLP applications (Stanovsky et al., 2015).
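To make the tuple format concrete, here is a minimal Python sketch of how such an extraction could be represented. The `Triple` class and its field names are hypothetical, chosen only for illustration; they are not taken from any particular OpenIE system.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A <subject, relation, object> tuple (illustrative name, not a real API)."""
    subject: str
    relation: str
    obj: str

    def __str__(self) -> str:
        # Render in the angle-bracket notation used in the text.
        return f"<{self.subject}, {self.relation}, {self.obj}>"


sentence = "Barack Obama was President of the United States."

# Both segmentations from the text are plausible extractions for the same
# sentence; they differ only in where the relation phrase ends.
candidates = [
    Triple("Barack Obama", "was President of", "the United States"),
    Triple("Barack Obama", "was", "President of the United States"),
]

for triple in candidates:
    print(triple)
```

Since one sentence can have more than one reasonable segmentation, a list of candidate tuples per sentence is probably a more natural representation than a single tuple.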

What are the issues with existing systems?

Current systems fail to extract tuples for implicit relations, i.e. relations that the sentence never states with an explicit verb. For example, “U.S. President Barack Obama said yesterday…” should yield <Barack Obama, is, U.S. President>. While some parse-based methods can handle simple implicit relations like this, longer sentences pose problems.
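As a rough illustration of the simple case only, the toy sketch below recovers a "title + name" implicit relation with a regular expression. It is not one of the parse-based methods mentioned above, and it is not the approach taken in this project. The title list, the pattern, and the function name `implicit_title_triples` are all assumptions made up for the example.

```python
import re

# Hypothetical, deliberately tiny list of title words.
TITLE = r"(?P<title>(?:[A-Z][\w.]*\s)*?(?:President|Senator|Governor|CEO))"
# A name is approximated as two or more capitalized words.
NAME = r"(?P<name>[A-Z][a-z]+(?:\s[A-Z][a-z]+)+)"
PATTERN = re.compile(rf"{TITLE}\s{NAME}")


def implicit_title_triples(sentence):
    """Return (subject, relation, object) tuples for 'Title Name' spans."""
    return [
        (match.group("name"), "is", match.group("title"))
        for match in PATTERN.finditer(sentence)
    ]


print(implicit_title_triples("U.S. President Barack Obama said yesterday"))
# [('Barack Obama', 'is', 'U.S. President')]
```

A surface pattern like this breaks quickly: any capitalized word before the title is absorbed into it, and appositives, nested clauses, or titles outside the list are missed entirely. That brittleness gives a feel for why longer sentences are difficult.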

What I’ve been working on

I’ve created several tools for implicit OpenIE and am currently working on a paper.

  1. Converted the SQuAD and NewsQA datasets to OpenIE datasets: https://github.com/NPCai/Squadie
  2. Created an attention-based seq2seq model: https://github.com/NPCai/Nopie
  3. Catalogued and examined previous work: https://github.com/NPCai/Open-IE-Papers
  4. Updated an existing benchmark to Python 3 and improved it: https://github.com/NPCai/oie-benchmark
  5. Other miscellaneous tools: https://github.com/NPCai

I’ll be doing an Independent Study at Penn Engineering in the Fall 2018 semester.