CRII: III: Capturing Dynamism in Causal Relationships: A New Paradigm for Relationship Extraction from Text
Sunandan Chakraborty
Host institution: Indiana University
Abstract: Text mining has made important advances in methods for converting vast amounts of unstructured text data into knowledge. However, the current paradigm of relationship extraction has one major limitation: it models snapshots of information but fails to capture the fundamentally dialogic and dynamic nature of knowledge, which includes conflicting findings, inconsistent discoveries, refutations, contradictions, reinforcements, and confirmations, all changing over time. This project aims to capture these fundamental dynamics of knowledge, focusing specifically on causal relationships. Numerous articles, including academic articles, present knowledge and relationships that express causality, but such relationships are not static and can change over time as conditions change. The objective of this project is to identify cues of causal knowledge in text data, quantify the strength of the causal relationship, and model its dynamics under changing conditions. Ultimately, the project aims to model a more holistic view of the knowledge extracted from text. Because text data is used extensively by researchers and practitioners in domains of national importance, including medicine and health, economics, public policy, and journalism, the results of this project seek to provide a foundation for offering practitioners new ways to understand the evolving nature of the causal relationships present in large text datasets. Specifically, the novel approaches developed in the project will be applied to public health data to determine how changing climatic, political, and economic conditions may affect the mental and physical health of the population in different geographic areas.
In addition, the project includes several educational activities: emerging and related topics from the project will be included in the curricula of courses in the applied data science master's program; undergraduate research will be promoted, specifically by recruiting students from underrepresented and economically disadvantaged communities to work on the project; and a research workshop will be organized to encourage high school students to participate in STEM research.

The project activities include the development of a novel model of causal relationship extraction that leverages a unified deep learning framework combining semantic and syntactic cues. This approach will use the key syntactic features of a sentence, represented by the grammatical relationships between nouns, verbs, and other parts of speech, through graph or tree-like models. This component will determine whether the sentence has a structure that signals causality. Moreover, the sequential component of the model will use semantics to identify the influence of certain words in the sentence and so characterize the nature of the causal relationship expressed in the text. This task will capture the strength of the relationship (e.g., using cues like "extremely likely" or "definitely"), any supporting or opposing evidence (e.g., "will lead to" or "does not lead to"), and conditional cues (e.g., "in the presence of"). Quantifying such qualitative properties will lead to the second innovation of this project: causal distance. Causal distance is a time-variant metric that will denote the magnitude of causality between two entities and capture the dynamism of the relationship by updating itself over time as conditions change or new evidence appears. Collectively, the advances pursued in this project will further enhance our understanding of the novel computational approaches needed to unearth and reason about cues of causal relationships embedded in large text datasets. The outcomes of this project, such as datasets, source code, final software, results, and publications, will be shared via publicly accessible URLs and online code repositories. Additionally, all project resources and outcomes will be made available on the project website.
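
As a rough illustration of the cue types and the time-variant metric described above, the following is a minimal sketch and not the project's model. It replaces the deep learning framework with a hand-written lexicon, and the cue weights, the `score_sentence` and `update_causal_distance` helpers, and the exponential-smoothing update rule are all hypothetical assumptions; the abstract does not specify how causal distance is computed.

```python
# Hypothetical cue lexicons; the phrases come from the abstract's examples,
# but the weights are illustrative assumptions only.
STRENGTH_CUES = {"definitely": 1.0, "extremely likely": 0.9}
OPPOSING_CUES = ("does not lead to",)      # opposing evidence
SUPPORTING_CUES = ("will lead to",)        # supporting evidence
CONDITION_CUES = ("in the presence of",)   # conditional cue


def score_sentence(sentence):
    """Return (signed strength, is_conditional), or None if no causal cue is found."""
    text = sentence.lower()
    if any(cue in text for cue in OPPOSING_CUES):
        polarity = -1.0
    elif any(cue in text for cue in SUPPORTING_CUES):
        polarity = 1.0
    else:
        return None
    # Fall back to a neutral strength of 0.5 when no explicit strength cue appears.
    strength = max(
        (weight for cue, weight in STRENGTH_CUES.items() if cue in text),
        default=0.5,
    )
    is_conditional = any(cue in text for cue in CONDITION_CUES)
    return polarity * strength, is_conditional


def update_causal_distance(current, evidence, rate=0.5):
    """Assumed update rule: exponential smoothing toward each new piece of evidence."""
    return (1 - rate) * current + rate * evidence


# Usage: process sentences in chronological order and track the running value.
timeline = [
    "Drought definitely will lead to crop loss.",
    "Drought does not lead to crop loss in the presence of irrigation.",
]
distance = 0.0
for sentence in timeline:
    scored = score_sentence(sentence)
    if scored is not None:
        evidence, is_conditional = scored
        distance = update_causal_distance(distance, evidence)
        print(sentence, evidence, is_conditional, distance)
```

In this toy setup the running value rises with supporting evidence and falls again when later, conditional evidence opposes it, which is the kind of change over time the metric is meant to track.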

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project type: Standard Grant
Citation (GB/T 7714):
Sunandan Chakraborty. CRII: III: Capturing Dynamism in Causal Relationships: A New Paradigm for Relationship Extraction from Text. 2020.
