Lang Cao

A passionate and self-motivated thinker, practitioner, researcher, and entrepreneur in AI.

Publications

2023

AutoRD: An Automatic and End-to-end System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models [Paper] [Code]

  • Lang Cao, Adam Cross, Jimeng Sun.
  • Under Review.
  • AbstractObjectives: We aim to build an automatic and end-to-end system which can extract information about rare diseases from text and building knowledge graph. In the system, large language models give the system strong language analysis ability, while medical ontologies make up the medical knowledge shortage of large language models. We investigate the performance of our system in multiple aspects and present the strengths and limitations of this system. Materials and Methods: The experimental data is from the public dataset RareDis. We develop a system called AutoRD, which comprises medical ontologies and large language models. The system is a pipeline structure: data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We quantitatively evaluate our system in entity extraction and relation extraction. We also show some results of knowledge graph construction. Results: AutoRD achieves an overall F1 score of 47.3% with an improvement of 0.8% compared to the fine-tuning model and a 14.4% improvement compared to the base LLM. Our qualitative experiment also demonstrates that the performance in constructing the knowledge graph is commendable. Several designs, including the incorporation of ontologies-enhanced LLMs, contribute to the improvement of AutoRD. Discussion: AutoRD demonstrates superior performance compared to other methods, demonstrating the potential of LLM applications in the healthcare field. Conclusion: We built AutoRD, an automatic, end-to-end system for extracting rare disease information from text to build knowledge graphs. It uses ontologies-enhanced LLMs for a robust medical knowledge base. The superior performance of AutoRD is validated by experimental evaluations, demonstrating potential of large language models in healthcare.

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism [Paper] [Code]

  • Lang Cao (Independent Research).
  • Under Review.
  • AbstractLarge language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty, and it is progressively expanded with validated knowledge. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.
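As a rough illustration of the refusal idea, a system that answers only from a validated, initially empty knowledge base and refuses everything outside that scope might look like the sketch below. All names are hypothetical, and this is deliberately simplified: the paper's knowledge base backs an LLM rather than replacing it, and its scope check is more than an exact-match lookup.

```python
# Minimal sketch of a "learn to refuse" pattern: answer only from a
# validated knowledge base, refuse everything else (names hypothetical).
class RefusingQA:
    def __init__(self):
        self.knowledge = {}  # validated knowledge base, initially empty

    def learn(self, question, answer):
        # progressively expand the knowledge base with validated facts
        self.knowledge[question.lower()] = answer

    def answer(self, question):
        # questions outside the knowledge scope are refused, not guessed
        return self.knowledge.get(question.lower(), "I don't know.")

qa = RefusingQA()
qa.learn("What is the capital of France?", "Paris")
```

The point of the pattern is that every answer is traceable to a validated entry, so hallucination is traded for an explicit refusal.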

DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue [Paper] [Code]

  • Lang Cao (Independent Research).
  • Under Review.
  • AbstractLarge Language Models (LLMs), such as ChatGPT, are becoming increasingly sophisticated, demonstrating capabilities that closely resemble those of humans. These AI models are playing an essential role in assisting humans with a wide array of tasks in daily life. A significant application of AI is its use as a chat agent, responding to human inquiries across various domains. Current LLMs have shown proficiency in answering general questions. However, basic question-answering dialogue often falls short in complex diagnostic scenarios, such as legal or medical consultations. These scenarios typically necessitate Task-Oriented Dialogue (TOD), wherein an AI chat agent needs to proactively pose questions and guide users towards specific task completion. Previous fine-tuning models have underperformed in TOD, and current LLMs do not inherently possess this capability. In this paper, we introduce DiagGPT (Dialogue in Diagnosis GPT), an innovative method that extends LLMs to TOD scenarios. Our experiments reveal that DiagGPT exhibits outstanding performance in conducting TOD with users, demonstrating its potential for practical applications.

Enhancing Reasoning Capabilities of Large Language Models: A Graph-Based Verification Approach [Paper] [Code]

  • Lang Cao (Independent Research).
  • Under Review.
  • AbstractLarge Language Models (LLMs) have showcased impressive reasoning capabilities, particularly when guided by specifically designed prompts in complex reasoning tasks such as math word problems. These models typically solve tasks using a chain-of-thought approach, which not only bolsters their reasoning abilities but also provides valuable insights into their problem-solving process. However, there is still significant room for enhancing the reasoning abilities of LLMs. Some studies suggest that the integration of an LLM output verifier can boost reasoning accuracy without necessitating additional model training. In this paper, we follow these studies and introduce a novel graph-based method to further augment the reasoning capabilities of LLMs. We posit that multiple solutions to a reasoning task, generated by an LLM, can be represented as a reasoning graph due to the logical connections between intermediate steps from different reasoning paths. Therefore, we propose the Reasoning Graph Verifier (RGV) to analyze and verify the solutions generated by LLMs. By evaluating these graphs, models can yield more accurate and reliable results. Our experimental results show that our graph-based verification method not only significantly enhances the reasoning abilities of LLMs but also outperforms existing verifier methods in terms of improving these models' reasoning performance.
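The idea of merging several sampled solutions into one graph can be sketched with a toy scorer. The real RGV verifies the graph with a learned model, so the edge-counting heuristic below is only an illustration of the structure, and every name in it is hypothetical:

```python
# Toy sketch: treat each sampled solution as a sequence of intermediate
# steps, merge shared steps into one graph, and prefer the path whose
# edges are supported by the most solutions (illustrative heuristic only).
from collections import Counter

def build_reasoning_graph(solutions):
    # count how many solutions traverse each (step -> next step) edge
    edge_counts = Counter()
    for steps in solutions:
        for a, b in zip(steps, steps[1:]):
            edge_counts[(a, b)] += 1
    return edge_counts

def score_solution(steps, edge_counts):
    # a path whose edges recur across solutions scores higher
    return sum(edge_counts[(a, b)] for a, b in zip(steps, steps[1:]))

solutions = [
    ["x=2", "2*3=6", "answer=6"],
    ["x=2", "2*3=6", "answer=6"],
    ["x=3", "3*3=9", "answer=9"],
]
graph = build_reasoning_graph(solutions)
best = max(solutions, key=lambda s: score_solution(s, graph))
```

Here the two paths agreeing on intermediate steps reinforce each other through shared edges, which is what distinguishes a graph view from simple answer-level majority voting.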

AutoAM: An End-To-End Neural Model for Automatic and Universal Argument Mining [Paper] [Code]

  • Lang Cao (Independent Research).
  • In the 19th International Conference on Advanced Data Mining and Applications, ADMA 2023.
  • AbstractArgument mining analyzes argument structure and extracts important argument information from unstructured text. An argument mining system can help people automatically uncover the causal and logical information behind text. As argumentative corpora grow, with more and more people arguing and debating on social media, mining arguments from them is becoming increasingly important. However, argument mining remains a significant challenge in natural language processing, and related techniques are not yet mature. For example, non-tree argument mining is under-explored: most works focus only on extracting tree-structured argument information. Moreover, current methods cannot accurately describe and capture argument relations, and do not predict their types. In this paper, we propose a novel neural model called AutoAM to solve these problems. We first introduce an argument component attention mechanism, which captures the relevant information between argument components so that our model can better perform argument mining. Our model is a universal end-to-end framework that can analyze argument structure without constraints such as tree structure and complete the three subtasks of argument mining in one model. The experimental results show that our model outperforms existing works on several metrics on two public datasets.

PILOT: Legal Case Outcome Prediction with Case Law

  • Lang Cao, Zifeng Wang, Cao Xiao, Jimeng Sun.
  • Under Review.
  • AbstractMachine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil law cases rather than case law systems. We identified two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary to consider the evolution of legal principles over time, as early cases may adhere to different legal contexts.
    In this paper, we propose a new model named PILOT (PredictIng Legal case OuTcome) for case outcome prediction. It comprises two modules for relevant case retrieval and temporal pattern handling, respectively. To benchmark the performance of existing legal case outcome prediction models, we curated a dataset from a large-scale case law database. We demonstrate the importance of accurately identifying precedent cases and mitigating the temporal shift when making predictions for case law, as our method shows a significant improvement over prior methods that focus on civil law case outcome predictions.

2021

CBCP: A Method of Causality Extraction from Unstructured Financial Text [Paper] [Code]

  • Lang Cao, Shihua Zhang, and Juxing Chen.
  • In 2021 5th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2021.
  • AbstractExtracting causality information from unstructured natural language text is a challenging problem in natural language processing, and there are no mature dedicated causality extraction systems. Most people use basic sequence labeling methods, such as the BERT-CRF model, to extract causal elements from unstructured text, and the results are usually unsatisfactory. At the same time, there are a large number of causal event relations in the field of finance. If we can extract financial causality at scale, this information will help us better understand the relationships between financial events and build related event evolutionary graphs in the future. In this paper, we propose a causality extraction method for this problem, named CBCP (Center word-based BERT-CRF with Pattern extraction), which can directly extract cause elements and effect elements from unstructured text. Compared to the BERT-CRF model, our model incorporates the information of center words as prior conditions and performs better at entity extraction. Moreover, combining our method with pattern-based extraction further improves causality extraction. We then evaluate our method against the basic sequence labeling method and show that it outperforms other basic extraction methods on causality extraction tasks in the finance field. Finally, we summarize our work and outline directions for future work.
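The pattern component of such a method can be illustrated with a single toy regex rule. This is purely an example: the actual CBCP combines a center-word-conditioned BERT-CRF with its patterns, and the cue phrases and names below are hypothetical.

```python
import re

# Toy version of pattern-based cause/effect extraction (illustrative only;
# CBCP pairs patterns like this with a neural BERT-CRF extractor).
CAUSAL_PATTERN = re.compile(r"(.+?)\s+(?:causes|leads to|results in)\s+(.+)")

def extract_causality(sentence):
    # return a (cause, effect) pair if a causal cue phrase is found
    match = CAUSAL_PATTERN.match(sentence)
    if match:
        cause = match.group(1).strip()
        effect = match.group(2).strip().rstrip(".")
        return cause, effect
    return None
```

For example, `extract_causality("The rate hike leads to falling bond prices.")` splits the sentence at the cue phrase into a cause element and an effect element; sentences with no cue phrase yield no extraction, which is why pure pattern methods need a neural component for coverage.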

Intelligent Cross-sensing Sensor Based on Deep Learning [Paper]

  • Lingfei Xu, Jiaming Zhang, Lang Cao, and Xinyu Hu.
  • In 2021 6th IEEE International Conference on Signal and Image Processing, ICSIP 2021.
  • AbstractQualitative and quantitative detection of gases is of great importance in industrial automation, environmental protection, chemical control, and other fields. Low-cost and high-performance gas sensors have been developed, but single gas sensors have physical defects such as cross-sensitivity. In this paper, based on the principle of the electronic olfactory system, we combine a gas sensor array with neural networks, build a training device and training system for intelligent cross-sensing sensors, use the trained intelligent cross-sensing sensor system for gas identification and detection, and develop a supporting client for gas data visualization. The system achieves qualitative identification and quantitative analysis of multiple gases, and fuses BP and RBF neural networks into a more optimized algorithm model, which improves the accuracy of qualitative identification and the precision of quantitative analysis.

Clustering of Functionally Related Genes Using Machine Learning Techniques [Paper]

  • Yujing Xue and Lang Cao.
  • In 2021 5th International Conference on Compute and Data Analysis, ICCDA 2021.
  • AbstractThe clustering of functionally related genes has been an important task for biologists. With the recent progress of machine learning technology, researchers now have more powerful tools to identify the structures within large amounts of DNA sequencing data, allowing research on genes to be conducted in an efficient and scalable way. This paper studies the clustering of functionally related genes and their impact on the development and prognosis of lung cancer using machine learning technologies. Data derived from 218 patients are analyzed. We focus on two extreme cases: one includes patients who survived less than 1 year, and the other includes patients who survived longer than 5 years. We investigate how different clustering methods can assist in the visualization of the DNA sequence data of such patients, and how such methods can help us identify the underlying patterns of the DNA sequence data.