
Large language models could be the key to better patient-trial matching

Rice CS Ph.D. student wins AMIA Best Student Paper Award

Jiayi Yuan

Rice Computer Science Ph.D. student Jiayi Yuan was awarded the Best Student Paper Award at the 2023 American Medical Informatics Association (AMIA) symposium for work applying AI large language models (LLMs) to patient-trial matching.

The paper represents promising advancements in using AI to help match patients in need, such as those with cancer or other serious medical conditions, with relevant clinical trials.

Yuan completed this research under the guidance of advisor Xia (Ben) Hu, associate professor of computer science. Other co-authors include Rice CS Ph.D. student Ruixiang Tang and Xiaoqian Jiang, Chair of the Department of Health Data Science and Artificial Intelligence at McWilliams School of Biomedical Informatics at UTHealth Houston. The team coupled a large language model with deep learning in their research. Their goal was to try to solve common problems encountered by people looking for medical trials.

The Challenges With Patient-Trial Matching

Those in search of the right clinical trial are often faced with a labyrinthine system, obscure to people not educated in medicine. Trial requirements are commonly written in medical jargon that isn’t explained for non-professionals, meaning people can miss out on trials they qualify for simply because they can’t parse the requirements.

Applying to trials can also require an excessive time commitment from patients, something they may not always have the energy for. If they don’t have someone to help, it becomes even more difficult.

Databases exist for patients to search, but the requirements for a trial can vary so widely and be so specific that results get left out without the proper search terms. 

Previous attempts at automatic patient matching used either a simple algorithm or a deep learning model by itself to try to match people with the right medical trials, but neither was completely successful.

A simple yes/no algorithm isn’t well-suited to matching people with complex medical histories to trials that can have very specific requirements. The same medical condition can be worded differently depending on the trial, and a basic algorithm programmed to recognize only one wording would miss the others, passing over results it should catch.

How AI Could Help 

Automatic patient-trial matching has to overcome multiple sets of variables. 

“Every trial has inclusion criteria, for example, the patient…should have had a stroke,” said Yuan. “And every trial has exclusion criteria, for example, the patient should not be pregnant.”

A yes/no algorithm can’t necessarily account for multiple sets of possibilities like these.

Deep learning requires vast amounts of data in order to recognize what constitutes a match. Since patient data is rightly protected under the Health Insurance Portability and Accountability Act (HIPAA), it can’t simply be collected in bulk and must be handled ethically.

Yuan and his co-authors devised a way to overcome both these problems to hopefully improve automatic patient-trial matching in the future. 

Their process is twofold: use publicly available trial data to feed into a large language model (in this case the latest version of ChatGPT), which generates artificial trial data for deep learning. Then, feed that AI-generated data into a deep learning algorithm hosted on a private server where patient data is kept private.
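To make the idea concrete, here is a minimal sketch of what such a two-stage pipeline could look like in Python. It is an illustration only, not the team’s actual code: the prompt wording, the model name, and the function names are assumptions.

```python
# Rough sketch of the two-stage idea described above; not the authors' code.
# The prompt, model name, and helper names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def generate_synthetic_criteria(public_criteria: str, n: int = 5) -> list[str]:
    """Stage 1 (public side): ask an LLM to paraphrase real, publicly
    available eligibility criteria into additional synthetic examples."""
    response = client.chat.completions.create(
        model="gpt-4",  # stand-in for "the latest version of ChatGPT"
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the following clinical trial eligibility criteria "
                f"in {n} different ways, keeping the medical meaning identical:\n"
                f"{public_criteria}"
            ),
        }],
    )
    return response.choices[0].message.content.split("\n")

# Stage 2 (private side): the synthetic criteria are moved to a private server
# and used there to train a matching model against protected patient records,
# so no real patient data is ever sent to the public LLM.
synthetic = generate_synthetic_criteria(
    "Inclusion: history of ischemic stroke. Exclusion: pregnancy."
)
```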

Deep learning models need vast amounts of high-quality data to be effective and to prevent what’s known as “overfitting,” where a model memorizes the limited data it was trained on and fails to generalize to real-world scenarios.

Real trial data written by medical experts is fed to ChatGPT, which is used to generate more of the same. Information like demographics and doctor’s notes teaches the deep learning algorithm how patient data can vary.

This combined approach trains the system on large amounts of data and teaches it how to best match patient information, just as it would need to in a real-world scenario. The embedding space within the deep learning model learns over time which data points make a better match, eventually placing them closer to one another within the neural network.

“We can tell the deep learning model that this person and this trial [are] a match and it can gradually learn that they should be closer,” said Yuan. “And this person and this trial are not a match, so they should be further away, so the whole space can be learned.”
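A rough sketch of that embedding idea, assuming a contrastive-style setup in PyTorch (the tiny encoders and toy inputs below are stand-ins for illustration, not the paper’s model):

```python
# Minimal sketch of the embedding idea Yuan describes; not the paper's model.
# The encoders and toy inputs are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 32

# Stand-in encoders: in practice these would be text encoders over
# patient records and trial criteria, not simple linear layers.
patient_encoder = nn.Linear(16, EMBED_DIM)
trial_encoder = nn.Linear(16, EMBED_DIM)
optimizer = torch.optim.Adam(
    list(patient_encoder.parameters()) + list(trial_encoder.parameters()), lr=1e-3
)

# Toy features for two patient-trial pairs: the first is a match (+1),
# the second is not (-1).
patients = torch.randn(2, 16)
trials = torch.randn(2, 16)
labels = torch.tensor([1.0, -1.0])

for step in range(100):
    p_emb = patient_encoder(patients)   # embed patients
    t_emb = trial_encoder(trials)       # embed trials
    # Cosine embedding loss pulls matched pairs together (label +1)
    # and pushes unmatched pairs apart (label -1) in the shared space.
    loss = F.cosine_embedding_loss(p_emb, t_emb, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this toy setup, matched pairs are labeled +1 and pulled closer in the shared embedding space, while non-matches are labeled -1 and pushed further apart, mirroring the intuition in Yuan’s description.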

Going forward, the goal is to automatically match patients to trials with a medical professional acting as the “human in the loop” to double-check results. Eventually, Yuan and the team hope to improve the deep learning algorithmic model enough that automatic patient-trial matching is much more common.

John Bogna, contributing writer