Our client had collected data from many clinical trials and needed a repeatable way to apply data science techniques to stratify patients, identify cohorts of responders, and ultimately discover novel biomarkers that could predict future clinical response. The trial data included biological assays of protein levels, clinically validated self-assessments, and multiple treatment levels.
We developed a generalizable workflow that starts with data ingestion and then transforms the raw data into more useful features (variables) through normalization and standardization. Next, we coupled methods for quantifying feature importance with modeling to identify the most predictive features and to build an approach that makes the best use of those features to predict patient outcomes. Beyond producing a predictive model, this process acts as transparent AI, highlighting the biological data and trends responsible for the final result. Finally, the product was designed to generalize across studies with little manual input.
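To make the shape of such a workflow concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method used in the project: the data are synthetic, the feature names (`protein_a`, `protein_b`, `self_assessment`) are hypothetical, z-score standardization stands in for the normalization step, a nearest-centroid classifier stands in for the model, and permutation importance stands in for the feature-importance method.

```python
import random
import statistics

# Hypothetical feature names; the real assays and assessments are not specified.
FEATURES = ["protein_a", "protein_b", "self_assessment"]

def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds

def transform(rows, means, stds):
    """Z-score each value using the statistics learned by fit_scaler."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def fit_centroids(X, y):
    """Nearest-centroid model: one mean feature vector per outcome label."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*members)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(centroids, X, y)
    result = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(base - accuracy(centroids, X_perm, y))
        result.append(statistics.fmean(drops))
    return result

# Synthetic "ingested" data: only protein_a differs between the two groups.
rng = random.Random(42)
rows, labels = [], []
for i in range(200):
    responder = i % 2
    rows.append([rng.gauss(10 + 3 * responder, 1),  # informative by construction
                 rng.gauss(50, 10),                 # noise
                 rng.gauss(5, 2)])                  # noise
    labels.append(responder)

# Hold out the last 60 patients; fit the scaler on training rows only.
train_rows, test_rows = rows[:140], rows[140:]
y_train, y_test = labels[:140], labels[140:]
means, stds = fit_scaler(train_rows)
X_train = transform(train_rows, means, stds)
X_test = transform(test_rows, means, stds)

model = fit_centroids(X_train, y_train)
baseline = accuracy(model, X_test, y_test)
importances = permutation_importance(model, X_test, y_test)

for name, score in zip(FEATURES, importances):
    print(f"{name}: mean accuracy drop {score:.3f}")
```

Because the importance scores are computed on held-out patients and reported per feature, the same run that yields predictions also shows which measurements drive them, which is the kind of transparency described above. Swapping in a different model or importance method would leave the overall structure unchanged.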