Bios 8366 at VUMC Biostatistics
Final Project
Bios 8366 introduces students to a range of modern methods for optimization, machine learning, and probabilistic modeling. Beyond the assignments and in-class examples, practitioners benefit from applying these techniques in a more realistic setting, using data collected for real-world biostatistical applications. Hence, 50% of the final grade in Bios 8366 will be determined by students’ performance on a course project.
Students will organize themselves into groups of 2 or 3. Each group will be given access to a Box.com link containing the project data. Completed projects should be pushed to the repository (without the data) no later than noon on Monday, Dec. 10, 2018. All students in a group will receive the same project grade.
The Vanderbilt hospital data are an extraction of 10,000 subjects from the Synthetic Derivative, broken down into 9 tables in comma-separated values (CSV) format, each linked via a deidentified subject ID (RUID). These data tables include:
phenotype
BMI
MED
LAB
CPT
ICD9
eGFR
BP
ADT
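Because every table shares the RUID key, a first step is usually to link rows across tables by subject. The following is a minimal, dependency-free sketch of that idea. The inline CSV text stands in for two of the tables, and every column name other than RUID (here `sex` and `bmi`) is an assumption made for illustration, since the actual column layouts are not described here. In a notebook, a data-frame library would typically do the same join in one call.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for two of the tables. Only the RUID column is taken from
# the project description; the other columns are hypothetical.
PHENOTYPE_CSV = """RUID,sex
1001,F
1002,M
"""
BMI_CSV = """RUID,bmi
1001,27.4
1001,28.1
1003,31.0
"""


def rows_by_subject(csv_text):
    """Group the rows of one table by their RUID value."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["RUID"]].append(row)
    return grouped


phenotype = rows_by_subject(PHENOTYPE_CSV)
bmi = rows_by_subject(BMI_CSV)

# Left join: keep every subject found in the phenotype table and attach
# whatever BMI rows exist for that subject (possibly none, possibly several).
linked = {
    ruid: {"phenotype": rows[0], "bmi": bmi.get(ruid, [])}
    for ruid, rows in phenotype.items()
}

for ruid, record in linked.items():
    print(ruid, record["phenotype"]["sex"], [r["bmi"] for r in record["bmi"]])
```

Note that a subject can have several rows in a measurement table and none in another, so it is worth deciding early whether each join should keep or drop subjects with missing records.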
The objective of the Bios 8366 course project is to employ modeling tools introduced in this course to fit prediction models for patient readmission within 30 days of discharge using synthetic derivative data.
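The outcome itself has to be derived before any model can be fit. As a rough sketch, assuming that an admission date and a discharge date can be obtained for each hospital stay (the source does not specify which table or columns hold them), a stay can be labeled as followed by a readmission when the same subject's next admission begins within 30 days of discharge. The function name `label_readmissions` and the example dates are hypothetical.

```python
from datetime import date

# Hypothetical (admission, discharge) date pairs for a single subject.
stays = [
    (date(2015, 3, 1), date(2015, 3, 5)),
    (date(2015, 3, 20), date(2015, 3, 22)),
    (date(2015, 6, 1), date(2015, 6, 3)),
]


def label_readmissions(stays, window_days=30):
    """Return (admission, discharge, readmitted) for each stay.

    `readmitted` is True when the subject's next admission starts
    between 0 and `window_days` days after this stay's discharge.
    """
    ordered = sorted(stays)  # chronological order by admission date
    labeled = []
    for i, (admit, discharge) in enumerate(ordered):
        readmitted = False
        if i + 1 < len(ordered):
            gap = (ordered[i + 1][0] - discharge).days
            readmitted = 0 <= gap <= window_days
        labeled.append((admit, discharge, readmitted))
    return labeled


for admit, discharge, readmitted in label_readmissions(stays):
    print(admit, discharge, readmitted)
```

In this sketch the last recorded stay is always labeled False, although its true outcome may simply be unobserved; whether to keep or exclude such stays is a modeling decision for each group.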
Students may use any of the methods covered in the course syllabus, or any related methods, to develop candidate predictive models. Projects should include at least two alternative approaches, and each approach should be iteratively improved or tuned to yield the most competitive models.
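Comparing alternative approaches requires scoring them on the same held-out data with the same metric. The sketch below uses the area under the ROC curve (AUC) as an example metric; the choice of AUC is an assumption, since no evaluation metric is prescribed here, and the labels and predicted probabilities are made-up values rather than project results.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out outcomes (1 = readmitted within 30 days) and the
# predicted probabilities from two candidate models for the same stays.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9]
model_b = [0.2, 0.6, 0.5, 0.1, 0.4, 0.9, 0.7, 0.3]

print("model A AUC:", auc(y_true, model_a))
print("model B AUC:", auc(y_true, model_b))
```

When a subject contributes several stays, keeping all of that subject's rows on the same side of the train/test split avoids leaking information between the two.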
The best projects will:
The project reports should be pushed to GitHub by the project deadline. Final projects should be submitted as one or more IPython (Jupyter) notebooks.
Finally, please add the following statement either to the end of the project or in the methods section when discussing the dataset:
The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center’s Synthetic Derivative which is supported by institutional funding and by the Vanderbilt CTSA grant UL1TR000445 from NCATS/NIH.