Customer Case Study
Alexander Murph (who goes by “Murph”) is a fourth-year Ph.D. candidate, working under Jan Hannig, in the Department of Statistics and Operations Research (STOR) at UNC-Chapel Hill. Murph started using Longleaf in early 2020 when beginning work on his doctoral dissertation.
The Client
As part of his doctoral research, Murph develops novel fiducial approaches to classical inferential problems and illustrates their usefulness relative to current methods. He is broadly interested in the intersection between differential geometry and modern statistics, especially from a fiducial perspective.
In practice, this involves investigating what it means to put a probability distribution on a curved space. One example is assigning a probability of rainfall (i.e., a probability distribution) to different areas on Earth (i.e., a sphere, which is a curved space).
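To make the idea of a distribution on a curved space concrete, here is a minimal Python sketch of the simplest such distribution: the uniform distribution on the unit sphere. It is purely illustrative and is not drawn from Murph's work; the function names (`sample_uniform_sphere`, `to_lat_lon`) are hypothetical, and the method shown (rescaling a standard Gaussian vector to unit length) is a textbook technique rather than a fiducial one.

```python
import math
import random


def sample_uniform_sphere(rng):
    """Draw one point uniformly from the unit sphere in 3D.

    A standard Gaussian vector is rotationally symmetric, so rescaling
    it to unit length gives a uniformly distributed direction.
    """
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return (x / norm, y / norm, z / norm)


def to_lat_lon(point):
    """Convert a unit vector to (latitude, longitude) in degrees."""
    x, y, z = point
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Illustrative usage: the seed and sample size are arbitrary choices.
rng = random.Random(0)
points = [sample_uniform_sphere(rng) for _ in range(10_000)]

# Under a uniform distribution, about half the points fall in the
# northern hemisphere (z > 0).
northern_fraction = sum(1 for p in points if p[2] > 0) / len(points)
```

A non-uniform distribution, such as one that concentrates rainfall probability over a particular region, would replace the uniform sampler with one that weights some parts of the sphere more heavily than others.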
Murph also works as a visiting graduate student with the Mayo Clinic Kern Center for the Science of Health Care Delivery, where he is tasked with developing statistical methods that are applied to health care data.
The Challenge
The problems that Murph works on typically require implementing numerical algorithms that are computationally intensive. Sometimes he needs access to hundreds of CPUs for hours or days at a time to run Markov chain Monte Carlo (MCMC) simulations that enable him to evaluate the usefulness of a given statistical methodology he is developing. In other cases, depending on the project, Murph may require fewer CPUs but a larger amount of shared memory. It’s important to Murph that any system he uses be flexible enough to accommodate a variety of workflows.
The Solution
Running simulations at the scale of hundreds of CPUs was simply not feasible on his laptop, so Murph had to explore other options. Luckily, his adviser was already familiar with Longleaf and recommended it for running their simulation studies.
In addition to receiving that recommendation, Murph found that no other option was as cost-efficient as Longleaf. He was also able to reach out to Sandeep Sarangi in Research Computing for help with using Longleaf. Sarangi routinely supports STOR faculty and graduate students with their cluster computing needs. Murph now uses Research Computing’s Longleaf cluster for multiple research projects.
A recent project involved using a generalized fiducial inference approach to address a very common problem in statistics: the estimation of the mean and covariance matrix in a multivariate normal distribution. As part of this project, Murph said he “needed to run thousands of numerically intensive simulations, each of which took a few days.”
With the help of Research Computing staff, Murph was able to devise a workflow that runs efficiently at cluster scale on Longleaf.
“Such a simulation study would be difficult without access to a parallel cluster like Longleaf,” Murph said.
Another recent project involved collaborators from the Mayo Clinic, who provided Murph with a very large longitudinal dataset. Murph used Longleaf to develop a Bayesian change point detection system that can be used to detect distributional changes in the dataset. This is helpful for downstream calculations that the Mayo Clinic performs on the dataset, such as determining whether patients leaving the hospital will need palliative care.
“This project was particularly challenging due to the amount of data,” Murph said. Longleaf, however, made working with such a large dataset much easier than it would have been otherwise, he added.
Once again, Murph was able to use Longleaf resources to run large simulation studies to tackle the problem.
The Results
From his research using Longleaf as his primary computational resource, Murph has written multiple papers that he hopes will soon be published.
Whether he’s working on something theoretical, such as developing methods to estimate and do inference for parameters on manifolds, or something applied, like developing statistical models for use in a health care setting, Murph’s research always has a strong computational component.
“Without access to a cluster like Longleaf and the help of Research Computing staff, I would not be nearly as productive a researcher,” Murph said.
Helping UNC-Chapel Hill researchers make the best use of Research Computing’s computational resources is a regular part of Sarangi’s job. But, he said, “helping Murph has been particularly interesting since I’ve also been able to hear about some of the novel statistical methods being developed at UNC.”