What’s good for people can also be good for computers. Just as we humans sometimes break a big task into smaller, less burdensome components, Research Computing at Information Technology Services has designed a way to ease the stress on ITS’ KillDevil computing system.
This particular improvement involves MatLab, a numerical computing environment and high-level programming language used for numerical analysis and visualization. The software is widely used at UNC-Chapel Hill across a broad spectrum of users, including researchers in life sciences, social sciences and the business school.
Typically MatLab runs on one compute core at a time, that is to say, serially. This new capability, however, enables a single MatLab job to run on many cores at once, and those cores are not restricted to being on the same node. In other words, MatLab users can do distributed parallel computing, said Mark Reed, Research Associate at Research Computing.
This is a new capability that enables researchers to run on more processors, which gives them more compute power. It also enables more flexible scheduling on the KillDevil cluster, which improves throughput, he said. The result is that researchers can get more work done in a shorter period of time.
To put it a little more simply, users can divide their projects into portions and submit them so that the work inside the program is spread across multiple processors. Instead of waiting until they can take over a whole node within the cluster, they can break up their job into, say, 32 or 64 components, thereby reducing delays. In turn, this shortens the wait time for everyone needing time on KillDevil, said Jenny Williams, a systems analyst with Research Computing.
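The scheduling benefit can be illustrated with a small Python sketch. It is not how KillDevil's scheduler actually works. The 12-core node size and the snapshot of idle cores are hypothetical values chosen for illustration, and the two function names are invented here. The point is only that a job needing a whole idle node can be stuck waiting while a job that accepts cores scattered across nodes can start right away.

```python
from typing import List


def whole_node_can_start(free_cores: List[int], cores_per_node: int) -> bool:
    """A whole-node job needs at least one node with every core idle."""
    return any(free == cores_per_node for free in free_cores)


def distributed_can_start(free_cores: List[int], workers: int) -> bool:
    """A distributed job only needs enough idle cores in total, on any nodes."""
    return sum(free_cores) >= workers


# Hypothetical snapshot: idle cores on each of six 12-core nodes (33 idle in all).
CORES_PER_NODE = 12
free_now = [5, 7, 3, 8, 6, 4]

print(whole_node_can_start(free_now, CORES_PER_NODE))  # False: no node is fully idle
print(distributed_can_start(free_now, 32))             # True: 33 idle cores cover 32 workers
```

In this made-up snapshot the cluster is busy but far from full, so a 32-piece job can begin at once, while a request for an entire node has to keep waiting.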
This new capability enables MatLab users who run many projects on the research cluster to gain flexibility in scheduling.
Carolina approach serves as model
Another university has now contacted ITS Research Computing seeking guidance in enabling its MatLab users to likewise portion out projects on its research cluster.
New capability debuted last fall
ITS Research Computing introduced this new capability during the Fall 2014 semester. Not many jobs lend themselves to this kind of segmenting of work. MatLab, a program used for math and statistical analysis, is one of the most frequently used applications that can split up a project into components.
UNC-Chapel Hill graduate students are the primary users of KillDevil. The cluster, which is halfway through its lifecycle, is stretched thin at times. More and more applications are assuming they can take over a whole node within a cluster, but that creates delays for other users.
Scheduling time improves
Previously, there were artificially long wait times as users waited for an entire machine within KillDevil to become free, Williams said. And depending upon how users submitted a project, the work could overwhelm the machine. Now, though, some MatLab users can slice up their projects and let the MatLab application handle submitting the pieces and aggregating the results.
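The slice-and-aggregate pattern described here can be sketched in a few lines. The example below is in Python rather than MatLab and runs on a single machine, so it is an analogy only and not the MathWorks product or Research Computing's setup. The function names and the sum-of-squares workload are placeholders invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence


def split_into_slices(items: Sequence[float], n_slices: int) -> List[Sequence[float]]:
    """Divide items into n_slices contiguous pieces of near-equal size."""
    base, extra = divmod(len(items), n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        # The first `extra` slices take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        slices.append(items[start:stop])
        start = stop
    return slices


def sum_of_squares(piece: Sequence[float]) -> float:
    """Placeholder workload: stands in for one portion of a real analysis."""
    return sum(x * x for x in piece)


def run_in_slices(items: Sequence[float], n_slices: int,
                  executor_factory=ProcessPoolExecutor) -> float:
    """Run each slice on its own worker, then aggregate the partial results."""
    pieces = split_into_slices(items, n_slices)
    with executor_factory(max_workers=n_slices) as pool:
        partials = list(pool.map(sum_of_squares, pieces))
    return sum(partials)
```

Calling `run_in_slices(list(range(1000)), 4)` from a script's main section would spread the work over four worker processes. Passing `ThreadPoolExecutor` as the third argument is a convenient way to check the logic without starting separate processes.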
Getting MatLab to work in this manner was a challenge, Williams said. The vendor, MathWorks, had designed MatLab in a way that was different from how Research Computing needed it to function. Research Computing was able to work with the vendor to adjust the functionality to meet the needs of the University. The new MatLab feature that Research Computing licensed from MathWorks is called the Distributed Computing Server (DCS).