Parallelization

What Is Parallel Computing?

Parallel computing is a style of computation that uses multiple compute resources to execute a discrete number of tasks concurrently. It contrasts with serial computing, where the same tasks would be solved one after another on a single compute resource. The usual motivation is to finish the same work in less wall-clock time; there are other reasons why parallelism might benefit your project, but they are outside the scope of this training.

How Do I Parallelize My Code?

Parallelizing your code takes two steps: (1) working out whether your code can be parallelized at all (and, if so, where and how much); and (2) telling the cluster that you want your code to run in parallel. For the majority of the group’s purposes, your code can be parallelized, and would benefit from parallelization, if it contains a repeated process in which each iteration is independent of every other. If some parts of your script do depend on previous iterations, you may be able to rewrite them to be independent; if that’s not possible, you’ve found the limit of how much of your code can run in parallel.
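As a rough illustration of the independence test described above, here is a minimal Python sketch. Every name in it (`process_item`, `run_serial`, `run_parallel`, `running_total`) is hypothetical and stands in for your own code. It uses a thread pool from the standard library only to keep the example short; for CPU-heavy Python work you would more likely swap in `ProcessPoolExecutor`, which offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(x):
    """Hypothetical per-iteration work; it depends only on its own input."""
    return x * x


def run_serial(inputs):
    # Serial version: one task after another on a single worker.
    return [process_item(x) for x in inputs]


def run_parallel(inputs, max_workers=4):
    # Independent iterations can be handed out to a pool of workers.
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_item, inputs))


def running_total(inputs):
    # NOT independent: each step needs the previous step's result,
    # so this loop cannot simply be split across workers as written.
    total, out = 0, []
    for x in inputs:
        total += x
        out.append(total)
    return out


if __name__ == "__main__":
    data = list(range(8))
    # Both versions should produce identical results.
    assert run_serial(data) == run_parallel(data)
    print(run_parallel(data))
    print(running_total(data))
```

The first loop is a good candidate for parallelization because no call to `process_item` needs anything from another call. The `running_total` loop is the kind of dependent step that would need to be rewritten, or left serial.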

Penn State’s ROAR Collab provides several ways to tell the cluster you want to run a job in parallel. Clicking this link will take you to a presentation detailing how to parallelize your job.

I’d Still Like to Learn More About Parallelization

There is a truly gargantuan number of resources for learning more about parallel computing. We’ve compiled a few of them below: