
Storage and I/O in Large Scale Scientific Projects  (@HPCS2015, June 2015) (Old site; new site is at https://scinet.courses)


1 Description

Instructors: Ramses van Zon and Marcelo Ponce


In this era of ever-increasing amounts of data, one needs to carefully plan and design storage management and I/O patterns in data-driven projects in order to prevent bottlenecks. Real use cases from various fields (bioinformatics, molecular biophysics, medical physics, biogeochemistry, quantum chemistry, etc.) have shown that, quite often, approaches that worked fine on a desktop do not perform well at a larger scale. This can be due, for example, to many processes contending for file system resources, to the sheer number of files, or to the frequency at which files are accessed. In hands-on exercises, we will take laptop-sized mockups of a few of these use cases, investigate where and why bottlenecks arise, and then try to remedy them. A number of tools and techniques will be introduced in the process, including tar, compression, ramdisk, file format options, and job scheduling techniques. We'll conclude with some guidelines on how to properly (re)design the storage and I/O of your projects.
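
As a rough illustration of one remedy named above (using tar and compression to cut down the sheer number of files), here is a minimal Python sketch. It is not taken from the course material: the function names `bundle_small_files` and `demo`, the directory name `results`, and the file count are all hypothetical and chosen only for this example.

```python
import os
import tarfile
import tempfile


def bundle_small_files(src_dir, archive_path):
    """Pack every regular file under src_dir into one gzip-compressed tar archive.

    Returns the number of files added. (Hypothetical helper, for illustration.)
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def demo(n_files=200):
    """Create n_files tiny files, bundle them, and report what the archive holds."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "results")
        os.makedirs(src)
        for i in range(n_files):
            with open(os.path.join(src, f"part_{i:04d}.txt"), "w") as fh:
                fh.write(f"record {i}\n")

        # The archive lives outside src, so it is never packed into itself.
        archive = os.path.join(work, "results.tar.gz")
        packed = bundle_small_files(src, archive)

        with tarfile.open(archive, "r:gz") as tar:
            listed = len(tar.getnames())
        return packed, listed


if __name__ == "__main__":
    packed, listed = demo()
    print(f"packed {packed} files into one archive holding {listed} members")
```

The point of the sketch is that the file system then tracks a single archive instead of hundreds of small files; whether that actually helps depends on the file system and on how the data is read back afterwards.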


Requirements:
Linux command-line knowledge. Some bash scripting experience.
Bring a laptop with a Linux-like environment (e.g. Cygwin for Windows users) and a few GB free on the hard drive.

Last Modified: Tuesday Jun 16, 2015 - 08:07. Revision: 1. Release Date: Tuesday Jun 16, 2015 - 08:00.

