Note: Mass Storage Migration Starts October 27, 2020

Overview

FAQs

Appropriate Use

Things to Avoid

Creating and Accessing Space

How Do I Put Files There?

Availability of Accidentally Deleted Files

Long Term Storage of Files

How does mass storage work?

Additional help

Overview

The Mass Storage system (also known as StorNext or /ms) is intended for archiving research data files and for storing very large files that do not fit within an individual researcher’s or a department’s storage space. Storage resources are limited; if you need to store more than 10 Terabytes of data, contact the Research Computing group (research@unc.edu) to arrange to purchase tapes for your data. Questions about the current price of tapes, or about any other part of this process, should also be emailed to research@unc.edu. Mass storage is not an appropriate archival or general storage location for sensitive data.

When storing files in mass storage, you should store a tar or zip archive instead of directories with many files. Please read the “Appropriate Use” and “Things to Avoid” sections below for what can and can’t be stored in the mass storage system.

Specifically, do not use mass storage to back up your desktop hard disk. Mass storage is not to be used as a backup location for local disk drives, operating systems, or software, nor is it to be used for copies of sensitive data files. In general, files that change often or directories with more than a few thousand files in them will cause performance problems and consume tape resources. See the UNC Data Backup Options page for alternative solutions.

FAQs

Please read the Mass Storage Common FAQs, which answer the following questions for users new to this service.

  • What is Mass Storage?
  • How do I get access to Mass Storage service?
  • Is there a limit for Mass Storage?
  • Why can’t I open my files put on Mass Storage?

Appropriate Use

The Research Computing Mass Storage system (also known as StorNext or /ms) is intended for archiving large files and tar (or zip) archives of many smaller files. It is not intended to be used as a backup location for disk drives, operating systems, or software. Mass storage should NOT be considered a solution for archiving or storing sensitive data. In general, files that are changed often or directories with too many files in them will cause performance problems and consume too many tapes. See the UNC Data Backup Options page for alternatives to copying your PC files to mass storage.

We vigorously enforce the appropriate use of the mass storage system. If we find inappropriate use, we may deny access to the directory in question, delete the files or directories in question, and, if necessary, disable the userid (Onyen) in question. We will always try to contact the owner of the data before taking any action. Again, please remember that mass storage is not to be used for storing or archiving sensitive data.

If you routinely store large numbers of small files in mass storage (more than several thousand files at a time, each less than 500M), combine those files into a single tar or zip archive outside of mass storage and then move that archive file to mass storage. You are not required to compress the tar or zip file, since the mass storage tape drive hardware will compress your data. Reducing the number of individual small files helps the overall performance of the StorNext Mass Storage system. Please review the more detailed list of “Things to Avoid” below.
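
The archive-then-move workflow described above might look like the following sketch. The function name `archive_to_ms` and the default staging location are illustrative assumptions, not part of any Research Computing tooling. For large archives, the final move is better submitted through the “ms” partition or Globus, as described under “Creating and Accessing Space”.

```shell
# archive_to_ms SRC_DIR DEST_DIR [STAGING_DIR]
# Hypothetical helper: bundle SRC_DIR into one uncompressed tar file in a
# staging area outside mass storage, then move that single file to DEST_DIR
# (for example, your ~/ms directory).
archive_to_ms() {
    src_dir=${1%/}                      # directory to archive (trailing slash removed)
    dest_dir=$2
    staging=${3:-${TMPDIR:-/tmp}}       # assumed staging area; scratch space also works
    name=$(basename "$src_dir")
    archive="$staging/$name.tar"

    # No compression flag: the tape hardware compresses data when writing.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$name" || return 1

    # Move the single archive file into mass storage as the final step.
    mv "$archive" "$dest_dir/"
}
```

A call such as `archive_to_ms /path/to/results ~/ms` (with a placeholder source path) would leave a single `results.tar` file in the ms directory instead of thousands of small files.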

Things to Avoid

Follow these guidelines to make good use of the Mass Storage system.

1. Do not store or archive sensitive data to the Mass Storage system.

2. Do not use Mass Storage to back up your desktop hard disk.

Mass storage is not intended to be used as a backup location for local disk drives, operating systems, or software.  In general, files that change often or directories with more than a thousand files will cause performance problems and consume tape resources. See the UNC Data Backup Options page for alternative solutions to copying your PC files to mass storage.

3. Do not create more than 1,000 files in a single directory. Mass Storage scans its directory space several times every hour. The time to scan a directory is exponentially linked to the number of files and directories within it. Since scanning is single-threaded, one large directory can slow down the entire system. Similarly, creating one large output file is preferable to creating several smaller ones.

4. Do store a tar or zip archive in Mass Storage space, but please create the archive outside of Mass Storage.  Storing tar or zip archives instead of directories with many small files greatly helps us administer Mass Storage.

5. Do not run the tar command on directories or files in Mass Storage.  Again, create the tar archive outside Mass Storage, e.g., in your home directory, project space, or scratch space.

6. Do not compress files which already exist in Mass Storage space.  If a file is not in Mass Storage, you can compress it first (although this is not necessary) outside of Mass Storage space and then move the compressed file into Mass Storage.  Note that you gain nothing by putting compressed files in Mass Storage space, because all files in Mass Storage are compressed when written to tape.

7. Do not frequently modify any file over consecutive days. Mass Storage only copies to tape files which have not been modified for a few hours.

8. As a corollary to the last item, do not put in Mass Storage files that will be frequently modified.

9. Do not write directly into Mass Storage space, such as your ms directories.  Instead, have applications create/modify files in your home space, project space or scratch space, and then as a final step, archive the final file to Mass Storage using mv to your ms directory.

10. Do not execute long-running programs when your current working directory is a Mass Storage directory.

11. Do not execute a program if the executable file is in Mass Storage space.

12. Do not execute long-running programs that open files in Mass Storage space.

13. Do not use Mass Storage as a scratch space. If you are going to create a number of files which you do not want to keep permanently, use scratch space.  Some programs, such as Gaussian, create huge temporary files which do not always get deleted. Such files are a large waste of tape resources and make jobs run longer.

14. Do not write to Mass Storage directly when creating a dataset in SAS. Instead, write your dataset to scratch space, then copy it to mass storage as a last step, once you have finished modifying it.

15. Please do not copy symbolic links or empty files (e.g., batch job *.err files) into mass storage, as these do not get archived to tape.
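
Before archiving or moving a directory, items 3 and 15 above can be checked with a short script. The following is a minimal sketch; the function names `crowded_dirs` and `unarchivable_entries` are hypothetical, and the `-mindepth`, `-maxdepth`, and `-empty` options assume a GNU or BSD `find`. Run these checks on a copy in home, project, or scratch space, not on a Mass Storage directory.

```shell
# Hypothetical helpers; run them on data staged outside Mass Storage.

# Print each directory under DIR that directly holds more than MAX entries
# (default 1000), as "<count> <directory>".
crowded_dirs() {
    dir=$1
    max=${2:-1000}
    find "$dir" -type d | while IFS= read -r d; do
        # Count the immediate children of this directory.
        n=$(find "$d" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        if [ "$n" -gt "$max" ]; then
            echo "$n $d"
        fi
    done
}

# Print symbolic links and empty files under DIR; these are not archived to tape.
unarchivable_entries() {
    find "$1" \( -type l -o -type f -empty \) -print
}
```

If either function prints anything, consider bundling the crowded directory into a tar archive and leaving the links and empty files behind.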

Creating and Accessing Space

Only faculty, staff, and graduate students may subscribe to the Mass Storage tape archival service. Be sure to read and understand the intended use of the Mass Storage system, the Appropriate Use Policy and Things to Avoid before subscribing to this service. Once you have a subscription on one of the UNC Research Computing clusters, you have access to Mass Storage too. To subscribe to a service, follow the steps listed to Request a Cluster Account. When you receive the email welcoming you to the Research Computing cluster, you can connect to the cluster and use your mass storage directory via the ms/ link from your home directory.

Note that storage resources are finite; we cannot store an unlimited amount of data. If you need to store more than 1 Terabyte of data, you can contact the Research Computing group (research@unc.edu) to arrange to purchase tapes. In this case, please contact Research Computing staff at least two months in advance to discuss tape purchases, performance issues, and anticipated usage patterns.

In some circumstances, it may be useful for departments to have mass storage space where collaborators and project teams can store data shared by all members of the group. To request departmental mass storage space, open a Help Request ticket. Include the following in the Help Request ticket:

  • Directory name
  • Names of the primary and secondary contacts
  • Linux group name
  • Onyen accounts to be added to the Linux group
  • Amount of data you need to store

When moving or copying large amounts of data onto or from mass storage while on the Research Clusters, do not issue the copy command directly from the command line. Instead, use one of the following two options:

  • SLURM’s sbatch: From the command line, submit a job to copy/move the files using the “ms” partition. The “ms” partition is a special area with very good connectivity to the mass storage system. It is optimized to handle multiple data moves well. Here is a sample sbatch command:

     sbatch -p ms --wrap="cp /nas/longleaf/home/<ONYEN>/<FILENAME> /ms/home/<O>/<N>/<ONYEN>/"
  • GLOBUS: GLOBUS is an easy-to-use and sophisticated file moving service. It has a web interface that walks users through their directory structure paths and initiates an efficient data moving operation with one click. It automatically handles interruptions in data transfers and emails you when the copy is complete. See Getting Started with Globus Connect for more information about this service.
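
For copies that are repeated or need a log file, the sbatch option above can also be written as a batch script rather than a `--wrap` one-liner. The sketch below only generates such a script; the helper name `write_ms_job`, the job name, and the output file pattern are illustrative assumptions, and your cluster may require additional options (such as a time limit).

```shell
# write_ms_job SRC DEST JOBFILE
# Hypothetical helper: write a Slurm batch script that copies SRC to DEST
# on the "ms" partition. Submit the result with: sbatch JOBFILE
write_ms_job() {
    src=$1
    dest=$2
    jobfile=$3
    # The heredoc is unquoted so that $src and $dest are filled in now.
    cat > "$jobfile" <<EOF
#!/bin/bash
#SBATCH -p ms
#SBATCH --job-name=ms-copy
#SBATCH --output=ms-copy-%j.out

cp "$src" "$dest"
EOF
}
```

Supply the same source and destination paths you would use in the sample sbatch command, then run `sbatch` on the generated file.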

How Do I Put Files There?

Files can be moved in and out of mass storage by using simple Linux commands such as “cp” and “mv”. As the Mass Storage system is optimized for archiving data, your programs should not directly read or write from the Mass Storage system. Instead, copy your data from ~/ms to your scratch space.
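
As a minimal sketch of that retrieval step, the helper below copies one archive out of mass storage and unpacks the copy in a working directory. The name `fetch_from_ms` is hypothetical, and the working directory is whatever scratch location you have been assigned.

```shell
# fetch_from_ms ARCHIVE WORK_DIR
# Hypothetical helper: copy one tar archive out of mass storage into a
# working directory (e.g., scratch space) and unpack the copy there, so
# tar never reads the archive in place in mass storage.
fetch_from_ms() {
    archive=$1
    work_dir=$2
    local_copy="$work_dir/$(basename "$archive")"
    cp "$archive" "$local_copy" || return 1
    # Extract from the local copy, into the working directory.
    tar -xf "$local_copy" -C "$work_dir"
}
```

For large archives, the copy itself should still go through the “ms” partition or Globus rather than an interactive command line.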

Mass storage should not be used to store or archive sensitive data.

For large files, or if you or your files are located off campus, the easiest way to store files in Mass Storage is to use the web tool Globus Connect.

A secure “sftp” implementation may also be used for small files. Simply “sftp” to a Research Computing host and log in with your Onyen and Onyen password. If you are using a command-line implementation of “sftp”, then change directories to your ms subdirectory:

cd ms

Then use the “put” command to copy the files you want to store from your computer to that directory in mass storage. Smaller files can also be moved with a GUI “sftp” program such as SSH Secure Shell: connect to a Research Computing host and log in with your Onyen and Onyen password.  In the Remote Site window, navigate to your “ms” subdirectory. You can then drag files from your local system to the Remote Site window.

Remember that mass storage is intended to serve as an archive for important, non-sensitive data files and work that you need to keep long-term. It may not be used to back up your desktop systems. See the UNC Data Backup Options page for alternative solutions for that purpose.

Availability of Accidentally Deleted Files

On an ordinary Linux file system, a file’s inodes and data are removed upon deletion. In Mass Storage, the inodes are removed; however, the data remains on tape until a “recycle” process is run. This means that any file which has been accidentally deleted, but which has had its inodes backed up and data written to tape, is potentially retrievable. Conversely, if you create a file and then delete it before the backups run, the file is lost. If the machine hosting Mass Storage crashes before the up-to-date data is copied to tape, you may lose that day’s work.

To obtain assistance with recovery of a deleted Mass Storage file, open a Help Request ticket.

Recycling is done as needed. The need is determined by high-water marks in the Mass Storage software for available tape space and for the number of slots in the IBM tape library that are free to hold tapes. A recycle may run automatically, without notification.

Long Term Storage of Files

We have chosen to implement StorNext as our mass storage software. We believe that any non-deleted file that is on Mass Storage tape will be retrievable for as long as the tape remains readable. Due to the unknown longevity of tape media, we cannot guarantee how long tapes will remain readable. However, we have taken steps to ensure, as much as possible, that files that have not been deleted will be available for as long as we own the tapes. The tapes we currently own have a shelf life of 10 years, but we attempt to migrate onto new tape technology for better compression of data per tape. We keep two copies of every file, we keep the tapes in a climate-controlled room, and we periodically move the second copies to a separate storage facility. It is also possible that files will be corrupted when written to tape (both copies), and such files may not be recoverable. Because data are stored in a non-proprietary format, and encryption is not used when writing tapes, you should NOT use mass storage to archive or store any sensitive data.

How does mass storage work?

The mass storage system uses the StorNext software product to manage storage resources. StorNext is similar to an ordinary disk file system in that it keeps an inode (for recording data location, etc.) and data blocks for each file. For the user of mass storage, this file system appears to be a subdirectory of the user’s home directory. Files can be moved in and out of mass storage by using simple Linux commands such as “cp” and “mv” or by using “sftp” or “scp”. Note that your mass storage directory “ms” cannot be accessed directly by using a Windows/MacOS share mount; instead, you must be logged in to a Research Computing server which has mass storage mounted.

StorNext is different from ordinary disk systems in that it keeps data blocks on tapes, while the inode information remains on disk. When a file is created, an inode is immediately created and the data goes to the StorNext disk cache. If the file stays unmodified for a few hours, it will be copied from disk cache to tape. The tape drive hardware compresses the data as it is written to tape. StorNext copies the data to two different tapes to ensure that we have a backup copy of every file. One tape is always on-site and one tape is stored off-site in a secure location. If problems are encountered when reading data from the on-site tape, we can still retrieve your data, but it may take several work days to recall the second tape copy from off-site storage. Please remember that mass storage should NOT be used to store sensitive data.

When the StorNext disk cache is 90% full, StorNext automatically does a release: it releases the data blocks of files that have already been written to tape until the disk cache is only 70% full. When a file that has been released is accessed it will take at least one minute for the data to be staged or brought back from tape to the disk cache.

Every 24 hours, the inodes are backed up to a location other than StorNext space. Therefore, if we experience a system problem related to StorNext, we can restore the inode table from backup. This will restore every file that was written to tape (archived) and had its inode backed up. This means the file must have been unmodified for a minimum of 24 hours to a maximum of 48 hours.
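
To make the 90% and 70% release thresholds concrete, the following sketch computes how much cached data one release frees. The function name `released_amount` and the cache size of 200 units are hypothetical; the actual size of the StorNext disk cache is not stated here.

```shell
# released_amount CACHE_SIZE
# Amount of cache freed by one release that starts at 90% full and stops
# at 70% full, in the same unit as CACHE_SIZE (integer arithmetic).
released_amount() {
    cache_size=$1
    echo $(( cache_size * (90 - 70) / 100 ))
}

released_amount 200   # prints 40
```

In other words, each release frees about one fifth of the cache, so files written to tape a while ago are the ones most likely to need staging back from tape when accessed.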

We monitor the use of mass storage and will inform you if you are using it inappropriately or if you need to purchase tapes to accommodate the volume of data that you need to store.

Additional help

Research Computing home page