Using File Storage Parallel Tools

The Parallel File Tools suite provides parallel versions of tar, rm, and cp. These tools can run requests on large file systems in parallel, maximizing performance for data protection operations.

The toolkit includes:

  • partar: Use this command to create and extract tarballs in parallel.

    The partar tool supports extracting tar files created in the GNU basic tar (POSIX 1003.1-1990) format. Files created in other archive formats, such as PAX, are not supported.
  • parrm: Use this command to recursively remove a directory in parallel.
  • parcp: Use this command to recursively copy a directory in parallel.

Installing the Parallel File Tools

The tool suite is distributed as an RPM for Oracle Linux, Red Hat Enterprise Linux, and CentOS.

To install Parallel File Tools on Oracle Linux

To install Parallel File Tools on an Oracle Linux instance:

  1. Open a terminal window on the destination instance.
  2. Type the following command:
    sudo yum install -y fss-parallel-tools

To install Parallel File Tools on Oracle Linux 8

To install Parallel File Tools on an Oracle Linux 8 instance:

  1. Open a terminal window on the destination instance.
  2. Install the Oracle Linux developer repository, if needed, by using the following command:
    sudo dnf install oraclelinux-developer-release-el8
  3. Install Parallel File Tools from the developer repository by using the following command:
    sudo dnf --enablerepo=ol8_developer install fss-parallel-tools

To install Parallel File Tools on CentOS and Red Hat 6.x

To install Parallel File Tools on CentOS and Red Hat 6.x:

  1. Open a terminal window on the destination instance.
  2. Run the following commands:
    sudo wget -O /etc/yum.repos.d/public-yum-ol6.repo
    sudo wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
    sudo yum --enablerepo=ol6_developer install fss-parallel-tools

To install Parallel File Tools on CentOS and Red Hat 7.x

To install Parallel File Tools on CentOS and Red Hat 7.x:

  1. Open a terminal window on the destination instance.
  2. Run the following commands:
    sudo wget -O /etc/yum.repos.d/public-yum-ol7.repo
    sudo wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
    sudo yum --enablerepo=ol7_developer install fss-parallel-tools
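
Which install command applies depends on the distribution and release. As a sketch, the following hypothetical function maps a distribution ID and release (the ID and VERSION_ID values from /etc/os-release, such as ol, centos, or rhel) to the package command from this section and prints it instead of running it. It assumes Oracle Linux releases other than 8 use the generic Oracle Linux command. For CentOS and Red Hat it prints only the final yum step, so the repository file and GPG key downloads listed above still have to be done first. CentOS and Red Hat 6.x do not ship /etc/os-release, so on those releases pass the values yourself.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the install command documented above
# for a distribution ID and VERSION_ID, as reported by /etc/os-release.
fss_install_cmd() {
  id=$1
  version=$2
  case "$id:$version" in
    ol:8*)
      # Oracle Linux 8: install oraclelinux-developer-release-el8 first if needed.
      echo "sudo dnf --enablerepo=ol8_developer install fss-parallel-tools" ;;
    ol:*)
      # Other Oracle Linux releases: the generic Oracle Linux command.
      echo "sudo yum install -y fss-parallel-tools" ;;
    centos:6*|rhel:6*)
      # Requires the public-yum-ol6.repo file and Oracle GPG key described above.
      echo "sudo yum --enablerepo=ol6_developer install fss-parallel-tools" ;;
    centos:7*|rhel:7*)
      # Requires the public-yum-ol7.repo file and Oracle GPG key described above.
      echo "sudo yum --enablerepo=ol7_developer install fss-parallel-tools" ;;
    *)
      echo "unsupported distribution: $id $version" >&2
      return 1 ;;
  esac
}

# Usage on releases that ship /etc/os-release:
# . /etc/os-release && fss_install_cmd "$ID" "$VERSION_ID"
```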

Using the Tools - Basic Examples

Here are some simple examples of how the different tools are commonly used in Oracle Cloud Infrastructure File Storage.

To copy all files and folders from one directory to another

In this example, parcp copies the directory "folder" from /source to /destination, so the result is /destination/folder. The -P option sets the number of parallel threads to use.

$ parcp -P 16 /source/folder /destination

In the following example, parcp is used to copy the contents of the directory "folder" in /source to /destination. The "folder" directory itself is not copied.

$ parcp -P 16 /source/folder/. /destination
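
The only difference between the two commands is the trailing /. on the source path. The standard cp -r command treats these two forms the same way, so a quick check on a small throwaway tree can confirm the layout you expect before you copy a large file system. The following sketch uses cp, not parcp, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: compare "copy the directory" with "copy its contents"
# using cp -r on a temporary tree. It does not call parcp.
demo_copy_forms() {
  work=$(mktemp -d)
  mkdir -p "$work/source/folder" "$work/dest1" "$work/dest2"
  echo hello > "$work/source/folder/file.txt"

  # Without /. the directory itself is copied: dest1/folder/file.txt
  cp -r "$work/source/folder" "$work/dest1"
  # With /. only the contents are copied: dest2/file.txt
  cp -r "$work/source/folder/." "$work/dest2"

  # Print the resulting files relative to the temporary directory.
  (cd "$work" && find dest1 dest2 -type f | sort)
  rm -rf "$work"
}
```

Running demo_copy_forms prints dest1/folder/file.txt and dest2/file.txt, the same two layouts that the parcp examples above produce.
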

To create a .TAR archive of a directory

The following command creates a .tar archive of the specified directory and stores the tarball in the current working directory. In this example, the directory being archived is named example.

$ partar pcf example.tar example -P 16

You can also use the -C option to create the tarball in a different directory. In the following example, the tarball is created from the example directory and stored in the /test directory.

$ partar pcf example.tar example -P 16 -C /test
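
To spot-check an archive, you can compare its member list with the source tree. This sketch assumes that GNU tar can list the tarballs partar creates (the format note at the top of this page covers extraction, so treat this as an assumption) and that the archive was created from a relative path such as example, as in the first command above. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the paths stored in a tar archive with the
# paths under the source directory. Prints any differences and returns 0
# only when the two lists match.
compare_archive_to_tree() {
  archive=$1   # for example: example.tar
  srcdir=$2    # for example: example (relative, as it was given when archiving)
  tmp=$(mktemp -d)
  # tar -t lists directories with a trailing slash; strip it so the lists compare cleanly.
  tar -tf "$archive" | sed 's:/$::' | sort > "$tmp/archive.txt"
  find "$srcdir" | sort > "$tmp/tree.txt"
  status=0
  diff "$tmp/tree.txt" "$tmp/archive.txt" || status=$?
  rm -rf "$tmp"
  return "$status"
}

# Usage, from the directory where the archive was created:
# compare_archive_to_tree example.tar example
```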

Using the Tools - Advanced Examples

Here are some examples of how the different tools are used in more advanced scenarios.

To copy selected files or folders into a .TAR archive and exclude others

You can specify which files and folders are included when you create a .tar archive using partar. Let's say you have a directory that looks like this:

[opc@example sourcedir]$ ls -l
total 180
-rw-r-----.  1 opc opc          0 Apr 15 02:55 example2020-04-15_02-55-33_217107549.error
-rw-r-----.  1 opc opc         10 Apr 15 03:18 example2020-04-15_02-55-33_217107549.log
-rw-rw-r--.  1 opc opc         12 Apr 15 03:18 example2020-04-15_03-18-13_267771997.error
-rw-rw-r--.  1 opc opc         10 Apr 15 03:18 example2020-04-15_03-18-13_267771997.log
-rwxr-xr-x.  1 opc opc         37 Nov 30  2017 File1.txt
-rwxr-xr-x.  1 opc opc         15 Dec  1  2017 File2.txt
-rwxr-xr-x.  1 opc opc         39 Nov 30  2017 File3.txt
-rwxr-xr-x.  1 opc opc         57 Dec  1  2017 File4.txt

The following command creates a .tar archive that:

  • Archives the mydir directory.
  • Includes File1.txt, File2.txt, File3.txt, and File4.txt.
  • Excludes all .log and .error files.
  • Writes the archive to standard output (-) and pipes it to a second partar process.
  • Extracts the archive into /mnt/destinationdir.

[opc@example sourcedir]$ sudo partar cf - mydir --exclude '*.log*' --exclude '*.err*' | sudo partar xf - -C /mnt/destinationdir

Performing ls -l on /mnt/destinationdir/mydir shows that only the desired files have been copied.

[opc@example mydir]$ ls -l
total 148
-rwxr-xr-x.  1 opc opc         37 Nov 30  2017 File1.txt
-rwxr-xr-x.  1 opc opc         15 Dec  1  2017 File2.txt
-rwxr-xr-x.  1 opc opc         39 Nov 30  2017 File3.txt
-rwxr-xr-x.  1 opc opc         57 Dec  1  2017 File4.txt
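
On a larger tree, listing the destination by eye does not scale. As an alternative, this hypothetical helper uses find to search the destination for files whose names match the two patterns used in the command above ('*.log*' and '*.err*') and prints any it finds.

```shell
#!/bin/sh
# Hypothetical check: list files under a destination directory whose names
# match the patterns excluded above. Returns 1 if any such file exists.
find_excluded_leftovers() {
  dest=$1
  leftovers=$(find "$dest" -type f \( -name '*.log*' -o -name '*.err*' \))
  if [ -n "$leftovers" ]; then
    printf '%s\n' "$leftovers"
    return 1
  fi
  return 0
}

# Usage: find_excluded_leftovers /mnt/destinationdir
```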

When excluding a directory or file from the archive, specify only its name. The --exclude option does not support absolute paths; a pattern given as an absolute path does not exclude anything from the .tar archive. For example, to exclude a directory named testing from anywhere under the source directory, use a command like the following:

sudo partar pczf name_of_tar_file.tar.gz /<path_source_directory> --exclude=testing

All files or directories that match the --exclude pattern under the path of the source directory will be excluded from the partar archive.
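
Because a name-only pattern such as testing matches at every level under the source directory, it can exclude more than you intend. To preview the matches, you can run find with -name against the same source directory. This sketch assumes partar matches --exclude patterns against each entry's name the way find -name does; it does not call partar, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical preview: list every file or directory under SRC whose name
# matches PATTERN, that is, the entries a name-only --exclude is expected to drop.
preview_exclude_matches() {
  src=$1
  pattern=$2
  find "$src" -name "$pattern" | sort
}

# Usage: preview_exclude_matches /<path_source_directory> testing
```
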

To copy selected files or folders from one directory to another

You can specify which files and folders are included when you use parcp to copy from one directory to another. Let's say you have a directory that looks like this:

[opc@example sourcedir]$ ls -l
total 180
-rw-r-----.  1 opc opc          0 Apr 15 02:55 example2020-04-15_02-55-33_217107549.error
-rw-r-----.  1 opc opc         10 Apr 15 03:18 example2020-04-15_02-55-33_217107549.log
-rw-rw-r--.  1 opc opc         12 Apr 15 03:18 example2020-04-15_03-18-13_267771997.error
-rw-rw-r--.  1 opc opc         10 Apr 15 03:18 example2020-04-15_03-18-13_267771997.log
-rwxr-xr-x.  1 opc opc         37 Nov 30  2017 File1.txt
-rwxr-xr-x.  1 opc opc         15 Dec  1  2017 File2.txt
-rwxr-xr-x.  1 opc opc         39 Nov 30  2017 File3.txt
-rwxr-xr-x.  1 opc opc         57 Dec  1  2017 File4.txt

First, create a text file that lists the files you want to exclude. In this example, the file is /home/opc/list.txt.
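
The contents of /home/opc/list.txt are not shown in this example. One way to build such a list is to write the names of the files to exclude, one per line. Both the one-name-per-line format and the use of bare file names rather than paths are assumptions about what parcp --exclude-from expects, so verify them against the parcp documentation before relying on this hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: write the names of File4.txt and every .log and .error
# file in SRCDIR to LIST, one name per line. The format is an assumption.
write_exclude_list() {
  srcdir=$1   # for example: /sourcedir
  list=$2     # for example: /home/opc/list.txt
  (
    cd "$srcdir" || exit 1
    for f in File4.txt *.log *.error; do
      # An unmatched pattern stays literal (for example "*.log"), so check it exists.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  ) > "$list"
}

# Usage: write_exclude_list /sourcedir /home/opc/list.txt
```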

The following command copies the contents from sourcedir to /mnt/destinationdir and:

  • Copies File1.txt, File2.txt, and File3.txt.
  • Excludes File4.txt and the .log and .error files, as listed in /home/opc/list.txt.

[opc@example ~]$ cat /home/opc/list.txt
[opc@example ~]$ date; time sudo parcp --exclude-from=/home/opc/list.txt -P 16 --restore /sourcedir /mnt/destinationdir; date
Mon Jun  1 15:58:30 GMT 2020

real 9m55.820s
user 0m3.602s
sys 1m5.441s

Mon Jun  1 16:08:25 GMT 2020

Performing ls -l on /mnt/destinationdir shows that only the desired files have been copied.

[opc@example destinationdir]$ ls -l
total 91
-rwxr-xr-x.  1 opc opc         37 Nov 30  2017 File1.txt
-rwxr-xr-x.  1 opc opc         15 Dec  1  2017 File2.txt
-rwxr-xr-x.  1 opc opc         39 Nov 30  2017 File3.txt
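
To see exactly which entries were left out of a copy, you can compare the source and destination trees with diff -rq, which reports names that exist on only one side and files whose contents differ. In this sketch the helper name is hypothetical; pass both paths without a trailing slash so the output prefix matches.

```shell
#!/bin/sh
# Hypothetical helper: report entries present only in the source tree. After
# a copy with --exclude-from, these should be exactly the excluded files.
only_in_source() {
  src=$1
  dst=$2
  # diff -rq prints "Only in <dir>: <name>" for entries missing on one side.
  diff -rq "$src" "$dst" | grep "^Only in $src[:/]"
}

# Usage: only_in_source /sourcedir /mnt/destinationdir
```
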

To use parcp as a parallel alternative to rsync

The --restore option in parcp is similar to using the -a, -r, -x, and -H options in rsync (see the rsync(1) Linux man page). The parcp -P option sets the number of parallel threads; it is unrelated to the rsync -P option.

The --restore option includes the following behavior:

  • Recurses into directories
  • Stops at file system boundaries
  • Preserves hard links, symlinks, permissions, modification times, groups, owners, and special files such as named sockets and FIFOs

$ parcp -P 16 --restore /source/folder/ /destination
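
To spot-check that --restore preserved the attributes listed above for a particular file, you can compare its stat output on both sides. This is a sketch that assumes a stat command supporting the -c format option (as GNU coreutils stat does); the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare permissions, owner, group, modification time,
# and hard-link count of the same relative path under two trees.
same_attributes() {
  src=$1
  dst=$2
  rel=$3
  # %a=octal mode, %U=owner, %G=group, %Y=mtime (epoch seconds), %h=hard links
  a=$(stat -c '%a %U %G %Y %h' "$src/$rel") || return 2
  b=$(stat -c '%a %U %G %Y %h' "$dst/$rel") || return 2
  if [ "$a" = "$b" ]; then
    return 0
  fi
  printf 'source: %s\ndest:   %s\n' "$a" "$b"
  return 1
}

# Usage: same_attributes SOURCE_DIR DEST_DIR relative/path/to/file
```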

You can use parcp with the --restore and --delete options to sync files between a source and a target folder, which makes it a good parallel substitute for rsync. As files are added to or removed from the source directory, run this command at regular intervals to apply the same additions and removals to the destination directory. To automate syncing, run the command from a cron job.

sudo parcp -P 32 --restore --delete /source/folder/ /destination
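
If a sync takes longer than the cron interval, a second run can start while the first is still working. The following is a minimal sketch of a wrapper that prevents overlapping runs, assuming the util-linux flock command is available. The function name, lock-file path, script path, and 30-minute schedule are hypothetical, and the PARCP variable exists only so that the command can be swapped out when testing.

```shell
#!/bin/sh
# Hypothetical wrapper for the sync command above, intended to be called from cron.
# flock -n skips the run if a previous run still holds the lock.
run_sync() {
  src=$1
  dst=$2
  lock=${3:-/var/lock/parcp-sync.lock}
  (
    flock -n 9 || { echo "previous sync still running; skipping" >&2; exit 1; }
    ${PARCP:-parcp} -P 32 --restore --delete "$src" "$dst"
  ) 9>"$lock"
}

# Example root crontab entry (hypothetical script path), every 30 minutes:
# */30 * * * * /usr/local/sbin/parcp-sync.sh /source/folder/ /destination
# where parcp-sync.sh defines run_sync and ends with: run_sync "$1" "$2"
```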