Transferring Data To and From File Storage with Lustre

Many common use cases for File Storage with Lustre involve transferring large amounts of data. The best transfer method varies depending on the source, the destination, and the direction of the transfer.

The following table provides recommendations for common File Storage with Lustre data transfer scenarios.

For general information about private connections between OCI and on-premises networks, see FastConnect and Site-to-Site VPN.

Transfer Data From: OCI File Storage with Lustre
To: On-premises S3-compatible object storage
Recommended Method: Use rclone. For more information, see Overview of Rclone and Basic Terms. The same technique can be used in reverse.
Prerequisites and Considerations: The instance must be able to connect to the Object Storage bucket.

Transfer Data From: OCI File Storage with Lustre
To: On-premises file system (local disk, SAN, or NAS)
Recommended Method: Linux users can use instance-to-instance streaming and the fpsync tool to transfer data from OCI. For examples, see Transferring On-Premises Data to File Storage with Lustre (Linux). The same technique can be used in reverse.
Prerequisites and Considerations: Ensure that network connectivity is established between the source instance and the destination.
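As a sketch of the rclone approach above, the following hypothetical commands copy a directory tree from a mounted Lustre file system to an object storage bucket. The remote name `oss`, the bucket name `my-bucket`, and the mount path are placeholders, and they assume a remote was already set up with `rclone config`:

```shell
# Hypothetical example: "oss" and "my-bucket" are placeholders;
# configure the remote first with `rclone config`.

# Copy a directory tree from the mounted Lustre file system to the bucket.
# --transfers raises the number of files copied in parallel;
# --checkers raises the number of parallel listing/checksum workers.
rclone copy /mnt/lustre/dataset oss:my-bucket/dataset \
    --transfers 32 --checkers 16 --progress

# Compare source and destination to verify the copy.
rclone check /mnt/lustre/dataset oss:my-bucket/dataset
```

Raising `--transfers` helps most when the data set consists of many small files; for a few very large files, the defaults are often sufficient.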

Transferring On-Premises Data to File Storage with Lustre (Linux)

The fpsync tool is a parallel wrapper for rsync. It is provided by the fpart package, which Linux users can install from the EPEL yum repository. The commands differ depending on the version of Linux.

  1. Enable the EPEL repository, which provides the fpart package.

    Linux 8 users can download the tool using the following command:

    sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
    

    Linux 9 users can download the tool using the following command:

    sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
    
  2. Install the fpart package, which includes fpsync:
    sudo yum install fpart -y
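After installation, you can confirm that both tools are available (fpsync is installed as part of the fpart package):

```shell
# Both binaries come from the fpart package installed above.
which fpart fpsync      # confirm both tools are on the PATH
rpm -q fpart            # show the installed package version
```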

Before beginning the data transfer, complete the following prerequisites:

  • Ensure that network connectivity is established between the on-premises data source and OCI. Use a FastConnect or Site-to-Site VPN connection to enable fast instance-to-instance streaming over SSH.
  • Create an Oracle Linux instance in OCI.
  • Attach or mount the on-premises storage share on a Linux server. A dedicated instance is recommended.
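Before starting the copy, it can help to confirm that the on-premises server can reach the OCI instance over SSH without an interactive prompt. The following sketch uses the same placeholders as the commands later in this section:

```shell
# Placeholders: <user> and <oci_linux_instance>, as in the fpsync
# command below. BatchMode fails fast instead of prompting for a
# password, which also confirms that key-based authentication is in
# place for unattended syncs.
ssh -o BatchMode=yes -o ConnectTimeout=10 <user>@<oci_linux_instance> \
    'echo "SSH connectivity OK"'
```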

Use the fpsync tool to perform the initial copy of on-premises data to OCI File Storage with Lustre. For subsequent incremental syncs, use rsync with the --delete option, because fpsync can't delete files and folders in the destination that no longer exist in the source.

  1. Mount the File Storage with Lustre file system on the OCI instance. For more information, see Mounting File Systems From UNIX-Style Instances.
  2. Run the following command from the on-premises Linux server where the source share is attached or mounted to perform the initial copy:
    fpsync -vv -n `nproc` -f 2000 -o "-arxH --progress --log-file fpsync.log -e ssh" /<source>/ <user>@<oci_linux_instance>:/file_storage_destination/
  3. (Optional) To synchronize incremental changes (for example, until a migration cutover date), schedule the following rsync command as a cron job:
    rsync -arxH --delete --progress --log-file rsync.log -e ssh /<source>/ <user>@<oci_linux_instance>:/file_storage_destination/
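The incremental rsync command can be run unattended on a schedule. A hypothetical crontab entry might look like the following; the 02:00 schedule, log path, and placeholders are illustrative:

```shell
# Edit the crontab with `crontab -e` and add a line such as:
# min hour dom mon dow  command
0 2 * * * rsync -arxH --delete --log-file /var/log/rsync.log -e ssh /<source>/ <user>@<oci_linux_instance>:/file_storage_destination/
```

Key-based SSH authentication must be in place for the job to run without prompting, and --progress is omitted because cron jobs have no interactive terminal.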
For more fpsync options, see the fpsync man page.

Using Instance-to-Instance Streaming to Transfer File Storage with Lustre Data

The fpsync tool is a parallel wrapper for rsync. You can use fpsync with instance-to-instance streaming to transfer data between mounted File Storage with Lustre file systems.

To install fpsync, enable the Oracle Linux developer repository, which includes the fpsync utility, on the OCI instance using a command such as the following. The command differs based on the version of Oracle Linux in use:

yum --enablerepo ol7_developer_EPEL install -y fpart
yum --enablerepo ol8_developer_EPEL install -y fpart

After installing the tool, use an instance-to-instance streaming command such as this to stream data:

fpsync -o "-e ssh --progress" /<src_path>/test <ssh_user>@<remote_ip>:/<dest_path>/

For more information and options, see the fpsync man page.

The following example shows the performance difference between streaming over SSH to a remote instance and copying directly to a locally mounted file system:

# date; time fpsync -o "-e ssh --progress --log-file ~/speedtest.log" /src_path/test/ root@OCI_lfsclient:/lfs_dest_path/ ; date

Sun Mar 13 15:22:58 GMT 2022

real 0m1.467s
user 0m0.111s
sys 0m0.075s
Sun Mar 13 15:23:00 GMT 2022

# ls -ltrd test
drwxr-xr-x. 2 root root 1 Mar 13 15:22 test
# du -sh test
1001M test
# cp -r test test1

# date; time fpsync -o "--progress --log-file ~/speedtest1.log" /src_path/test/ /lfs_dest_path/ ; date
Sun Mar 13 15:25:16 GMT 2022

real 1m28.847s
user 0m3.688s
sys 0m1.439s
Sun Mar 13 15:26:44 GMT 2022