Migrate from Big Data Appliance (BDA) or Big Data Cloud Service (BDCS)

Find out how to migrate from Oracle Big Data Appliance or Big Data Cloud Service to Big Data Service.

Note

We recommend that, even after migrating to OCI, you keep your Big Data Appliance or Big Data Cloud Service clusters (in a stopped state) for at least three months as a backup.

Migrating Resources Using WANdisco LiveData Migrator

Ensure that port 8020 is open at the destination.
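As a quick sketch, you can verify reachability from the host where LiveData Migrator will run. The hostname below is a placeholder; substitute the fully qualified name of your destination cluster's master node:

```shell
# Hypothetical destination host; replace with your BDS cluster's NameNode FQDN.
DEST_HOST="target-bds-nn.example.com"
DEST_PORT=8020   # HDFS NameNode RPC port

# -z: report connection status only; -w 5: five-second timeout.
nc -z -w 5 "$DEST_HOST" "$DEST_PORT" && echo "Port $DEST_PORT is reachable"
```

If the check fails, review the security rules and firewall configuration between the two networks before starting the migration.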

For more information, see the WANdisco LiveData Migrator documentation.

To migrate resources using WANdisco LiveData Migrator, follow these steps:

  1. Install LiveData Migrator on an edge node of the source cluster by running the following commands:
    # Download and run the LiveData Migrator installer.
    wget https://wandisco.com/downloads/livedata-migrator.sh
     
    chmod +x livedata-migrator.sh && ./livedata-migrator.sh
     
    # Confirm that the LiveData Migrator services are running.
    service livedata-migrator status
    service hivemigrator status
    service livedata-ui status
  2. After the installation and setup of LiveData Migrator are complete, access the UI and create your user account. The UI URL has the following form:
    http://<LDM-Installation-Host.com>:8081
  3. Do the following to migrate data:
    1. Configure the source filesystem.
      To add a source filesystem, on your LiveData Migrator dashboard, do the following:
      1. From the Products panel, select the relevant instance.
      2. In the Filesystem Configuration page, click Add source filesystem.
    2. Configure the target filesystem.
      To add a target filesystem, on your LiveData Migrator dashboard, do the following:
      1. From the Products panel, select the relevant instance.
      2. In the Filesystem Configuration page, click Add target filesystem.
      3. Select Apache Hadoop as the target type for the BDS cluster and provide the default filesystem path. Make sure that both the source and the target can connect to the destination on port 8020.
    3. Create a path mapping.
      Path mapping enables migrated data to be stored at an equivalent default location on the target. To create path mappings using the UI, follow these steps:
      1. From the Products list on the dashboard, select the LiveData Migrator instance for which you want to create a path mapping.
      2. From the Migrations menu, select Path Mappings.
      3. At the top right of the interface, click the Add New Path button.
    4. Create a migration.
      Migrations transfer existing data from the defined source to a target. To create a new migration from the UI, follow these steps:
      1. Provide a name for the migration.
      2. From your filesystems, select a source and target.
      3. Select the path on your source filesystem that you want to migrate. Use the folder browser to choose the path name, and select the grey folder next to a path name to view its subdirectories.
  4. Migrate the metadata.
    To migrate the metadata, follow these steps:
    1. Connect metastores.
      Hive Migrator, which comes bundled with LiveData Migrator, lets you transfer metadata from a source metastore to target metastores. Connect to metastores by creating local or remote metadata agents.
    2. Create a metadata migration.
      To create a metadata migration, follow these steps:
      1. On the dashboard, select Add a Hive migration.
      2. Provide a name for the migration.
      3. Select the source and target agents.
      4. Create a database pattern and a table pattern based on Hive DDL that matches the databases and tables you want to migrate.

        For example, using test* for the database pattern matches any database name that starts with test, such as test01, test02, test03.

      5. Click Create.

Migrating Resources Using BDR

Migrating Resources Using the DistCp Tool

You can also migrate data and metadata from BDA to Big Data Service using the DistCp tool. DistCp (distributed copy) is an open-source tool for copying large data sets between distributed file systems, both within and across clusters.
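For example, a single DistCp invocation can copy an HDFS directory from the source cluster to the target. The NameNode hostnames and the path below are placeholders; substitute your own endpoints:

```shell
# Hypothetical NameNode endpoints; replace with your source BDA and target
# BDS NameNode addresses. Both must be reachable on port 8020.
SRC="hdfs://source-bda-nn.example.com:8020/user/hive/warehouse"
DST="hdfs://target-bds-nn.example.com:8020/user/hive/warehouse"

# Run on an edge node where the hadoop client is installed.
# -update: copy only files that are missing or differ on the target.
# -skipcrccheck: skip CRC comparison (useful when checksum settings differ).
hadoop distcp -update -skipcrccheck "$SRC" "$DST" \
  || echo "DistCp failed; check connectivity and paths" >&2
```

DistCp runs as a MapReduce job, so large copies parallelize across the cluster; rerunning the same command with -update resumes an interrupted copy.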

Validating the Migration

To validate the migration, do the following:
  • Verify that the target cluster contains the same set of Hive tables as the source cluster.
    1. Connect to the Hive shell.
      hive
    2. Run the following command to list the tables:
      show tables;
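The check above can be scripted by dumping the sorted table list on each cluster and diffing the results. The file paths and the use of the hive CLI are illustrative; adapt to beeline if your clusters require it:

```shell
# On the source cluster: dump the sorted table list for the default database.
hive -e 'USE default; SHOW TABLES;' 2>/dev/null | sort > /tmp/source_tables.txt

# Copy /tmp/source_tables.txt to the target cluster, then run there:
hive -e 'USE default; SHOW TABLES;' 2>/dev/null | sort > /tmp/target_tables.txt

# An empty diff means the table sets match.
diff /tmp/source_tables.txt /tmp/target_tables.txt && echo "Table lists match"
```

Repeat the comparison for each migrated database, and spot-check row counts on a few large tables as a further sanity check.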