Replicating Data to Hadoop

Syniti Data Replication allows you to replicate data from relational database tables to the Hadoop Distributed File System (HDFS) using Refresh (or Snapshot) replication: a one-time complete replication from any major relational database source to HDFS as a target, according to replication settings and scripts. You can control the timing of the replication, identify the columns to be replicated, and add scripts to transform data during replication. For more specific information about replicating to Hadoop, see the Setup Guide for Hadoop, available from the Help Center.

To set up a target connection for Hadoop:

  1. In the Metadata Explorer, select the Targets node.

  2. Right-click and choose Add New Connection from the menu.

  3. In the Target Connection Wizard Provider field, select the Hadoop HDFS option.

  4. In the Set Connection String page, set the following properties:

    Output Folder

    An existing folder on the system that is running Syniti DR for files associated with replications to Hadoop

    Hostname

    The server name for the system running Hadoop

    Username

    The user name for the Hadoop instance

    Password_KeyFile

    Either a password or more typically a key file (.ppk extension)

    Path to Binary

    The pathname to the Hadoop executable, including "hadoop". For example, "/home/ubuntu/hadoop-2.7.7/bin/hadoop". To locate the executable, you can run the command "which hadoop" from your SSH session.

    Target Directory

    HDFS directory where files will be uploaded

    Working Directory

    Temporary server directory where files will be stored before moving them to HDFS. Files are managed by Syniti DR.

    Output Folder Archive

    Optional. Provides a local copy of data replicated to Hadoop. Data is not managed by Syniti DR, so the files must be managed manually and could grow quickly.

    Add Transactional Info

    Set to Yes if performing mirroring replications

  5. Click Next to view the Select Tables screen.
    If this is the first time you have created a connection using the output folder defined above, the table display will be empty. However, as the folder is populated with output files, those file structures are displayed as tables in this view. You can then select one or more tables.

  6. Click Next to display the Actions screen.

  7. Optionally choose to continue with creating replications once the wizard is complete.

  8. Click Next to display the summary, then click Finish to create the connection.
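
Before entering the values for step 4, it can help to sanity-check them on the system running Syniti DR. The following is a minimal sketch only, not part of the product: the function name check_hadoop_settings and its parameters are hypothetical, and it assumes that a key file, when used, is stored on the local system. It checks that the Output Folder exists, that Path to Binary ends in "hadoop", and that a value ending in .ppk points to an existing file.

```python
import os
import posixpath


def check_hadoop_settings(output_folder, path_to_binary, password_or_keyfile):
    """Return a list of problems found in the proposed connection values.

    Hypothetical helper for illustration; not a Syniti DR API.
    """
    problems = []

    # Output Folder must be an existing folder on the Syniti DR system.
    if not os.path.isdir(output_folder):
        problems.append(f"Output Folder does not exist: {output_folder}")

    # Path to Binary is a path on the Hadoop server and must include
    # the executable name itself, "hadoop".
    if posixpath.basename(path_to_binary) != "hadoop":
        problems.append(f'Path to Binary must end in "hadoop": {path_to_binary}')

    # Assumption: a value ending in .ppk is a key file stored locally.
    if password_or_keyfile.lower().endswith(".ppk"):
        if not os.path.isfile(password_or_keyfile):
            problems.append(f"Key file not found: {password_or_keyfile}")

    return problems


if __name__ == "__main__":
    # Example values only; replace them with your own settings.
    for problem in check_hadoop_settings(
        os.getcwd(), "/home/ubuntu/hadoop-2.7.7/bin/hadoop", "mykey.ppk"
    ):
        print(problem)
```

An empty result means only that these local checks passed. The Hostname, Username, Target Directory and Working Directory values can be confirmed only by connecting to the Hadoop server.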

The next step is to set up replications from whichever source connection you have defined to the file target for Hadoop.
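
If you set the optional Output Folder Archive property, remember that Syniti DR does not manage those files and the folder can grow quickly once replications are running. The sketch below shows one possible way to keep an eye on it. It is an illustration rather than a product feature: the function name folder_size_bytes and the example path are hypothetical.

```python
import os


def folder_size_bytes(folder):
    """Return the total size, in bytes, of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            # Skip entries that are not regular files, or that
            # disappeared between listing and checking.
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Hypothetical archive location; use your own Output Folder Archive value.
    archive = r"C:\SynitiDR\HadoopArchive"
    size_mb = folder_size_bytes(archive) / (1024 * 1024)
    print(f"{archive}: {size_mb:.1f} MB")
```

A script like this could be run on a schedule, with old archive files removed or moved manually when the folder exceeds a size you are comfortable with.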