Replicating Data to Hadoop
Syniti Replicate allows you to replicate data from relational database tables to the Hadoop Distributed File System (HDFS) using Refresh, or Snapshot, replication: a one-time complete replication from any major relational database source to HDFS as a target, according to replication settings and scripts. You can control the timing of the replication, identify the columns to be replicated, and add scripts to transform data during replication. For more specific information about replicating to Hadoop, see the Setup Guide for Hadoop, available from the Help Center.
To set up a target connection for Hadoop:
-
In the Metadata Explorer, select the Targets node.
-
Right-click and choose Add New Connection from the menu.
-
In the Target Connection Wizard Provider field, select the Hadoop HDFS option.
-
In the Set Connection String page, set the following properties:
Output Folder
An existing folder on the system that is running Syniti Replicate for files associated with replications to Hadoop
Hostname
The server name for the system running Hadoop
Username
The user name for the Hadoop instance
Password_KeyFile
Either a password or more typically a key file (.ppk extension)
Path to Binary
The pathname to the Hadoop executable, including "hadoop". For example, "/home/ubuntu/hadoop-2.7.7/bin/hadoop". To locate the executable, you can run the command "which hadoop" from your SSH session.
Target Directory
HDFS directory where files will be uploaded
Working Directory
Temporary server directory where files will be stored before moving them to HDFS. Files are managed by Syniti Replicate.
Output Folder Archive
Optional. Provides a local copy of data replicated to Hadoop. Data is not managed by Syniti Replicate, so the files must be managed manually and could grow quickly.
Add Transactional Info
Set to Yes if performing mirroring replications
-
Click Next to view the Select Tables screen.
If this is the first time you have created a connection using the output folder defined above, the table display will be empty. However, as the folder is populated with output files, those file structures are displayed as tables in this view. You can then select one or more.
-
Click Next to display the Actions screen.
-
Optionally choose to continue with creating replications once the wizard is complete.
-
Click Next to display the summary, then click Finish to create the connection.
The next step is to set up replications from whichever source connection you have defined to the file target for Hadoop.
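Before entering values in the Set Connection String page, it can help to sanity-check them. The following is a minimal sketch, not part of Syniti Replicate: the function names check_hadoop_settings and server_side_checks, the dictionary layout, and the example HDFS directory /user/replicate are all hypothetical. It checks only what the property descriptions above state (the Output Folder must already exist locally, and the Path to Binary must include "hadoop"), and it assumes the Target Directory is given as an absolute HDFS path. The second function only builds command strings to run by hand in your SSH session; "hadoop fs -test -d" is the standard Hadoop shell test for an existing directory.

```python
import os
import posixpath
import shlex


def check_hadoop_settings(settings):
    """Return a list of problems found in a dict of connection property values."""
    problems = []

    # Output Folder must already exist on the system running Syniti Replicate.
    if not os.path.isdir(settings.get("Output Folder", "")):
        problems.append("Output Folder does not exist locally")

    # Path to Binary is a path on the Hadoop server and must include "hadoop".
    binary = settings.get("Path to Binary", "")
    if not posixpath.isabs(binary) or posixpath.basename(binary) != "hadoop":
        problems.append('Path to Binary should be an absolute path ending in "hadoop"')

    # Assumption: the HDFS Target Directory is written as an absolute path.
    if not posixpath.isabs(settings.get("Target Directory", "")):
        problems.append("Target Directory should be an absolute HDFS path")

    return problems


def server_side_checks(settings):
    """Build shell commands to run manually in an SSH session on the Hadoop server."""
    binary = shlex.quote(settings["Path to Binary"])
    target = shlex.quote(settings["Target Directory"])
    return [
        "which hadoop",                    # locate the executable
        f"{binary} fs -test -d {target}",  # exit status 0 if the HDFS directory exists
    ]


if __name__ == "__main__":
    example = {
        "Output Folder": os.getcwd(),  # any existing local folder
        "Path to Binary": "/home/ubuntu/hadoop-2.7.7/bin/hadoop",
        "Target Directory": "/user/replicate",  # hypothetical HDFS directory
    }
    for problem in check_hadoop_settings(example):
        print("Problem:", problem)
    for command in server_side_checks(example):
        print("Run on the Hadoop server:", command)
```

An empty list from check_hadoop_settings means none of these basic checks failed; it does not confirm that the connection itself will succeed.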