Performing Bulk Data Transformation with RDMS as the Transformation Source

A bulk data transformation (BDT) from an RDMS schema happens in two phases— Extraction and Loading. During the Extraction phase, the Agent on the OS 2200 host extracts the data from the RDMS transformation source and writes it to the files. In the Loading phase, the runtime components from the Windows side read those files, transform the data within the files, and load the data on to the target.

You can perform a BDT from an RDMS schema to a SQL Server, an Oracle database, or a Kafka distributed streaming platform. Before performing the BDT, ensure that AIS Configuration Utility is installed on your Windows machine where the runtime components are installed, and a connection is defined in this utility so that the Runtime Windows components can communicate with the OS 2200 host where the AIS2200 and CS2200 are installed.

If the transformation target is Kafka, then Data Exchange acts as a Kafka Producer. It uses XML Schema Definition (XSD) to provide structure to the Kafka messages. To understand this structure and be able to create Kafka Consumers, you can generate Helper API classes in DDW. For more information on the structure of Kafka messages, refer to Data Exchange Development Workbench Administration and Operations Help.

You must also configure the Producer configurations, as explained later in this section.

The following best practices are recommended:

To start a BDT

  1. Click Manage.

    The hosts associated with the deployed transformations appear in the Source Database Hosts pane.

  2. In the Source Database Hosts pane, expand OS 2200 Hosts, and then expand the required host to view the Enterprise Database Server databases hosted on it.

  3. Select the Application Group from which you want to start the BDT.

  4. Point to Start, and then click Bulk Data Transformation.

    Alternatively, click the arrow next to the database name, and then click Start Transformation(s).

    The Data Exchange Transformation Session Setup appears.

  5. On the Transformations tab, select the transformation(s) you want to run, and then click Next.

    The General Settings tab appears.

  6. On the General Settings tab, enter or change the following information, as appropriate:

    • The Agent TCP Port box displays the default port number (5150) on which the Agent is listening. You must change the TCP port number only if you are not using the default port number.

    • The Runtime Service Host box displays the host name or IP address of the Windows partition where the Runtime Service is installed.

      You must change the value of the Runtime Service host if both the Runtime Service and the Administrative Service are not running on the same Windows host.

    • The Runtime Service TCP Port box displays the default TCP port number (9299) of the Runtime Service that was configured during installation.

    • If you require a secured connection, turn on Secure Connection. This enables you to configure a secure connection for data transfer from Runtime Service to the target database.

      • For Oracle as the target, also provide the Secure Connection Port number. The default secure connection port is 2484.

    • If your target is a Kafka streaming platform, then the Kafka target(s) box displays the list of Kafka targets. Enter the following details for each target:

      • Target(s): This displays the name of the Kafka target(s) in the format <transformation>\<schema name>. Each XSD schema loaded in a transformation is considered as a Kafka target. Loading the same schema twice with different name or dragging the same schema to two different transformations (in DDW) will result in two Kafka targets.

      • Bootstrap Server(s): Enter the bootstrap server name(s) and the port number(s) in the format host1:port1,host2:port2. Only letters, numbers, hyphen(-), underline(_), and period(.) are allowed in the Bootstrap server name.

      • Topic: Topic is a stream of records. Enter the name of the topic. For example, MyTopic.

      • Acks: Select the kind of acknowledgement you require:

        • -1 - Acknowledgements from replicas: The leader will wait for the full set of in-sync replicas to acknowledge the record.

        • 0 - No acknowledgement: The producer will not wait for any acknowledgment from the server.

        • 1 - Acknowledgement from leader: The leader will write the record to its local log and respond without waiting for full acknowledgement from all the followers.

      • Client ID: The client ID is Data Exchange.

      • Secure Connection: To configure the Data Exchange Runtime service host to support a secure connection with the Kafka target host, you can enable Secure Connection.

        When you turn on Secure Connection, the Secure Connection Config window appears. In this window, browse and select the Certificate Authority file. You can also provide additional Certificate Configuration details such as certificate, key, and key password if the Kafka server requires client authentication.

        Notes:

        • If the key file is not encrypted, you can leave the password blank. If the key file is encrypted, you must pair the file with the password.

        • For all the files selected in secure connection for Kafka targets, both the certificate and key files must be in PEM format.

      • Other Producer Config: If you need to provide additional Producer configurations, enter the values here in the format name1:value1, name2:value2. These configurations override the default Producer configurations. For the complete list of Producer configurations refer to Kafka Producer Configurations.

    • The Target Commit Error Handling Policy lists the errors that could occur while committing the records to the target.

      On encountering some of the listed errors, you can only choose to stop the transformation or ignore the error. But there is a possibility to recover from the Deadlock Error by retrying. Hence this error has the option to retry 3 times. This recovery is faster, as the retry is internal to the Runtime Service.

      Select the manner in which you want Data Exchange to respond when the specific error occurs.

  7. Click Next.

    The Transformation Settings tab appears.

  8. On the Transformation Settings tab, enter or change the following information, as appropriate:

    • In the User Name and Password boxes, type the user ID and the password of the OS 2200 host, respectively.

      These credentials are used for accessing the server during the data extraction process.

    • From the Profile Name drop-down list, select the required profile.

      The profile details appear below the drop-down box.

    • In the AIS Connection Name box, type the connection name that was defined in the AIS Configuration Utility.

      The Runtime Windows components connect to the OS 2200 host by using this connection name.

      If you need a secure connection from the OS 2200 host to Runtime Service, enable secure connection in AIS, and then enter the corresponding Connection Name.

    • In the No. of Extraction Thread(s) box, type the number of processing threads you want the OS 2200 Agent to use to extract data from the RDMS transformation source. This value depends on availability of database resources in the Application Group.

    • In the Maximum Degree of Parallelization box, type the maximum number of threads that can run in parallel to perform the BDT. This value depends on the number of CPU cores and the available RAM of the machine where Runtime Service is running.

      Note: To avoid performance issues, this value must not exceed the number of CPU cores of the machine where the Runtime Service resides.

  9. Click Next.

    The Start Transformations tab appears.

  10. Verify all the details, and then click Start.

If the BDT fails to start, the error is displayed on the Bulk Data Transformation Status box. Click the icon in the Details column for more information on the cause and resolution of the error. The transformation starts after you have resolved all the errors.

When the BDT starts, the status of the Bulk Data Transformation on the Summary tab changes to Running. Click Refresh on the Summary tab to view the updated status. Click Monitor on the Summary tab to see the statistical data corresponding to the BDT.