Introduction To Talend Open Studio (TOS)
Talend Open Studio is an open source project based on Eclipse RCP. It supports ETL-oriented implementations and is generally provided for on-premises deployment. It is used extensively for integration between operational systems, ETL processes and data migration. Talend Open Studio for Data Integration is designed to combine, convert and update data held at various locations across an organization. It acts as a code generator which produces data transformation scripts and the underlying programs in Java. It provides an interactive, user-friendly GUI which lets you access a metadata repository containing the definitions and configurations for each process performed in Talend. Below is the basic architecture of Talend Open Studio.
TOS Installation – Talend Tutorial
STEP 1: Go to: https://www.talend.com/download.
STEP 2: Click on ‘Download Free Tool’.
STEP 3: Again click on ‘Download Free Tool’ to get the zip file.
STEP 4: Now extract the zip file.
STEP 5: Now go into the extracted folder and double-click the TOS_DI-linux-gtk-x86_64 file.
STEP 6: Let the installation finish.
STEP 7: Click on ‘Create a new project’ and specify a meaningful name for your project.
STEP 8: Click on ‘Finish’ to go to the Open Studio GUI.
STEP 9: Right-click on the Welcome tab and select ‘Close’.
STEP 10: Now you should be able to see the TOS main page.
TOS GUI – Talend Tutorial
Now that you have downloaded and installed Talend Open Studio, let me give you a walkthrough of its GUI. Talend Open Studio consists of four major parts, as shown below.
1. Repository
The Repository collects all the technical items which can be used either to describe business models or design Jobs within Talend and displays them in a tree structure. From the Repository, you can access various Business Models, Job Designs, reusable routines, documentation as well as database connections. In other words, the Repository acts as a central store for all the elements which are necessary for any Job design or business modelling within a project.
2. Design Window
This window further consists of the following parts:
- Workspace: Here you can lay down the designs of your Jobs as well as the business models.
- Designer Tab: This tab opens by default when you create a Job and displays the Job in graphical mode.
- Code Tab: This tab lets you view the generated code and highlights possible language errors.
3. Component Palette
The Component Palette is docked at the top of the design workspace to help you draw the model corresponding to your workflow needs. Depending on your Job or the business model, you can drag and drop various technical components or shapes into your design workspace. There are more than 800 components available for you to choose from.
4. Configuration Tab
The configuration tabs are present in the lower half of the design window. There are various configuration tabs available in TOS. Each of these tabs opens a view which displays the properties of the current element in the workspace. The most frequently used configuration tabs are:
1. Job Tab
The Job tab provides various information about the current Job in the designer window including name, version, creation date and time etc.
2. Context Tab
The Context tab is used to set context variables and the different contexts in which they will be used.
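To make the idea of context variables more concrete, here is a minimal sketch in plain Java. It is not the code Talend generates; the class name `ContextSketch`, the context names `Dev` and `Prod`, the variable names and all values are assumptions made up for illustration. It only shows the principle: the same variable name resolves to a different value depending on which context is selected.

```java
import java.util.Map;

// Hypothetical illustration of context variables; not Talend's generated code.
class ContextSketch {

    // Each context ("Dev", "Prod") supplies its own values for the same variable names.
    // All names and values here are invented for the example.
    static final Map<String, Map<String, String>> CONTEXTS = Map.of(
        "Dev",  Map.of("inputFile", "/tmp/dev/accounts.csv"),
        "Prod", Map.of("inputFile", "/data/prod/accounts.csv")
    );

    // Look up a variable in the chosen context and fail loudly on unknown names.
    static String resolve(String context, String variable) {
        Map<String, String> values = CONTEXTS.get(context);
        if (values == null) {
            throw new IllegalArgumentException("Unknown context: " + context);
        }
        String value = values.get(variable);
        if (value == null) {
            throw new IllegalArgumentException("Unknown variable: " + variable);
        }
        return value;
    }

    public static void main(String[] args) {
        // The same variable name yields a different path per context.
        System.out.println(resolve("Dev", "inputFile"));
        System.out.println(resolve("Prod", "inputFile"));
    }
}
```

Switching from one context to another changes every value at once without touching the Job design itself, which is the main reason to keep environment-specific settings in context variables.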
3. Component Tab
The Component tab displays all the parameters that are required to configure a component. Basically, it collects all the information that is related to the graphical element selected in the design workspace.
4. Run Tab
The Run tab displays the progress of the execution of a Job. The logs shown here include any start, end and error messages.
Here you might ask ‘what is a Job’, as I have already used this term quite a few times till now. So, before diving any deeper let me first give you a brief about a Talend Job.
Talend Job – Talend Tutorial
A ‘Job’ in Talend is basically a customer requirement converted into a technical process. Technically, it is the basic executable unit of any process that is built using Talend. As you already know, TOS converts everything into Java code at the backend. In the case of Jobs, each Job is converted into a single Java class. Let me show you how you can create a Job in Talend.
- Right-click on ‘Job Designs’ in the Repository and select ‘Create job’.
- Specify a meaningful name for your Job along with the purpose and description of it and click on ‘Finish’.
- Once you finish creating a Job, you will get access to the components present in the palette. Now you can drag any component you need from the palette and drop it in the workspace.
But in order to add a component to a Job, you first need to know what components are and how you can use and connect multiple components together. The following example walks through this.
- In this Job we will insert data into Salesforce objects. First, drag a tFileInputDelimited component into the workspace and configure it with the input file.
- Next, add a tMap component, which is used to map the input file data to the Salesforce object. Connect tFileInputDelimited to tMap by right-clicking on tFileInputDelimited.
- Now, double-click on the tMap component and map the input file data to the Salesforce object fields.
- After mapping the fields inside tMap, click OK, drag a tSalesforceOutput component into the workspace and connect tMap to tSalesforceOutput by right-clicking on the tMap component.
- Now go to the tSalesforceOutput component and configure it with the login details and the module (Object).
- Now save the Job, go to the Run tab and click on Run.
- The data in the input file now gets inserted into the object.
- To run this Job on a server, build it into a .bat file and run it using Windows Task Scheduler. To do this, right-click on the Job and click on ‘Build Job’.
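The flow built in the steps above (read a delimited file, map its columns to object fields, write the result) can be sketched by hand in Java. This is only an analogue of the three components, not the class Talend generates and not a real Salesforce client: the class name `FileToSalesforceSketch`, the `;` delimiter, the sample rows and the target fields `Name` and `Phone` are all assumptions, and the "output" step simply collects records in memory instead of calling Salesforce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a hand-written analogue of the three-component flow.
class FileToSalesforceSketch {

    // Stand-in for tFileInputDelimited: split each non-blank line on the delimiter.
    static List<String[]> readDelimited(List<String> lines, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isBlank()) {
                // Pattern.quote treats the delimiter literally, not as a regex.
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
        }
        return rows;
    }

    // Stand-in for tMap: map positional input columns to named target fields.
    static Map<String, String> mapRow(String[] row, String[] targetFields) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < targetFields.length && i < row.length; i++) {
            record.put(targetFields[i], row[i].trim());
        }
        return record;
    }

    // Stand-in for tSalesforceOutput: collect the records instead of sending them anywhere.
    static List<Map<String, String>> run(List<String> lines, String delimiter, String[] targetFields) {
        List<Map<String, String>> inserted = new ArrayList<>();
        for (String[] row : readDelimited(lines, delimiter)) {
            inserted.add(mapRow(row, targetFields));
        }
        return inserted;
    }

    public static void main(String[] args) {
        // Sample data and field names are hypothetical.
        List<String> lines = List.of("Acme Corp;555-0100", "Globex;555-0199");
        String[] fields = {"Name", "Phone"};
        for (Map<String, String> record : run(lines, ";", fields)) {
            System.out.println(record);
        }
    }
}
```

Keeping the three stages as separate methods mirrors how the Job keeps reading, mapping and writing in separate components, so each stage can be configured or replaced on its own.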
Metadata – Talend Tutorial
Metadata in Talend is definitional data: it provides information about the other data that is managed within Talend Studio. You can find the Metadata in the Repository area of TOS. In the Repository Metadata, you can store metadata about the various data sources that you may use. This comes in handy while developing any project as you can use these data sources later in your Jobs, just by dragging an object from the repository and dropping it in the workspace.
In the Repository, you can store metadata for various data sources like delimited files, positional files, XML files, databases, FTP, Azure, Salesforce etc.
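The benefit of storing metadata once and reusing it can be sketched in plain Java as well. Nothing here is a Talend API: the class `MetadataSketch`, the `DelimitedFileMetadata` type, the file name `accounts.csv` and the column names are hypothetical, chosen only to show a single shared definition that several Jobs could reference instead of each one re-describing the same file.

```java
import java.util.List;

// Hypothetical illustration of "define once, reuse everywhere"; not a Talend API.
class MetadataSketch {

    // Describes a delimited file: location, delimiter and ordered column names.
    static final class DelimitedFileMetadata {
        final String path;
        final String delimiter;
        final List<String> columns;

        DelimitedFileMetadata(String path, String delimiter, List<String> columns) {
            this.path = path;
            this.delimiter = delimiter;
            this.columns = List.copyOf(columns); // defensive, immutable copy
        }

        // Position of a column in the file, or -1 if the schema has no such column.
        int columnIndex(String name) {
            return columns.indexOf(name);
        }
    }

    // One shared definition that any number of jobs could reference (values are invented).
    static final DelimitedFileMetadata ACCOUNTS_FILE =
        new DelimitedFileMetadata("accounts.csv", ";", List.of("Name", "Phone"));

    public static void main(String[] args) {
        System.out.println(ACCOUNTS_FILE.path + " has " + ACCOUNTS_FILE.columns.size() + " columns");
        System.out.println("Phone is at index " + ACCOUNTS_FILE.columnIndex("Phone"));
    }
}
```

If the file layout changes, only the shared definition needs updating, which is the same convenience the Repository Metadata gives you when you drag a stored data source into a Job.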