Monday, 18 February 2013

Talend Subjobs: Passing Full Context to tRunJob

Talend is an open source software platform that provides data integration, data management, enterprise application integration and big data software and solutions.

As an open source solution, the platform offers Talend Open Studio free of charge. It is a set of open source products for developing, testing, deploying and administering data management and application integration projects, all delivered through an easy-to-use, Eclipse-based graphical environment that provides a comfortable workbench for developers.

I started using Talend as part of some integration projects I was involved with. The tool was well worth exploring and quick to learn; the palette of components it provides is compatible with a large number of systems and technologies, such as Oracle, Salesforce, Web Services and XML. On top of that, Talend generates Java code, so the resulting jobs are easy to understand and maintain.

However, I found one aspect of Talend that, from my point of view, was not very well explained or easy to understand: Contexts and how they are passed when using tRunJob components.

Let's have a look at the following example Talend job in order to have a picture of the problem:

Context Group to be used for the Job

The Context Group that will be used for our example job defines two variables, one for the parent job and another one for the child job; it also defines two environments, Test and Dev, as seen on the next image:


This will allow the sample job to be run in different environments, having different parameter configurations.

Simple Talend job that calls a tRunJob component as a child job

This parent job starts by reading a dummy file. It then runs a tJavaRow component to print the value of the Context it receives, and finally passes the Context to a tRunJob component, which is responsible for calling the child job.


The next image shows the Context variables that are being used for the parent job; in this case there is only one variable: context.parent
 

To see the value of the context.parent variable in the parent job, the tJavaRow component prints it; the code for this component is as follows:
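
The original snippet is not reproduced in this copy of the post. As a minimal sketch only: inside tJavaRow a single println of context.parent would be enough. The class below wraps that statement in a hypothetical stand-in for the context object so it can be compiled and run outside Talend; the class names, the message text and the placeholder value are assumptions, not the original code.

```java
public class ParentContextPrint {

    // Hypothetical stand-in for the context object available inside a Talend job,
    // where each context variable is reachable as context.<name>.
    static class Context {
        public String parent = "parent value (Dev)"; // placeholder value, not from the post
    }

    static final Context context = new Context();

    // Builds the message that the component prints.
    static String describe() {
        return "Parent job, context.parent = " + context.parent;
    }

    public static void main(String[] args) {
        // Inside tJavaRow only a line like this one would be needed.
        System.out.println(describe());
    }
}
```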

Finally, the tRunJob component performs a call to a child job where the Context variables will be checked again. The configuration of this component is shown next:

Simple Talend job that represents the child job

The child job performs another dummy file read and prints its Context variables again. Its design is as follows:


Where the Java code to print the value of the Context variables is:
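
The original snippet is not reproduced in this copy of the post either. The following is a hedged sketch of one way to print every context variable the child job sees, rather than a single named one. It assumes the context can be treated like a java.util.Properties object; that assumption, the class name and the placeholder value are illustrative only.

```java
import java.util.Properties;
import java.util.TreeSet;

public class ChildContextPrint {

    // Formats every variable as "context.<name> = <value>", one per line, sorted by name.
    static String describe(Properties context) {
        StringBuilder out = new StringBuilder();
        for (String name : new TreeSet<>(context.stringPropertyNames())) {
            out.append("context.").append(name)
               .append(" = ").append(context.getProperty(name))
               .append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder context holding the single variable used by the child job.
        Properties context = new Properties();
        context.setProperty("child", "child value (Dev)");
        System.out.print(describe(context));
    }
}
```

Listing all variables makes it easy to spot, in the child's log, which values came from the parent and which did not.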

And the Context configuration for the child job will also consist of just one variable: context.child


Several executions are possible

With this configuration, there are several execution options, and this is where the tRunJob configuration dialog becomes a bit confusing:
- On the one hand, the "Context" drop-down menu in the dialog specifies the context that will be passed to the child job

- On the other hand, the "Transmit whole context" check box suggests that the context used in the parent job will also be used in the child job

Under these circumstances, two different executions can be run to analyse what is really happening: running the parent job with the Dev context environment and running it with the Test context environment.

Running the parent job under Dev environment

If you click on the Run tab, a drop-down menu lets you choose the environment used to run your job. In this execution Dev will be used:

The result after clicking on the Run button is:

According to the log printed after this execution, the context has been passed as expected, because the parent job was run with the Dev context, and the child job as well. But let's see what happens with the next execution:

Running the parent job under Test environment

If you click on the Run tab, you can choose Test for this second execution:



The result of the execution after clicking on the Run button is:

As you can see, even though the parent job is using the Test context environment, the child job is using the Dev context. It looks as if the "Transmit whole context" check box in the tRunJob configuration has been ignored: the parent did not pass its context to the child job, so the child fell back to the Dev context specified in the "Context" option of the tRunJob configuration.

It looks like the parent job has no control over the context that is passed to the child job, yet this is exactly what I was after: being able to decide, from the parent job, which contexts are passed to the children. The real problem was that I had misunderstood the meaning of the configuration options for the tRunJob component.

How to resolve this misunderstanding

In order for you to be able to pass the context used in the parent to the children, the context for the parent job must hold all the variables required in the child job. So the sample parent job should be using the following Context configuration:


Given this configuration, the parent job holds all of the context variables that are going to be used by the child job. In this case it is only one variable: context.child, and its value will be set from the parent job; so the context variables that will be passed to the child job are previously set by the context environment used by the parent job, Test or Dev.
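
The requirement above (the parent's context must define every variable the child needs) can be expressed as a simple set difference. The sketch below is not a Talend feature; it is a hypothetical helper, with invented names, that lists which child variables the parent does not define.

```java
import java.util.Set;
import java.util.TreeSet;

public class ContextCoverageCheck {

    // Returns the child variables that the parent context does not define.
    static Set<String> missingInParent(Set<String> parentVariables, Set<String> childVariables) {
        Set<String> missing = new TreeSet<>(childVariables);
        missing.removeAll(parentVariables);
        return missing;
    }

    public static void main(String[] args) {
        // Original setup: the parent only defines "parent", so "child" is not covered.
        System.out.println(missingInParent(Set.of("parent"), Set.of("child")));          // [child]
        // Corrected setup: the parent defines both variables, nothing is missing.
        System.out.println(missingInParent(Set.of("parent", "child"), Set.of("child"))); // []
    }
}
```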

The executions after changing the Context settings for the parent job are as follows:

- Using the Dev environment in the parent


- Using the Test environment in the parent

In this situation, the "Transmit whole context" option from the tRunJob configuration dialog makes more sense, and it is compatible with the "Context" option in the same dialog:

- All the context variables from the parent job will be passed to the child job, because this is what the "Transmit whole context" option means

- If the child job requires a context variable that has not been passed from the parent job, then the value of that variable is taken from the context specified in the "Context" option
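
The two rules above can be modelled with plain maps. This is a simplified sketch of the behaviour observed in this post, not Talend's generated code; the method name, variable names and values are placeholders. The child starts from the environment selected in the "Context" option, and every variable transmitted by the parent then overrides the entry with the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TransmitWholeContextSketch {

    // childContext:  values of the environment chosen in the tRunJob "Context" option.
    // parentContext: values of the environment the parent job was run with.
    // Returns the values the child job ends up seeing.
    static Map<String, String> resolveChild(Map<String, String> childContext,
                                            Map<String, String> parentContext,
                                            boolean transmitWholeContext) {
        Map<String, String> resolved = new LinkedHashMap<>(childContext);
        if (transmitWholeContext) {
            // Variables passed by the parent win over the child's own values.
            resolved.putAll(parentContext);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // tRunJob "Context" option set to Dev.
        Map<String, String> childDev = Map.of("child", "child-dev");

        // First attempt: parent run with Test, but it only defines "parent".
        Map<String, String> parentOnly = Map.of("parent", "parent-test");
        System.out.println(resolveChild(childDev, parentOnly, true).get("child")); // child-dev

        // Corrected setup: the parent also defines "child" in its Test environment.
        Map<String, String> parentFull = Map.of("parent", "parent-test", "child", "child-test");
        System.out.println(resolveChild(childDev, parentFull, true).get("child")); // child-test
    }
}
```

In the first call the child keeps its Dev value because the parent never transmitted a variable named child, which matches the confusing result of the Test execution described earlier.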

4 comments:

  1. Thanks for this post. It was very helpful for me. Did you know that the built-in variable contextStr does not get updated in the child job? I was using the variable to do some controls in my child job and discovered it remains the default context name, even when the other context variables are passed down.

  2. Thanks for this. I saw directly where I went wrong. I have now a full operational job with two levels of subjobs working in development, acceptance and production.

  3. Thanks a ton for the post!!! I too had the same issue, your explanation has made life easy.

  4. Thanks for this post. Just tried the solution to switch between production and test environments and it works.
