Tuesday 8 October 2013

Java for Talend Tips

Talend provides data, application and business process integration solutions.

Ready for big data environments, Talend’s flexible architecture easily adapts to future IT platforms. Talend’s unified solutions portfolio includes data integration, data quality, master data management, enterprise service bus and business process management.

Talend offers a flexible open source based platform, delivered through an easy-to-use, Eclipse-based graphical environment that provides a comfortable workbench for developers.

Talend is based on Java, and the designs you create generate Java code; but its graphical interface provides great power to the developer with no need to hand code.

However, there are situations in which you cannot avoid adding some customised handling to your data, mainly related to transformations of data types such as Strings, Dates, Integers and so forth.

If you do not know Java, please do not be scared: the few basic Java features shown next are easy to learn and will help you optimise your use of Talend:

When To Use Your Customisations


The main components for which you will find it useful to apply your own customisations are the Mapping components, like tMap or tXMLMap, and the Java components, like tJava, tJavaFlex or tJavaRow.
  • In the Mapping components you will be able to use expressions, and methods within expressions, to manipulate the fields and apply transformations to your data. An expression is a construct made up of variables, operators and method invocations, built according to the syntax of the language, that evaluates to a single value. This is the important point: an expression evaluates to a single value.

  • In the Java components you will be able to use not only expressions and methods within expressions, but also assignments and more in-depth functionality like blocks and loops, although this is not required very often. Statements are roughly equivalent to sentences in natural languages: a statement forms a complete unit of execution. The following types of expressions can be made into a statement by terminating the expression with a semicolon (;):
    • Assignment expressions
    • Any use of ++ or --
    • Method invocations
    • Object creation expressions
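
To make the difference concrete, here is a minimal sketch in plain Java. The class and method names are made up for illustration and are not part of Talend: the first method is a single expression of the kind you could type into a Mapping component, while the second needs statements and a loop, so it would belong in one of the Java components.

```java
class ExpressionVsStatement {

    // One expression: it evaluates to a single String value.
    static String fullName(String first, String last) {
        return first.trim() + " " + last.trim().toUpperCase();
    }

    // Several statements: an assignment, a loop, a use of ++ and method invocations.
    static int countNonEmpty(String[] values) {
        int count = 0;                                   // assignment statement
        for (String value : values) {
            if (value != null && !value.isEmpty()) {     // method invocation
                count++;                                 // use of ++
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fullName(" Ada ", "lovelace"));                     // Ada LOVELACE
        System.out.println(countNonEmpty(new String[] {"a", "", null, "b"}));  // 2
    }
}
```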

Primitive Data Types


A variable's data type determines the values it may contain, plus the operations that may be performed on it. The eight primitive data types supported by the Java programming language are:
 
byte: An 8-bit signed two's complement integer. Min value: -128, Max value: 127 (inclusive).

short: A 16-bit signed two's complement integer. Min value: -32,768, Max value: 32,767 (inclusive).


int: A 32-bit signed two's complement integer. Min value: -2,147,483,648, Max value: 2,147,483,647 (inclusive). For integral values, this data type is generally the default choice and it will most likely be large enough for the numbers you will use, but if you need a wider range of values, use long instead.
 

long: A 64-bit signed two's complement integer. Min value: -9,223,372,036,854,775,808, Max value: 9,223,372,036,854,775,807 (inclusive). Use this data type when you need a range of values wider than those provided by int.
 

float: A single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but can be found here. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead.

double: A double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but can be found here. For decimal values, this data type is generally the default choice, but it should never be used for precise values, such as currency.
 

boolean: It has only two possible values: true and false. Use this data type for simple flags that track true/false conditions.

char: A single 16-bit Unicode character. Min value: '\u0000' (or 0), Max value: '\uffff' (or 65,535 inclusive).


Apart from these eight primitive types, the Java programming language also provides special support for character strings via the java.lang.String class. Enclosing your character string within double quotes will automatically create a new String object; for example, String s = "this is a string";. The String class is not technically a primitive data type, but considering the special support given to it by the language, you'll probably tend to think of it as such.
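
The warning about using float and double for currency is easier to appreciate with an example. The following is a small sketch (the class and method names are my own) that adds the same two amounts first as doubles and then as java.math.BigDecimal values built from Strings:

```java
import java.math.BigDecimal;

class CurrencyPrecision {

    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    static double addAsDouble(double a, double b) {
        return a + b;
    }

    // BigDecimal built from Strings keeps the decimal digits exactly as written.
    static BigDecimal addAsBigDecimal(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(addAsDouble(0.1, 0.2));          // 0.30000000000000004
        System.out.println(addAsBigDecimal("0.1", "0.2"));  // 0.3
    }
}
```

Note that the BigDecimal values are created from Strings rather than from doubles, otherwise the rounding error of the double would be carried over.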

Java Operators


Simple Assignment operator: the most common operator is "=", it assigns the value on its right to the operand on its left: int myNumber = 2;

Arithmetic operators:
  • Additive operator: + (also used for String concatenation)
    int result = 1 + 2; // result is 3 
  • Subtraction operator: -
    int result = 3 - 1; // result is 2
  • Multiplication operator: *
    int result = 2 * 5; // result is 10
  • Division operator: /
    int result = 9 / 3; // result is 3
  • Remainder operator: % (divides one operand by another and returns the remainder as its result)
    int result = 10 % 3; // result is 1
Unary operators:
  • Unary plus operator: + (indicates positive value)
    int result = +2; // result is 2
  • Unary minus operator: - (negates an expression)
    int result = +2; // result is 2
    result = -result; // result is -2
  • Increment operator: ++ (increments a value by 1)
    int result = +2; // result is 2
    result++; // result is 3
  • Decrement operator: -- (decrements a value by 1)
    int result = +2; // result is 2
    result--; // result is 1
  • Logical complement operator: ! (inverts the value of a boolean)
    boolean result = true; // result is true
    result = !result; // result is false
Equality and relational operators:
int a = 2;
int b = 2;

  • Equal to: ==
    boolean result = a == b; // result is true
  • Not equal to: !=
    boolean result = a != b; // result is false
  • Greater than: >
    boolean result = a > b; // result is false
  • Greater than or equal to: >=
    boolean result = a >= b; // result is true
  • Less than: <
    boolean result = a < b; // result is false
  • Less than or equal to: <=
    boolean result = b <= a; // result is true
Conditional operators:
int a = 1;
int b = 2;

  • Conditional AND: &&
    boolean result = (a == 1) && (b == 2); // result is true
    boolean result = (a == 2) && (b == 2); // result is false
    boolean result = (a == 1) && (b == 1); // result is false
    boolean result = (a == 3) && (b == 3); // result is false
  • Conditional OR: ||
    boolean result = (a == 1) || (b == 2); // result is true
    boolean result = (a == 2) || (b == 1); // result is false
    boolean result = (a == 2) || (b == 2); // result is true
  • Ternary Operator: ? : (this operator should be read as: "If someCondition is true, assign the value of value1 to result. Otherwise, assign the value of value2 to result.")
    int value1 = 1;
    int value2 = 2;
    int result;
    boolean someCondition = true;
    result = someCondition ? value1 : value2; // result is 1
The ternary operator is one of the most useful operators for your Talend customisations, as it allows you to include if-then-else processing in a single expression. This is really helpful when processing your data in the Expression Builder for the Mapping components.
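
As an illustration, here is a sketch of the kind of ternary expressions you might write for an output column. The methods and the field values are hypothetical; in a Mapping component you would type only the expression after the return keyword, using your own column names.

```java
class TernaryExample {

    // Replaces a missing or blank country with a default value.
    static String countryOrDefault(String country) {
        return (country == null || country.trim().isEmpty())
                ? "UNKNOWN"
                : country.trim().toUpperCase();
    }

    // Ternaries can be chained to get an if / else if / else in one expression.
    static String sizeBand(int quantity) {
        return quantity >= 100 ? "LARGE" : quantity >= 10 ? "MEDIUM" : "SMALL";
    }

    public static void main(String[] args) {
        System.out.println(countryOrDefault(null));    // UNKNOWN
        System.out.println(countryOrDefault(" uk "));  // UK
        System.out.println(sizeBand(42));              // MEDIUM
    }
}
```

Checking for null before calling any method on the value is what keeps the expression from failing with a NullPointerException on empty fields.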

Talend Routines

Talend System Routines

The Talend platform offers some out-of-the-box routines with very useful functionality. You can access their definitions from the Repository tab of Eclipse, under the Code section. They provide specific methods for the following:
  • DataOperation
  • Mathematical
  • Numeric
  • Relational
  • StringHandling
  • TalendDataGenerator
  • TalendDate
  • TalendString 

These routines provide methods to, amongst other things, calculate the trigonometric cosine of an expression or transform a String to upper or lower case. The routine I found most helpful of all is TalendDate, because it considerably simplifies the transformation of Date fields into Strings and vice versa, as well as offering methods to retrieve the current date or add time to a specified Date.

- Say you have a Date object "myDate" that represents the 2nd of May of 2006 and you wanted to display it with a String formatted as "dd/MM/yyyy". You can achieve this by using the formatDate method in the TalendDate routine:

     String theDate = TalendDate.formatDate("dd/MM/yyyy",myDate);
     // theDate is assigned the value "02/05/2006"

- If you wanted to build a Date object from a String that represents a date and time like "02/05/2006 13:35" you can use the parseDate method in the TalendDate routine:

     String aDate = "02/05/2006 13:35";
     Date theDate = TalendDate.parseDate("dd/MM/yyyy HH:mm",aDate);
     // theDate is a Date object representing the 2nd of May of 2006 at 13:35

The Java docs for the java.text.SimpleDateFormat class show lots of mask combinations you can use to produce and format your dates and times as required.
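
If you want to try out a mask outside Talend, the following sketch uses java.text.SimpleDateFormat directly. It is not the TalendDate routine itself; I am only assuming the masks behave the same way, and the class and method names are my own.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateMaskSketch {

    // Formats a Date with the given mask, for example "dd/MM/yyyy".
    static String format(String mask, Date date) {
        return new SimpleDateFormat(mask).format(date);
    }

    // Parses a String with the given mask; impossible dates are rejected.
    static Date parse(String mask, String text) {
        SimpleDateFormat formatter = new SimpleDateFormat(mask);
        formatter.setLenient(false); // do not roll 31/02 over into March
        try {
            return formatter.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with mask " + mask, e);
        }
    }

    public static void main(String[] args) {
        Date theDate = parse("dd/MM/yyyy HH:mm", "02/05/2006 13:35");
        System.out.println(format("dd/MM/yyyy", theDate));        // 02/05/2006
        System.out.println(format("yyyy-MM-dd HH:mm", theDate));  // 2006-05-02 13:35
    }
}
```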

The String Java Class And Its Methods


Out of the number of classes and libraries offered by Java, the most relevant for your Talend customisations is the java.lang.String class, and you will find it very useful because it offers great power to transform and manipulate your data.

Strings are a sequence of characters, and in the Java programming language they are objects. The Java platform provides the java.lang.String class to create and manipulate strings; some of this class's features are shown next:

How to create Strings

The class provides thirteen constructors that let you specify the initial value for the String using different sources, like:

     String greeting = "Hello world!";
     String newGreet = new String("Hello world!");
     String charGreet = new String(new char[] {'H','e','l','l','o','!'});


How to concatenate Strings

The String class includes a method for concatenating strings, which returns a new string that is the result of adding the second string to the first one; but the + operator is frequently used to perform string concatenations, for example:

     String string1 = "Hello ";
     String string2 = new String("world!");
     String string3 = string1.concat(string2);
     String string4 = "Hello ".concat("world!");
     String string5 = string1 + "world" + "!";
     StringBuilder string6 = new StringBuilder(string1);
     string6.append(string2);
     // string3, string4 and string5 are "Hello world!", and so is string6.toString()

Any object that is not a String can also be transformed into a String via its toString() method, which returns a representation of the object in the form of a String, so it can also be concatenated to other strings.

Your code can benefit from the use of the StringBuilder class when you concatenate repeatedly, for example inside a loop: it appends to a single buffer instead of creating a new String instance every time you concatenate, so your code will be optimised.
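
A typical case is joining several values inside a loop, something you might do in a tJavaRow. This is only a sketch with names of my own choosing:

```java
class JoinWithStringBuilder {

    // Joins the parts with the separator using a single StringBuilder.
    static String join(String[] parts, String separator) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                builder.append(separator); // separator only between elements
            }
            builder.append(parts[i]);
        }
        return builder.toString();         // convert to a String once, at the end
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ";")); // a;b;c
    }
}
```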

How to compare Strings

A number of methods are available within the String class for comparing strings and portions of strings, which are really useful when you need to check the values for your data; some of them are:

boolean endsWith(String suffix) / boolean startsWith(String prefix): These methods return true if this string ends with or starts with the substring specified as an argument to the method.

     String something = "Hello";
     boolean end = something.endsWith("lo");
     boolean ini = something.startsWith(" He");
     // end is true and ini is false

boolean equals(Object anObject) / boolean equalsIgnoreCase(String aString): The returned value is true if and only if the argument is a String object that represents the same sequence of characters as this object, ignoring the difference in case for equalsIgnoreCase.

     String something = "Hello";
     boolean isEqual = something.equals("Hello");
     boolean isEqualIgnore = something.equalsIgnoreCase("hello");
     // in both cases the result is true

boolean matches(String regex): This method tests whether this string matches the specified regular expression. It is very useful for checking the format of strings, like checking for valid emails or certain date formats.
 
     String validEmail = "me@mycompany.com";
     String wrongEmail = "me@my company.com";
     String regex = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})";
     boolean valid = validEmail.matches(regex);
     boolean wrong = wrongEmail.matches(regex);
     // valid is true and wrong is false
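
The same approach works for date formats. Here is a minimal sketch, with a method name of my own, that checks whether a value has the dd/MM/yyyy shape. It only checks the shape of the text (two digits, slash, two digits, slash, four digits), not whether the date actually exists.

```java
class FormatCheck {

    // True when the whole value is made of 2 digits / 2 digits / 4 digits.
    static boolean looksLikeDate(String value) {
        return value != null && value.matches("\\d{2}/\\d{2}/\\d{4}");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("02/05/2006")); // true
        System.out.println(looksLikeDate("2/5/2006"));   // false
        System.out.println(looksLikeDate(null));         // false
    }
}
```

Remember that matches() has to match the entire string, so there is no need to add ^ and $ to the regular expression.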

How to convert between Numbers and Strings

The Java platform provides a set of Number subclasses that wrap the primitive numeric types to provide further functionality, like Byte, Integer, Double, Float, Long and Short.

Converting Strings to Numbers: These wrapping Number classes include a method called valueOf that converts a string to an object of that type; they also provide a parseXXXX() method, like parseDouble(), that converts a string to the primitive type instead of the wrapping object:

     String aNumber = "0.2";
     Float aFloatObject = Float.valueOf(aNumber);
     // valueOf returns a Float object
     float aFloat = Float.parseFloat(aNumber);
     // parseFloat returns a primitive type variable

Converting Numbers to Strings: The String class provides a method called valueOf that converts numbers into strings. Each of the Number subclasses also has a toString() method that returns the string value of a number. Finally, the + operator can be used to concatenate a number to a string, and the transformation is automatically handled by Java:

     int anInt = 987;
     String intValueOf = String.valueOf(anInt);
     // intValueOf is "987"

     int anotherInt = 3002;
     double aDouble = 858.48;
     String anIntString = Integer.toString(anotherInt);
     String aDoubleString = Double.toString(aDouble);
     // anIntString is "3002" and aDoubleString is "858.48"
  
     int lastInt = 23;
     String intToString = "" + lastInt;
     // intToString is "23" 
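
One thing to keep in mind is that the parseXXXX() and valueOf methods throw a NumberFormatException when the text is not a valid number, which is quite common with real data. The following sketch (the helper name is my own) falls back to a default value instead of failing:

```java
class SafeNumberParsing {

    // Returns the parsed int, or the fallback when the text is null or not a number.
    static int parseIntOrDefault(String text, int fallback) {
        if (text == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(text.trim()); // trim first: " 42 " is not valid as is
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntOrDefault(" 42 ", -1)); // 42
        System.out.println(parseIntOrDefault("abc", -1));  // -1
        System.out.println(Integer.toString(3002));        // 3002
        System.out.println(Float.toString(3002));          // 3002.0
    }
}
```

The last two lines show why it matters to call toString() on the wrapper that matches your type: passing an int to Float.toString converts it to a float first, and the result gains a decimal part.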

Other useful Methods for handling Strings

Here is a list of some other useful methods available in the String class for changing case, finding characters or substrings within a string amongst others:

String trim(): This method returns a copy of the string with leading and trailing white space removed.

String toLowerCase() / String toUpperCase(): These methods return a copy of the string converted to lowercase or uppercase.

String substring(int beginIndx) / String substring(int beginIndx, int endIndx): These methods return a new string that is a substring of this string, from the character in position beginIndx to the end of the original string or from the character in position beginIndx to the character in position endIndx - 1.

int indexOf(String str) / int lastIndexOf(String str): These methods return the index of the first / last occurrence of the string specified as argument if it is found or -1 if not found.

String replace(CharSequence target, CharSequence replacement): This method returns a copy of the string after replacing each substring of this string that matches the literal target sequence with the specified literal replacement sequence.

String replaceAll(String regex, String replacement): This method returns a copy of the string after replacing each substring of this string that matches the given regex with the specified replacement.

int length(): This method returns the number of characters contained in the string object.
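
Here is a short sketch that combines several of these methods for some typical clean-up tasks. The class, the method names and the sample values are made up for illustration:

```java
class StringCleanup {

    // Extracts the part after the last @, lower-cased, or "" if there is no @.
    static String domainOf(String email) {
        String cleaned = email.trim().toLowerCase();
        int at = cleaned.lastIndexOf("@");
        return at == -1 ? "" : cleaned.substring(at + 1);
    }

    // replaceAll takes a regex: any run of white space becomes a single space.
    static String collapseSpaces(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // replace takes literal text: every "-" becomes "/".
    static String dashesToSlashes(String date) {
        return date.replace("-", "/");
    }

    public static void main(String[] args) {
        System.out.println(domainOf("  Me@MyCompany.com "));     // mycompany.com
        System.out.println(collapseSpaces("  Hello    world! ")); // Hello world!
        System.out.println(dashesToSlashes("02-05-2006"));       // 02/05/2006
        System.out.println("Hello".substring(1, 3));             // el
        System.out.println("Hello".length());                    // 5
    }
}
```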

This is it for the time being. If you have any suggestions or doubts, please do not hesitate to leave a comment below. Many thanks.

This article has used references from the Java Basics page.