Create/Edit mappings tab

Pentaho Data Integration

Version
9.3.x
Audience
anonymous
Part Number
MK-95PDIA003-15

This tab creates or edits a mapping for a given HBase table. A mapping defines metadata about the values that are stored in the table. Since most information is stored as raw bytes in HBase, mapping allows PDI to decode values and execute meaningful comparisons for column-based result set filtering.

Before a value can be written to HBase, you must tell the step which column family the value belongs to and what its type is. You must also specify type information about the key of the table.

The names of fields entering the step must match the aliases of fields defined in the mapping. All incoming fields must have a matching counterpart in the mapping. There may be fewer incoming fields than defined in the mapping. If there are more incoming fields, then an error will occur. One of the incoming fields must match the key defined in the mapping.
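The field-matching rules above can be sketched as a small check. This is only an illustration of the rules as stated, not PDI's own validation code; the function name `check_incoming_fields` and its arguments are hypothetical.

```python
def check_incoming_fields(incoming, mapping_aliases, key_alias):
    """Return a list of problems; an empty list means the fields fit the mapping.

    incoming        -- names of the fields entering the step
    mapping_aliases -- aliases of all fields defined in the mapping
    key_alias       -- alias of the key field in the mapping
    """
    problems = []
    aliases = set(mapping_aliases)

    # Every incoming field needs a counterpart in the mapping.
    extra = [name for name in incoming if name not in aliases]
    if extra:
        problems.append("fields not in mapping: " + ", ".join(extra))

    # One of the incoming fields must match the key defined in the mapping.
    if key_alias not in incoming:
        problems.append("no incoming field matches key alias: " + key_alias)

    return problems
```

Supplying fewer fields than the mapping defines is accepted by this check; only unmapped fields or a missing key are reported.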

This tab operates in a similar manner to the HBase Input step, with the exception that the HBase Output step allows the target HBase table to be created if it does not already exist. Furthermore, the fields coming into the step can be used to define a mapping.

Select a table to populate the Mapping name drop-down box with the names of any mappings that exist for the table. If there are no mappings defined for the selected table, enter the name of a new mapping.

Enter information about the columns in the HBase table that you want to map. Selecting the name of an existing mapping will load the fields defined in that mapping into the fields area of the display.

Alternatively, you can create a new HBase table and mapping for it simultaneously by configuring the fields of the mapping and entering the name of a table that does not exist in the HBase table name drop-down box.

This tab includes the following fields:
Option Definition
HBase table name Select a table from the list of table names. Connection information in the previous tab must be valid and complete for this drop-down list to populate. See the note in Performance considerations for more options.
Get table names (button) Click to retrieve a list of all existing table names, even if they do not have Pentaho mappings. The table names display the namespace, followed by a colon, then the table name. See Namespaces.
Mapping name Names of any mappings that exist for the table. This box is empty when there are no mappings defined for the selected table.
Note: You can define multiple mappings on the same HBase table using different subsets of columns.
# The order of the mapping operation.
Alias The name you want to assign to the field. This is required for the table key column, but optional for non-key columns.
Key Indicates whether or not the field is the table's key.
Column family The column family in the HBase source table that the field belongs to. Non-key columns must specify a column family and column name.
Column name The name of the column in the HBase table.
Type

Data type of the column. When the key value is set to Y, the following key column types display in the drop-down list:

Key column types are:
  • String
  • Integer
  • UnsignedInteger
  • Long
  • UnsignedLong
  • Date
  • UnsignedDate
  • Binary

When the key value is set to N, the following non-key column types display in the drop-down list:

Non-key column types are:
  • String
  • Integer
  • Long
  • Float
  • Double
  • Boolean
  • Date
  • BigNumber
  • Serializable
  • Binary
Indexed values Enter comma-separated data in this field to define the set of legal values for a string column.
Get incoming fields (button) Retrieves a field list using the given HBase table and mapping names.
Create a tuple template (button) Select to create a mapping template to write tuples to HBase.
Save mapping (button) Saves the mapping. If there is any missing information in the mapping definition, you will be prompted to correct the mapping definition before the mapping is saved.
Delete mapping (button) Deletes the current named mapping in the current named table from the mapping table. Note that this does not delete the actual HBase table.

A valid mapping must define metadata for the key of the source HBase table. The key must have an Alias specified because there is no name given to the key of an HBase table. Non-key columns must specify the Column family that they belong to and the Column name. An Alias is optional for non-key columns. If it is not supplied, then the column name is used. All fields must have type information supplied.
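As a rough illustration of these rules, the sketch below checks a mapping held as a list of dictionaries. The dictionary keys (`alias`, `key`, `family`, `column`, `type`) and both function names are hypothetical and do not reflect how PDI stores mappings. The sketch also assumes that a mapping defines exactly one key.

```python
def validate_mapping(rows):
    """Return a list of problems found in a mapping; empty means valid."""
    problems = []

    # Assumption: a mapping describes exactly one table key.
    keys = [row for row in rows if row.get("key")]
    if len(keys) != 1:
        problems.append("mapping must define exactly one key")

    for number, row in enumerate(rows, start=1):
        # All fields must have type information.
        if not row.get("type"):
            problems.append(f"row {number}: missing type")
        if row.get("key"):
            # The key has no name in HBase, so it needs an alias.
            if not row.get("alias"):
                problems.append(f"row {number}: key needs an alias")
        elif not row.get("family") or not row.get("column"):
            problems.append(
                f"row {number}: non-key column needs a column family and column name"
            )
    return problems


def effective_alias(row):
    """Alias of a field; falls back to the column name when no alias is given."""
    return row.get("alias") or row.get("column")
```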

For keys to sort properly in HBase, you must note the distinction between signed and unsigned numbers. Because of the way that HBase stores integer and long data internally, the sign bit must be flipped before storing the signed number so that positive numbers will sort after negative numbers. Unsigned integer and unsigned long data can be stored directly without inverting the sign.
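The effect of flipping the sign bit can be seen in a short sketch. It illustrates the general technique on 8-byte big-endian values and is not taken from the PDI or HBase source; the function names are hypothetical.

```python
import struct


def encode_signed_long_key(value):
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    raw = struct.pack(">q", value)  # big-endian two's complement
    # Flip the sign bit (the top bit of the first byte) so that
    # negative numbers sort before positive numbers byte-wise.
    return bytes([raw[0] ^ 0x80]) + raw[1:]


def decode_signed_long_key(data):
    """Reverse encode_signed_long_key."""
    raw = bytes([data[0] ^ 0x80]) + data[1:]
    return struct.unpack(">q", raw)[0]


def encode_unsigned_long_key(value):
    """Unsigned values already sort correctly byte-wise; store them directly."""
    return struct.pack(">Q", value)
```

Without the flip, the two's complement encoding of -1 starts with the byte 0xFF and therefore sorts after every positive number when keys are compared byte by byte. With the flip, sorting the encoded keys gives the same order as sorting the numbers themselves.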

String columns
May optionally have a set of legal values defined for them by entering comma-separated data into the Indexed values column in the fields table.
Date keys
Can be stored as either signed or unsigned long data types, with epoch-based timestamps. If you have a date key mapped as a String type, PDI can change the type to Date for manipulation in the transformation. No distinction is made between signed and unsigned numbers for the Date type because HBase only sorts on the key.
Boolean values
May be stored in HBase as 0/1 integer/long or as strings (Y/N, yes/no, true/false, T/F).
BigNumber
May be stored as either a serialized BigDecimal object or in string form (that is, a string that can be parsed by BigDecimal's constructor).
Serializable
Any serialized Java object.
Binary
A raw array of bytes.
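To make the Boolean storage forms listed above concrete, the following sketch decodes a raw HBase value into a Boolean. It is an assumption-laden illustration, not PDI's decoder: the function name is hypothetical, string matching is assumed to be case-insensitive, integers are assumed to be 4-byte or 8-byte big-endian, and any nonzero integer is treated as true.

```python
import struct

# String forms named in the documentation, compared in lower case (assumption).
_TRUE_STRINGS = {"y", "yes", "true", "t"}
_FALSE_STRINGS = {"n", "no", "false", "f"}


def decode_boolean(raw):
    """Decode a Boolean stored as a string or as a 0/1 integer or long."""
    text = raw.decode("utf-8", errors="replace").strip().lower()
    if text in _TRUE_STRINGS:
        return True
    if text in _FALSE_STRINGS:
        return False
    if len(raw) == 4:  # assumed 4-byte big-endian integer
        return struct.unpack(">i", raw)[0] != 0
    if len(raw) == 8:  # assumed 8-byte big-endian long
        return struct.unpack(">q", raw)[0] != 0
    raise ValueError("value is not a recognized Boolean encoding")
```

String forms are tried first in this sketch, so a 4-byte value such as `true` is read as text rather than as an integer.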
To speed up the creation of a mapping, you can use the incoming fields to the step as the basis for the mapping. Click Get incoming fields to populate the mapping table with information from the fields entering the step. The Alias and Column name of each mapping are set to the name of an incoming field. The type information is filled in automatically. The Column family is set to the name of the first column family defined if the table already exists, or otherwise to a default value (Family1), which you can change to define your own families when the target table is created.
Note: The step does not support adding new column families to an existing table.
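The defaults that Get incoming fields applies can be sketched as follows. This is a simplified illustration of the behavior described above, not PDI code; the function name, the dictionary layout, and the `(name, type)` input pairs are hypothetical, and the sketch does not choose which field is the key.

```python
DEFAULT_FAMILY = "Family1"  # default column family named in the documentation


def mapping_from_incoming(fields, existing_families=None):
    """Build mapping rows from (field name, type) pairs entering the step.

    existing_families -- column families of the target table, if it already
                         exists; the first one is used for every row.
    """
    family = existing_families[0] if existing_families else DEFAULT_FAMILY
    return [
        {"alias": name, "column": name, "family": family, "type": field_type}
        for name, field_type in fields
    ]
```

For a table that does not exist yet, every row receives `Family1`, which can then be edited before the mapping is saved and the table is created.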
Important: The names of fields entering the step are expected to match the aliases of fields defined in the mapping. All incoming fields must have a matching counterpart in the mapping. There may be fewer incoming fields than defined in the mapping, but if there are more incoming fields, then an error is logged. Furthermore, one of the incoming fields must match the key defined in the mapping.