Avro Fields tab

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15
Avro Input Avro Fields Tab

The table in the Avro Fields tab defines the following properties for the input fields from the Avro source:

Field Property | Description
Avro path (Avro type) | The location of the Avro source (and its format type).
Indexed values | The index key to use in an Avro path collection. You can use this field for map or array expansion, which expands array or map values to return multiple rows of data.

  • To return map elements, specify an index key.
  • To return array elements, specify the array index number, or use the asterisk wildcard (*) to return all elements of an array.

When this field is left blank, data is not returned for the field.

Name | The name of the input field.
Type | The type of the input field, such as String or Date.
Format | The format of the input field.
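
The Indexed values behavior can be pictured with a small Python sketch. It is only an illustration of the expansion rules described in the table, not the step's actual implementation; the function name expand_indexed and the sample record are hypothetical.

```python
from typing import Any, Iterator, Optional


def expand_indexed(collection: Any, index: Optional[str]) -> Iterator[Any]:
    """Yield the values that an index selects from a map (dict) or array (list).

    Hypothetical helper for illustration only; it is not part of PDI.
    """
    if index is None or index == "":
        # Blank index: no data is returned for the field.
        return
    if isinstance(collection, dict):
        # Map: the index is treated as a key.
        if index in collection:
            yield collection[index]
    elif isinstance(collection, list):
        if index == "*":
            # Wildcard: one value (one output row) per array element.
            yield from collection
        else:
            # Array: the index is treated as an element position.
            position = int(index)
            if 0 <= position < len(collection):
                yield collection[position]


# Sample record with one array field and one map field (made-up data).
record = {
    "name": "Ada",
    "phones": ["555-0100", "555-0101"],
    "attrs": {"team": "core"},
}

# Expanding the array with "*" produces one row per element.
rows = [
    {"name": record["name"], "phone": phone}
    for phone in expand_indexed(record["phones"], "*")
]
```

In this sketch, the wildcard turns a single record into two rows, while a numeric index or a map key selects a single value.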

The Avro Fields tab also contains the following options for specifying how certain fields behave in this step:

Option | Description
Pass through fields from previous step | Specify how fields pass through this step:

  • Select to pass the fields from the previous step along with the fields in the current step to the next step.
  • Clear to not pass these fields to the next step.

Allow null values for missing paths or fields | Specify how missing fields should be replaced:

  • Select to replace missing fields in the incoming data with null values.
  • Clear to not replace missing fields with null values.
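
As a rough mental model of these two options, the following Python sketch builds one output row from an incoming row and an Avro record. The function build_output_row and its parameters are hypothetical. The choice to raise an error when a field is missing and nulls are not allowed is an assumption made for the sketch; this page does not describe what the step does in that case.

```python
from typing import Any, Dict


def build_output_row(
    incoming: Dict[str, Any],
    record: Dict[str, Any],
    field_paths: Dict[str, str],
    pass_through: bool,
    allow_nulls: bool,
) -> Dict[str, Any]:
    """Combine incoming fields and Avro fields into one output row.

    Hypothetical helper for illustration only; it is not part of PDI.
    """
    # Pass through: start from the previous step's fields, or from nothing.
    row: Dict[str, Any] = dict(incoming) if pass_through else {}
    for name, path in field_paths.items():
        if path in record:
            row[name] = record[path]
        elif allow_nulls:
            # Missing path or field: substitute a null value.
            row[name] = None
        else:
            # Assumption: fail when nulls are not allowed for missing fields.
            raise KeyError(f"missing Avro field: {path}")
    return row


incoming_row = {"id": 7}
avro_record = {"name": "Ada"}
fields = {"name": "name", "email": "email"}  # "email" is absent from the record

with_both = build_output_row(incoming_row, avro_record, fields, True, True)
avro_only = build_output_row(incoming_row, avro_record, fields, False, True)
```

With both options selected, the row carries the incoming id field plus the Avro fields, and the missing email field is null.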

After you have provided a path to an Avro data file or Avro schema, click Get Fields to populate the fields.

These fields represent the Avro schema. When a schema field is retrieved, its Avro type is converted to an appropriate PDI type, which you can change. The following table lists the Avro-to-PDI data type conversions.

Avro Type | PDI Type
String | String
TimeStamp | TimeStamp
Bytes | Binary
Decimal | BigNumber
Boolean | Boolean
Date | Date
Long | Integer
Double | Number
Int | Integer
Float | Number
Note: The default format mask for the date type is yyyy-MM-dd, and the default format mask for the timestamp type is yyyy-MM-dd HH:mm:ss.SSS. If a date or timestamp was stored as a string in any other format, the column data cannot be retrieved, and null is returned for that column.
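
The effect described in the note can be illustrated with a short Python sketch. It uses Python's strptime patterns as stand-ins for the default masks, so the pattern strings and the parse_or_null helper are assumptions for illustration; PDI itself applies Java-style masks such as yyyy-MM-dd. Note also that %f accepts one to six fractional digits, so it only approximates the three-digit SSS component.

```python
from datetime import datetime
from typing import Optional

# Python approximations of the default masks (assumed equivalents).
DATE_MASK = "%Y-%m-%d"                   # yyyy-MM-dd
TIMESTAMP_MASK = "%Y-%m-%d %H:%M:%S.%f"  # yyyy-MM-dd HH:mm:ss.SSS


def parse_or_null(text: str, mask: str) -> Optional[datetime]:
    """Parse a stored string with the given mask, or return None (null).

    Hypothetical helper for illustration only; it is not part of PDI.
    """
    try:
        return datetime.strptime(text, mask)
    except ValueError:
        # The stored string does not match the mask, so the value is null.
        return None


matching = parse_or_null("2023-04-05", DATE_MASK)
mismatched = parse_or_null("04/05/2023", DATE_MASK)
stamped = parse_or_null("2023-04-05 10:20:30.123", TIMESTAMP_MASK)
```

Here a string in the default format parses to a date value, while a string in another format, such as 04/05/2023, yields null.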