S3 Select

Content Platform for Cloud Scale Administration Guide

Version
2.6.x
File Size
1945 KB
Audience
anonymous
Part Number
MK-HCPCS008-10

HCP for cloud scale supports the S3 Select feature.

HCP for cloud scale fully supports the S3 Select Object Content method, which allows S3 client applications to retrieve portions of a structured object. Selecting only the data needed within an object can significantly improve operational costs, retrieval time, and performance.

Objects can be encrypted or compressed with GZIP or BZIP2. Scan ranges are supported for Parquet or uncompressed CSV and JSON objects; this allows you to break down a larger object into non-overlapping sections that can be scanned in parallel.

The portion of the object returned is selected based on structured query language (SQL) sent in the request that includes SELECT, FROM, WHERE and LIMIT.

All data types, operators and SQL Functions are supported including aggregate functions (AVG, COUNT, MAX, MIN and SUM), conditional functions, conversion functions (CAST), date functions and string functions (UPPER, LOWER).

Supported structured object input-serialization formats include:

  • Comma-separated values (CSV)
  • JavaScript Object Notation (JSON)
  • Apache Parquet

Supported structured object output-serialization formats include:

  • CSV
  • JSON

The client application must have read permission s3:GetObject. The SQL expression can be up to 256 KB and can return up to 5 GB of data with a maximum of 1 MB for a single record.

Here is a simple example of a SQL query against a Parquet object. The query returns data for salaries greater than 100,000:

select salary from s3object s where s.salary > 100000

Implementation Differences AWS HCP for cloud scale
Case sensitive SQL column queries, such as:

select s.id from an object s with columns ID, iD, and id…

Case insensitive, returning the column ID. Case sensitive, returning the column id.
Returned-bytes value in status messages. Value is the total size of the record, including any delimiters. Value is the sum of all returned column data.
CSV output when the record field contains /r or /n. Record field returned as-is. Record field returned wrapped in quotes (“ “).
SQL CASE Expressions Supported Not Supported
SQL CAST to float - decimals of precision 20 40
Additional functionality AWS HCP for cloud scale
Nested aggregate functions, such as:

count(sum(s.salary))

Not supported Supported
Complete set of date-time formats 1 Partially supported Supported
Support Select * without alias Not supported Supported

1For more information, see the Oracle Datetime formatter web page at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html