HCP for cloud scale supports the S3 Select feature.
HCP for cloud scale fully supports the S3 Select Object Content method, which allows S3 client applications to retrieve portions of a structured object. Selecting only the data needed within an object can significantly improve operational costs, retrieval time, and performance.
Objects can be encrypted or compressed with GZIP or BZIP2. Scan ranges are supported for Parquet or uncompressed CSV and JSON objects; this allows you to break down a larger object into non-overlapping sections that can be scanned in parallel.
The portion of the object returned is selected based on structured query language (SQL) sent in the request that includes SELECT, FROM, WHERE and LIMIT.
All data types, operators and SQL Functions are supported including aggregate functions (AVG, COUNT, MAX, MIN and SUM), conditional functions, conversion functions (CAST), date functions and string functions (UPPER, LOWER).
Supported structured object input-serialization formats include:
- Comma-separated values (CSV)
- JavaScript Object Notation (JSON)
- Apache Parquet
Supported structured object output-serialization formats include:
- CSV
- JSON
The client application must have read permission s3:GetObject. The SQL expression can be up to 256 KB and can return up to 5 GB of data with a maximum of 1 MB for a single record.
Here is a simple example of a SQL query against a Parquet object. The query returns data for salaries greater than 100,000:
select salary from s3object s where s.salary > 100000
Implementation Differences | AWS | HCP for cloud scale |
---|---|---|
Case sensitive SQL column queries, such as: select s.id from an object s with columns ID, iD, and id… |
Case insensitive, returning the column ID. | Case sensitive, returning the column id. |
Returned-bytes value in status messages. | Value is the total size of the record, including any delimiters. | Value is the sum of all returned column data. |
CSV output when the record field contains /r or /n. | Record field returned as-is. | Record field returned wrapped in quotes (“ “). |
SQL CASE Expressions | Supported | Not Supported |
SQL CAST to float - decimals of precision | 20 | 40 |
Additional functionality | AWS | HCP for cloud scale |
---|---|---|
Nested aggregate functions, such as: count(sum(s.salary)) |
Not supported | Supported |
Complete set of date-time formats 1 | Partially supported | Supported |
Support Select * without alias | Not supported | Supported |
1For more information, see the Oracle Datetime formatter web page at https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html