HCP for cloud scale can synchronize the following kinds of data in buckets:
- Object data
- All user metadata (that is, anything that can be returned in an x-amz-meta-* header)
- Tags
- Content-Type system metadata
- Objects that the owner of the source bucket doesn't have permission to read
This diagram illustrates the concept of bucket synchronization.
Limitations on bucket synchronization
Objects that existed before synchronization was configured are not synchronized. HCP for cloud scale evaluates the rules that are in effect at the time an object is synchronized, not at the time the object is ingested.
Encrypted objects, ACLs, and objects that are marked as deleted are also not synchronized.
Most system metadata is not synchronized, specifically:
- Owner ID and Name
- Timestamps (when last modified)
- Metadata returned in x-amz-grant-*
- Metadata returned in x-amz-acl
- Metadata returned in x-amz-storage-class
- Metadata returned in x-amz-replication-status
- Metadata returned in x-amz-server-side-encryption-*
- Metadata returned in x-amz-restore-*
- Metadata returned in x-amz-version-id-*
- Metadata returned in x-amz-website-redirect-location
- Metadata returned in x-amz-object-lock-*
The following operations are likewise not synchronized:
- DELETE Object
- Bulk DELETE Object
- PUT Object ACLs
- PUT Object tagging
- DELETE Object tagging
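The header rules above can be summarized in a short sketch. This is only an illustration of the documented behavior, assuming a plain dictionary of response headers; the function and constant names are hypothetical and are not part of any HCP for cloud scale API. Object data and tags are outside the scope of the sketch.

```python
# Hypothetical helper: illustrates which headers carry synchronized metadata.
# Per the lists above, user metadata (x-amz-meta-*) and Content-Type are
# synchronized; other system metadata is not.
SYNCED_EXACT = {"content-type"}
SYNCED_PREFIXES = ("x-amz-meta-",)


def split_headers(headers):
    """Return (synchronized, not_synchronized) dicts for a header mapping."""
    synced, skipped = {}, {}
    for name, value in headers.items():
        lower = name.lower()  # HTTP header names are case-insensitive
        if lower in SYNCED_EXACT or lower.startswith(SYNCED_PREFIXES):
            synced[name] = value
        else:
            skipped[name] = value
    return synced, skipped


if __name__ == "__main__":
    example = {
        "Content-Type": "text/plain",
        "x-amz-meta-project": "alpha",
        "x-amz-storage-class": "STANDARD",
        "x-amz-acl": "private",
    }
    synced, skipped = split_headers(example)
    print("synchronized:", sorted(synced))
    print("not synchronized:", sorted(skipped))
```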
The bucket sync-from function supports only one rule for a given combination of external SQS queue and external bucket. If a bucket has multiple sync-from rules that use the same external queue, objects might not be synchronized. To use multiple rules for an external bucket, use a separate SQS queue for each rule.
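As a rough illustration of the one-queue-per-rule constraint, the following sketch checks a set of sync-from rules for shared queues. The rule representation (dictionaries with `queue_url` and `external_bucket` keys) and the example URLs are assumptions made for this example; HCP for cloud scale does not necessarily store or expose rules in this form.

```python
from collections import Counter


def find_conflicting_rules(rules):
    """Return (queue_url, external_bucket) pairs used by more than one rule.

    `rules` is a hypothetical list of dicts, each with the keys
    'queue_url' and 'external_bucket'.
    """
    counts = Counter((r["queue_url"], r["external_bucket"]) for r in rules)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    rules = [
        {"queue_url": "https://sqs.example/queue-a", "external_bucket": "photos"},
        {"queue_url": "https://sqs.example/queue-a", "external_bucket": "photos"},
        {"queue_url": "https://sqs.example/queue-b", "external_bucket": "photos"},
    ]
    for queue, bucket in find_conflicting_rules(rules):
        print(f"More than one rule uses queue {queue} for bucket {bucket}")
```

An empty result means that each rule has its own queue, which matches the recommendation above.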
Comparing synchronization to replication
HCP for cloud scale can synchronize with buckets on storage systems outside of AWS. It can apply multiple rules to each new object, so long as the destination buckets are the same. It only synchronizes objects that the owner of the source bucket has permissions to read. Object tagging events are not synchronized; specifically, tags added to an existing object are not synchronized with a remote bucket.
In contrast with AWS replication, HCP for cloud scale does not synchronize the following:
- Access control lists (ACLs)
- Lock retention information
- Objects that are encrypted using Amazon S3 managed keys (SSE-S3) and AWS KMS managed keys (SSE-KMS)
If an object being synchronized has the same name as an object in the target bucket, the result depends on whether the target bucket uses versioning:
- If versioning is used, the old object is kept as an old version.
- If versioning is not used, the old object is replaced by the new object.
HCP for cloud scale buckets always use versioning. The best practice is to use versioning in all target buckets.
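The difference between the two cases can be shown with a toy in-memory model. This is a simplified sketch for illustration only; the `TargetBucket` class is hypothetical and does not represent how any real storage system implements versioning.

```python
class TargetBucket:
    """Toy model of a target bucket, with or without versioning."""

    def __init__(self, versioning):
        self.versioning = versioning
        self._versions = {}  # object name -> list of payloads, newest last

    def put(self, name, data):
        if self.versioning:
            # Keep the old object as an older version.
            self._versions.setdefault(name, []).append(data)
        else:
            # Replace the old object with the new one.
            self._versions[name] = [data]

    def current(self, name):
        return self._versions[name][-1]

    def history(self, name):
        return list(self._versions[name])


if __name__ == "__main__":
    for versioning in (True, False):
        bucket = TargetBucket(versioning)
        bucket.put("report.txt", b"old")
        bucket.put("report.txt", b"new")  # synchronized object, same name
        print(versioning, bucket.current("report.txt"), bucket.history("report.txt"))
```

With versioning, the earlier payload remains retrievable after the synchronized object arrives; without it, the earlier payload is lost, which is why versioning is recommended for target buckets.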
Best-effort ordering
HCP for cloud scale guarantees that operations are applied in the order of their arrival (strong consistency). However, synchronizing multiple operations applied in a short period of time to the same object presents the following difficulties:
- In a distributed system, especially when many systems are involved, synchronizing all operations in correct order is complex.
- Even if HCP for cloud scale synchronizes all operations in correct order to an external storage component, that component might not guarantee that the operations are applied with strong consistency.
- For bucket sync-from, the external queue service might not guarantee that messages are delivered in the correct order. In particular, Amazon S3 event notifications cannot be sent to Amazon Simple Queue Service (SQS) first-in, first-out (FIFO) queues.
Therefore, HCP for cloud scale makes its best effort to synchronize only the latest state of an object, not each version or operation for the object. For example:
- Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) PUT, (3) DEL. The latest state of the object is (3) DEL. HCP for cloud scale only synchronizes (3) DEL.
- Assume that a client sends three operations to an object and that they are all committed: (1) PUT, (2) DEL, (3) PUT. The latest state of the object is (3) PUT. HCP for cloud scale only synchronizes (3) PUT.
However, this approach does not guarantee that the external storage holds the latest state of an object in all situations.
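The two examples above amount to collapsing a committed sequence of operations to the last operation per object. The following is a minimal sketch of that idea, assuming operations are available as an ordered list of (object name, operation) pairs; the function name and data layout are hypothetical, and the sketch does not model the out-of-order delivery that makes the real behavior best effort.

```python
def latest_states(operations):
    """Collapse an ordered list of (name, op) pairs to the last op per name.

    `operations` must already be in commit order; later entries overwrite
    earlier ones for the same object name.
    """
    latest = {}
    for name, op in operations:
        latest[name] = op
    return latest


if __name__ == "__main__":
    first = [("doc", "PUT"), ("doc", "PUT"), ("doc", "DEL")]
    second = [("doc", "PUT"), ("doc", "DEL"), ("doc", "PUT")]
    print(latest_states(first))   # only the final DEL remains to synchronize
    print(latest_states(second))  # only the final PUT remains to synchronize
```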