白丝美女被狂躁免费视频网站,500av导航大全精品,yw.193.cnc爆乳尤物未满,97se亚洲综合色区,аⅴ天堂中文在线网官网

Idempotent processing of data streams

專利號
US11178197B2
公開日期
2021-11-16
申請人
Amazon Technologies, Inc.(US WA Seattle)
發(fā)明人
Gaurav D. Ghare
IPC分類
H04L29/06; H04L29/08; G06F16/23
技術領域
data,records,stream,checkpoint,may,partition,processing,or,worker,in
地域: WA WA Seattle

摘要

Idempotent processing of data may be implemented for data records retrieved from a data stream. A data stream may receive data records as input and distribute the ingestion, storage, and processing of the data records amongst one or more partitions of the data stream. Partition metadata may be maintained which includes checkpoint metadata for retrieving, processing, and sending data records in the data stream to a specified destination. When assigned a partition for processing, checkpoint metadata for partition may be accessed to determine whether a pending checkpoint for the partition exists. If not pending checkpoint exists, new data records may be retrieved, processed, and sent from the partition of the data stream to a specified destination. If a checkpoint is pending, then the data records identified by the checkpoint metadata as pending may be retrieved, processed, and sent to the specified destination.

說明書

This application is a continuation of U.S. patent application Ser. No. 14/875,201, filed Oct. 5, 2015, which is hereby incorporated in reference herein in its entirety.

BACKGROUND

As the costs of data storage have declined over the years, and as the ability to interconnect various elements of the computing infrastructure has improved, more and more data pertaining to a wide variety of applications can potentially be collected and analyzed. For example, mobile phones can generate data indicating their locations, the applications being used by the phone users, and so on, at least some of which can be collected and analyzed in order to present customized coupons, advertisements and the like to the users. The analysis of data collected by surveillance cameras may be useful in preventing and/or solving crimes, and data collected from sensors embedded at various location within airplane engines, automobiles or complex machinery may be used for various purposes such as preventive maintenance, improving efficiency and lowering costs.

權利要求

1
What is claimed is:1. A system, comprising:at least one processor; anda memory, storing program instructions that when executed by the at least one processor, cause the at least one processor to implement a stream management service, the stream management service configured to:retrieve, by a first stream processing worker of the stream management service, one or more data records from a partition of a data stream assigned by the stream management service to the stream processing worker, the one or more data records retrieved up to a buffer limit;update, by the first stream processing worker, checkpoint metadata for the partition of the data stream to indicate a pending checkpoint for the data records, the pending checkpoint indicating that the one or more data records are to be processed instead of skipped if a second stream processing worker is assigned to the partition of the data stream by the stream management service to replace the first stream processing worker; andafter sending the one or more data records processed by the first stream processing worker to a destination, update, by the first stream processing worker, the checkpoint metadata to advance the checkpoint as not pending, the not pending checkpoint indicating that the one or more data records are to be skipped if the second stream processing worker is assigned by the stream management service to the partition of the data stream to replace the first stream processing worker.2. The system of claim 1, wherein the update to the checkpoint metadata for the partition of the data stream to indicate the pending checkpoint for the data records includes, in the checkpoint metadata, a location to store the one or more data records in the destination.3. The system of claim 2, wherein the update to the checkpoint metadata to advance the checkpoint as not pending removes, from the checkpoint metadata, the location to store the one or more data records in the destination.4. The system of claim 1, wherein the update to the checkpoint metadata to advance the checkpoint as not pending is performed responsive to receiving an acknowledgement from the destination that the one or more data records are stored.5. The system of claim 1, wherein the buffer limit is specified via a request received via an interface for the stream management service.6. The system of claim 1, wherein the data stream is a managed data stream, wherein the partition is one of a number of partitions dynamically determined for the managed data stream in response to receiving a request to create the managed data stream.7. The system of claim 1, wherein the stream management service is further configured to access, by the first stream processing worker, the checkpoint metadata to identify the one or more data records to process and send to the specified destination.8. A method, comprising:retrieving, by a first stream processing worker of a stream management service, one or more data records from a partition of a data stream assigned by the stream management service to the stream processing worker, the one or more data records retrieved up to a buffer limit;updating, by the first stream processing worker, checkpoint metadata for the partition of the data stream to indicate a pending checkpoint for the data records, the pending checkpoint indicating that the one or more data records are to be processed instead of skipped if a second stream processing worker is assigned to the partition of the data stream by the stream management service to replace the first stream processing worker; andafter sending the one or more data records processed by the first stream processing worker to a destination, updating, by the first stream processing worker, the checkpoint metadata to advance the checkpoint as not pending, the not pending checkpoint indicating that the one or more data records are to be skipped if the second stream processing worker is assigned by the stream management service to the partition of the data stream to replace the first stream processing worker.9. The method of claim 8, wherein updating the checkpoint metadata for the partition of the data stream to indicate the pending checkpoint for the data records includes, in the checkpoint metadata, a location to store the one or more data records in the destination.10. The method of claim 9, wherein the update to the checkpoint metadata to advance the checkpoint as not pending removes, from the checkpoint metadata, the location to store the one or more data records in the destination.11. The method of claim 8, wherein the update to the checkpoint metadata to advance the checkpoint as not pending is performed responsive to receiving an acknowledgement from the destination that the one or more data records are stored.12. The method of claim 8, wherein the buffer limit is specified via a request received via an interface for the stream management service.13. The method of claim 8, wherein the data stream is a managed data stream, wherein the partition is one of a number of partitions dynamically determined for the managed data stream in response to receiving a request to create the managed data stream.14. The method of claim 8, further comprising accessing, by the first stream processing worker, the checkpoint metadata to identify the one or more data records to process and send to the specified destination.15. One or more non-transitory, computer-readable storage media, storing program instructions that when executed on or across one or more computing devices cause the one or more computing devices to implement:retrieving, by a first stream processing worker of a stream management service, one or more data records from a partition of a data stream assigned by the stream management service to the stream processing worker, the one or more data records retrieved up to a buffer limit;updating, by the first stream processing worker, checkpoint metadata for the partition of the data stream to indicate a pending checkpoint for the data records, the pending checkpoint indicating that the one or more data records are to be processed instead of skipped if a second stream processing worker is assigned by the stream management service to the partition of the data stream to replace the first stream processing worker; andafter sending the one or more data records processed by the first stream processing worker to a destination, updating, by the first stream processing worker, the checkpoint metadata to advance the checkpoint as not pending, the not pending checkpoint indicating that the one or more data records are to be skipped if the second stream processing worker is assigned by the stream management service to the partition of the data stream to replace the first stream processing worker.16. The one or more non-transitory, computer-readable storage media of claim 15, wherein updating the checkpoint metadata for the partition of the data stream to indicate the pending checkpoint for the data records includes, in the checkpoint metadata, a location to store the one or more data records in the destination.17. The one or more non-transitory, computer-readable storage media of claim 16, wherein the update to the checkpoint metadata to advance the checkpoint as not pending removes, from the checkpoint metadata, the location to store the one or more data records in the destination.18. The one or more non-transitory, computer-readable storage media of claim 15, wherein the update to the checkpoint metadata to advance the checkpoint as not pending is performed responsive to receiving an acknowledgement from the destination that the one or more data records are stored.19. The one or more non-transitory, computer-readable storage media of claim 15, wherein the data stream is a managed data stream, wherein the partition is one of a number of partitions dynamically determined for the managed data stream in response to receiving a request to create the managed data stream.20. The one or more non-transitory, computer-readable storage media of claim 15, storing further instructions that when executed on or across the one or more computing devices, cause the one or more computing devices to implement accessing, by the first stream processing worker, the checkpoint metadata to identify the one or more data records to process and send to the specified destination.
微信群二維碼
意見反饋