Systems and methods are provided for efficiently handling user requests to access shared resources in a distributed system. This handling may include throttling access to resources on a per-resource basis. A distributed load-balancing system can be logically represented as a hierarchical token bucket cache. In this cache, a global cache contains token buckets corresponding to individual resources. Tokens from those buckets can be dispensed to service hosts, each of which maintains a local cache whose token buckets limit how many requests to access those resources the host services. The local and global caches can be implemented with a variant of a lazy token bucket algorithm, which limits the amount of communication required to manage cache state. This fine-grained, per-resource management can thus allow higher throttle limits on user accounts without risking overutilization of any individual resource.
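The hierarchy described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the disclosure. `LazyTokenBucket` credits tokens only when it is touched, based on elapsed time, rather than refilling on a timer. The hypothetical `GlobalCache` creates one such bucket per resource on first use. The hypothetical `LocalCache` stands in for a host's local cache and is simplified to a per-resource balance of tokens leased in batches (`batch_size`, an illustrative parameter). It contacts the global cache only when that balance runs out. The class names, the batching policy, and the injectable `clock` are all illustrative choices, not the disclosed implementation.

```python
import time
from typing import Callable, Dict


class LazyTokenBucket:
    """Token bucket refilled lazily: tokens are credited only when the bucket is touched."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens credited per second of elapsed time
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def take(self, n: float) -> float:
        """Credit tokens for time elapsed since the last access, then remove up to n."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        granted = min(n, self.tokens)
        self.tokens -= granted
        return granted  # may be less than n, or 0, if the bucket is depleted


class GlobalCache:
    """Hypothetical global cache: one lazy bucket per resource, created on first use."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.buckets: Dict[str, LazyTokenBucket] = {}

    def dispense(self, resource_id: str, n: float) -> float:
        bucket = self.buckets.get(resource_id)
        if bucket is None:
            bucket = LazyTokenBucket(self.capacity, self.refill_rate, self.clock)
            self.buckets[resource_id] = bucket
        return bucket.take(n)


class LocalCache:
    """Hypothetical per-host cache holding tokens leased in batches from the global cache."""

    def __init__(self, global_cache: GlobalCache, batch_size: float) -> None:
        self.global_cache = global_cache
        self.batch_size = batch_size
        self.balance: Dict[str, float] = {}
        self.global_calls = 0  # number of round trips made to the global cache

    def allow(self, resource_id: str) -> bool:
        """Return True if one request for resource_id may be serviced now."""
        balance = self.balance.get(resource_id, 0.0)
        if balance < 1:
            # Only contact the global cache when the local balance is exhausted.
            self.global_calls += 1
            balance += self.global_cache.dispense(resource_id, self.batch_size)
        allowed = balance >= 1
        self.balance[resource_id] = balance - 1 if allowed else balance
        return allowed


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    g = GlobalCache(capacity=10, refill_rate=1.0, clock=lambda: now[0])
    host = LocalCache(g, batch_size=5)
    results = [host.allow("db-1") for _ in range(11)]
    print(results.count(True), host.global_calls)  # 10 3
```

In this sketch, eleven requests against a resource whose global bucket holds 10 tokens cost only three round trips to the global cache: two to lease batches of 5 tokens and one that finds the bucket empty. A different resource is throttled independently, because each resource has its own bucket. Larger batches reduce communication further, but they let one host hold tokens that other hosts might need. That trade-off is a design choice this sketch does not resolve.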
FIG. 6 depicts a general architecture of a computing device or system providing user request handling in accordance with aspects of the present disclosure.