At (3), the service host 106 sends a request for tokens for Key 1 to the global cache 114 in response to a determination that the token bucket corresponding to Key 1 is not in a throttled state. This determination may be based on a determination that the service host 106 has not already requested tokens from the global cache 114 during the current interval, and/or that it has not requested tokens and received a no-token response. Illustratively, the number of tokens the service host 106 requests may be a fixed number of tokens, a fraction of the maximum tokens in global token bucket 406, a number of tokens calculated based on a weighted average of previous requests as described above, etc. For example, if the service host 106 has not previously received a request for Key 1, it may request a fixed number of tokens reflecting the number of requests typical for any given key per interval. Alternatively, the request may not specify a number of tokens, and the service host 106 may rely on the global cache 114 to determine the number of tokens to dispense, for example a fraction of the maximum tokens in the global token bucket corresponding to Key 1. If the service host 106 has history for Key 1, it may base the number of tokens requested on the allowed requests in the corresponding local token bucket, so that it makes as few requests for additional tokens as possible without running out of tokens in the current interval.
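By way of non-limiting illustration, the selection of a request size at (3) may be sketched as follows. The function name `tokens_to_request`, the constant `DEFAULT_TOKENS`, the smoothing factor `alpha`, and the use of an exponentially weighted average are assumptions made for this example only; the weighted average described above may be computed differently.

```python
import math
from typing import Optional, Sequence

# Hypothetical default: the number of requests typical for any key per interval.
DEFAULT_TOKENS = 10


def tokens_to_request(
    allowed_history: Sequence[int],
    default_tokens: int = DEFAULT_TOKENS,
    alpha: float = 0.5,
    let_cache_decide: bool = False,
) -> Optional[int]:
    """Return the number of tokens to request for a key, or None.

    allowed_history holds the allowed-request counts from the local token
    bucket for earlier intervals, oldest first. None means the request
    carries no token count and the global cache chooses the amount.
    """
    if let_cache_decide:
        # Unspecified request: the global cache decides how many to dispense.
        return None
    if not allowed_history:
        # No history for this key: fall back to the fixed default.
        return default_tokens
    # Exponentially weighted average (an assumed weighting scheme):
    # more recent intervals contribute more to the estimate.
    avg = float(allowed_history[0])
    for allowed in allowed_history[1:]:
        avg = alpha * allowed + (1 - alpha) * avg
    # Round up and request at least one token.
    return max(1, math.ceil(avg))
```

In this sketch, a service host with no history for Key 1 requests the fixed default, while a service host with history sizes its request from recent allowed-request counts so that fewer follow-up requests are needed within the interval.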
At (4), the global cache 114 refills global token bucket 406. Illustratively, the global cache 114 may track the time of the last refill via a refill timestamp, and may refill global token bucket 406 upon determining, on receipt of a token request, that the bucket is empty and has not been refilled within the current interval.
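By way of non-limiting illustration, the refill check at (4) may be sketched as follows. The class name `GlobalBucket`, its field names, and the treatment of the "current interval" as the time elapsed since the last refill are assumptions made for this example only; an implementation may instead use fixed interval boundaries.

```python
from dataclasses import dataclass


@dataclass
class GlobalBucket:
    """Hypothetical model of a global token bucket with a refill timestamp."""

    max_tokens: int
    interval_seconds: float
    tokens: int = 0
    # Timestamp of the last refill; -inf means the bucket was never refilled.
    last_refill: float = float("-inf")

    def refill_if_due(self, now: float) -> bool:
        """Refill on receipt of a token request; return True if refilled."""
        if self.tokens > 0:
            # Bucket is not empty, so no refill is needed.
            return False
        if now - self.last_refill < self.interval_seconds:
            # Already refilled within the current interval.
            return False
        self.tokens = self.max_tokens
        self.last_refill = now
        return True
```

In this sketch, an empty bucket that was already refilled during the current interval stays empty, which corresponds to the case in which the global cache would return a no-token response.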