Another benefit of this method is that it allows the service hosts to tailor requests for more tokens according to their utilization, without tying up unused tokens in their local token buckets. By requesting tokens only when a key is requested for which the service host has an empty bucket, the service host ensures that it does not sit idle with tokens that could be used by other service hosts. Alternatively or additionally, the service hosts may tailor the size of their requests to the global token bucket according to their utilization. When a service host receives a request for a given key, it may compare the number of requests received so far against the time elapsed in the current interval to estimate the number of tokens it will require to service all of the client requests for the current interval, and request the estimated number of tokens from the global token bucket. This has the benefit of reducing the number of requests from service hosts to the global token bucket, thereby reducing network overhead and the computational burden on the global token bucket. In some embodiments, it further allows a service host to predictively assess whether it will run out of tokens for a given key, and to send a preemptive request to the global token bucket for additional tokens. Further, in some embodiments, the service host may return tokens to the global token bucket. Illustratively, the return may be based on a determination at the service host that the returned tokens are unlikely to be used during the current interval. For example, a service host may request a number of tokens in order to service requests corresponding to a user, and the user may subsequently disconnect, fail authentication, or otherwise indicate that future legitimate requests via the present connection are unlikely. In order to prevent the tokens from sitting unused, the service host may send a request for a negative number of tokens, thereby returning those tokens to the global token bucket. A service host may further determine a number of tokens to return based on request velocity.
For example, a service host may have requested 100 tokens based on requests received in the previous interval, but may receive only 5 requests in the first quarter of the current interval. Having consumed 5 tokens, the service host holds 95. The service host may then determine, based on the current request velocity of 5 requests per quarter interval, that it will need only 15 more tokens in the remaining three quarters of the current interval, and return the remaining 80 tokens to the global token bucket. To forecast the required number of tokens, the service host may use a linear projection or any of a variety of other known forecasting algorithms.
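By way of non-limiting illustration, the linear projection described above might be sketched as follows. The function names `projected_remaining_demand` and `token_delta`, the use of a fraction between 0 and 1 to represent elapsed time, and the rounding up of fractional estimates are assumptions made for this sketch and are not prescribed by the embodiments. A positive result corresponds to a request for additional tokens, and a negative result corresponds to a request for a negative number of tokens, that is, a return of tokens to the global token bucket.

```python
import math


def projected_remaining_demand(requests_so_far: int, elapsed_fraction: float) -> int:
    """Linearly project how many more requests will arrive in this interval.

    elapsed_fraction is the share of the current interval that has passed,
    in the range (0, 1]. This representation is an assumption of the sketch.
    """
    if not 0.0 < elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    # Request velocity, expressed as requests per full interval.
    velocity = requests_so_far / elapsed_fraction
    # Assume the same velocity holds for the rest of the interval;
    # round up so a fractional estimate does not leave the host short.
    return math.ceil(velocity * (1.0 - elapsed_fraction))


def token_delta(tokens_held: int, requests_so_far: int, elapsed_fraction: float) -> int:
    """Signed number of tokens to request from the global token bucket.

    Positive: request this many additional tokens.
    Negative: return this many surplus tokens.
    """
    needed = projected_remaining_demand(requests_so_far, elapsed_fraction)
    return needed - tokens_held


if __name__ == "__main__":
    # Worked example from the text: 100 tokens requested, 5 consumed in the
    # first quarter of the interval, so 95 tokens are still held.
    print(projected_remaining_demand(5, 0.25))
    print(token_delta(95, 5, 0.25))
```

With the figures from the example above (5 requests in the first quarter of the interval and 95 tokens still held), the projection yields 15 tokens of remaining demand and a delta of negative 80, matching the 80 tokens returned. A service host holding no tokens would instead obtain a positive delta and could request that many tokens in a single call, rather than one call per client request.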