What is claimed is:

1. A hierarchical token bucket system for load balancing access to a network-accessible service provided by a plurality of service hosts, the system comprising:
the plurality of service hosts, each of the plurality of service hosts providing access to the network-accessible service; and
a global token bucket cache comprising a plurality of global token buckets, wherein each global token bucket corresponds to a throttle key of a plurality of throttle keys and identifies a number of available tokens for the throttle key within the global token bucket, wherein the global token bucket cache is configured to:
receive, from an individual service host of the plurality of service hosts, a request for a number of tokens, the request comprising a throttle key identifying an individual global token bucket of the plurality of global token buckets;
when the number of available tokens in the individual global token bucket is greater than zero, dispense a number of tokens up to the number requested from the individual token bucket to the individual service host; and
when the number of available tokens in the individual token bucket is zero, notify the individual service host that insufficient tokens exist within the individual global token bucket;
wherein the global token bucket cache is further configured to, at each interval of a set of intervals, refill each global token bucket with an additional number of tokens;
wherein each service host of the plurality of service hosts maintains a local token bucket cache comprising a plurality of local token buckets, wherein each local token bucket corresponds to a throttle key of the plurality of throttle keys and identifies a number of available tokens for the throttle key within the local token bucket, and wherein each service host is configured to:
receive an access request from a client requesting to access the network-accessible service;
determine a throttle key for the access request;
identify an individual local token bucket corresponding to the throttle key for the access request;
determine the number of available tokens in the individual local token bucket;
when the number of available tokens in the individual local token bucket is sufficient to satisfy the access request, process the access request using at least one available token in the individual local token bucket;
when the number of available tokens in the individual local token bucket is insufficient to satisfy the access request:
transmit a request to the global token bucket cache for additional tokens associated with the throttle key for the access request;
when the request to the global token bucket cache for additional tokens results in dispensing of a sufficient number of the additional tokens to satisfy the access request, store the additional tokens in the individual local token bucket and process the access request using at least one available token in the individual local token bucket; and
when the request to the global token bucket cache for additional tokens results in dispensing of an insufficient number of the additional tokens to satisfy the access request, throttle the access request.

2. The hierarchical token bucket system of claim 1, wherein the service host is further configured to, when the number of available tokens in the individual local token bucket is insufficient to satisfy the access request:
query a cache to determine whether the individual local token bucket is contained in the cache;
when the individual local token bucket is contained in the cache, throttle the request; and
when the individual local token bucket is not contained in the cache, add the individual local token bucket to the cache.

3. The hierarchical token bucket system of claim 1, wherein throttling a request causes the service host to throttle subsequent requests until a predetermined interval has elapsed.

4. The hierarchical token bucket system of claim 1, wherein the service host is configured to forward the access request to the network-accessible service.

5. A computer-implemented method for load balancing access to a network-accessible service provided by a plurality of service hosts, the computer-implemented method comprising:
receiving, by a service host of the plurality of service hosts, an access request from a client to access the network-accessible service;
determining a throttle key for the access request;
identifying an individual local token bucket corresponding to the throttle key for the access request;
determining a number of available tokens in the individual local token bucket;
responsive to a determination that the number of tokens in the individual local token bucket is insufficient to satisfy the access request, transmitting to a global cache a request to dispense additional tokens corresponding to the throttle key from a global token bucket for the throttle key, wherein the global cache is configured to refill the global token bucket with an additional number of tokens at each interval of a set of intervals and respond to requests to dispense additional tokens from the global token bucket by dispensing tokens to a requesting service host from the global token bucket when a number of available tokens in the global token bucket is greater than a threshold number and by notifying the requesting service host that insufficient tokens exist within the global token bucket when the number of available tokens in the global token bucket is less than the threshold number;
obtaining, from the global token bucket for the throttle key maintained at the global cache, a sufficient number of additional tokens to satisfy the access request; and
servicing the access request using at least the additional tokens.

6. The computer-implemented method of claim 5, further comprising, at the global cache, refilling the global token bucket associated with the throttle key responsive to the request to dispense additional tokens.

7. The computer-implemented method of claim 6, further comprising, prior to refilling the global token bucket, determining at the global cache that the number of tokens contained in the global token bucket is insufficient to dispense the additional tokens.

8. The computer-implemented method of claim 6, wherein the number of tokens added to the global token bucket when it is refilled is less than a maximum number of tokens the global token bucket can contain.

9. The computer-implemented method of claim 6, wherein a number of tokens added to the global token bucket during refilling is calculated by multiplying a refill rate of the global token bucket and an amount of time since the last refill, and wherein the number of tokens added cannot cause the number of tokens to exceed a predetermined maximum.

10. The computer-implemented method of claim 5, wherein the request to dispense additional tokens requests a number of additional tokens as a proportion of a maximum number of tokens that can be held in the global token bucket for the throttle key.

11. The computer-implemented method of claim 5, wherein transmitting a request to dispense additional tokens corresponding to the throttle key further comprises determining a number of tokens calculated as a weighted average of the number of requests received over a plurality of recent intervals, wherein the calculation comprises a polynomial of degree greater than or equal to one.

12. The computer-implemented method of claim 11, wherein transmitting a request to dispense additional tokens corresponding to the throttle key further comprises determining a number of tokens according to the equation:
$\frac{(1)(tar_x) + (2)(tar_{x-1}) + \cdots + (x-1)(tar_2) + (x)(tar_1)}{1 + 2 + \cdots + (x-1) + x}$,
wherein $x$ is the number of preceding intervals and $tar_x$ is the number of requests serviced by the service host during the $x$th interval.

13. One or more non-transitory computer-readable media comprising executable instructions for load balancing access to a network-accessible service provided by a plurality of service hosts, wherein the instructions, when executed by a distributed load-balancing system, cause the distributed load-balancing system to:
receive, by a service host of the plurality of service hosts, an access request from a client to access the network-accessible service;
determine a throttle key for the access request;
identify an individual local token bucket corresponding to the throttle key for the access request;
determine a number of available tokens in the individual local token bucket;
responsive to a determination that the number of tokens in the individual local token bucket is insufficient to satisfy the access request, transmit to a global cache a request to dispense additional tokens corresponding to the throttle key from a global token bucket for the throttle key, wherein the global cache is configured to refill the global token bucket with an additional number of tokens at each interval of a set of intervals and respond to requests to dispense additional tokens from the global token bucket by dispensing tokens to a requesting service host from the global token bucket when a number of available tokens in the global token bucket is greater than a threshold number and by notifying the requesting service host that insufficient tokens exist within the global token bucket when the number of available tokens in the global token bucket is less than the threshold number;
transmit the generated request to a global cache;
obtain, from the global token bucket for the throttle key maintained at the global cache, a sufficient number of additional tokens to satisfy the access request; and
service the access request using at least the additional tokens.

14. The one or more non-transitory computer-readable media of claim 13, wherein the instructions cause the global cache to refill the global token bucket associated with the throttle key responsive to the request to dispense additional tokens.

15. The one or more non-transitory computer-readable media of claim 14, wherein the instructions cause the global cache to, prior to refilling the global token bucket associated with the throttle key, determine that the number of tokens in the global token bucket is insufficient to dispense the additional tokens.

16. The one or more non-transitory computer-readable media of claim 14, wherein, to refill the global token bucket associated with the throttle key, the instructions cause the global cache to add a number of tokens to the global token bucket that is less than a maximum number of tokens the global token bucket can contain.

17. The one or more non-transitory computer-readable media of claim 14, wherein the instructions cause the global cache to, prior to refilling the global token bucket associated with the throttle key, determine that the global token bucket contains no tokens.

18. The one or more non-transitory computer-readable media of claim 14, wherein, to refill the global token bucket, the instructions cause the global cache to calculate a number of tokens by multiplying a refill rate of the global token bucket with an amount of time since the last refill and add the minimum of the calculated number of tokens and an amount equal to a maximum number of tokens the global token bucket may contain minus the number of tokens the global token bucket currently contains.

19. The one or more non-transitory computer-readable media of claim 13, wherein to generate a request for additional tokens associated with the throttle key, the instructions cause the distributed load-balancing system to determine a number of tokens calculated as a weighted average of the number of requests received over a plurality of recent intervals, wherein the calculation comprises a polynomial of degree greater than or equal to one.

20. The one or more non-transitory computer-readable media of claim 19, wherein to generate a request for additional tokens associated with the throttle key, the instructions cause the distributed load-balancing system to determine a number of tokens according to the equation:
$\frac{(1)(tar_x) + (2)(tar_{x-1}) + \cdots + (x-1)(tar_2) + (x)(tar_1)}{1 + 2 + \cdots + (x-1) + x}$,
wherein $x$ is the number of preceding intervals and $tar_x$ is the number of requests serviced by the service host during the $x$th interval.

21. The one or more non-transitory computer-readable media of claim 13, wherein the request transmitted to the global token bucket for additional tokens associated with the throttle key comprises a request for a number of tokens based on a fraction of a maximum number of tokens the global token bucket can contain.
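
By way of non-limiting illustration only, the refill calculation recited in claim 18, the weighted average recited in claims 12 and 20, and the local-then-global flow recited in claim 1 might be sketched in Python as follows. All class, function, and parameter names (`GlobalBucket`, `ServiceHost`, `weighted_request_size`, `batch`, `now`) are hypothetical and do not appear in the claims. The sketch further assumes that each access request consumes exactly one token, that time is passed in explicitly as seconds, and that tar_1 (which carries the largest weight, x) denotes the most recent preceding interval; the claims do not state that ordering.

```python
from dataclasses import dataclass


@dataclass
class GlobalBucket:
    """One global token bucket for a single throttle key (hypothetical)."""
    capacity: float           # maximum tokens the bucket may contain
    refill_rate: float        # tokens earned per second (assumed unit)
    tokens: float = 0.0
    last_refill: float = 0.0

    def refill(self, now: float) -> None:
        # Claim 18: add min(rate * elapsed, capacity - current tokens).
        earned = self.refill_rate * (now - self.last_refill)
        self.tokens += min(earned, self.capacity - self.tokens)
        self.last_refill = now

    def dispense(self, requested: float, now: float) -> float:
        # Dispense up to the number requested; 0.0 signals that
        # insufficient tokens exist for this throttle key.
        self.refill(now)
        granted = min(requested, self.tokens)
        self.tokens -= granted
        return granted


def weighted_request_size(tar: list[int]) -> float:
    """Weighted average from claims 12 and 20.

    tar[0] holds tar_1 and tar[-1] holds tar_x, so tar_i is weighted
    by (x - i + 1) and the divisor is 1 + 2 + ... + x.
    """
    x = len(tar)
    if x == 0:
        return 0.0
    weighted = sum((x - i) * count for i, count in enumerate(tar))
    return weighted / (x * (x + 1) / 2)


class ServiceHost:
    """Service host holding local buckets keyed by throttle key."""

    def __init__(self, global_cache: dict[str, GlobalBucket]) -> None:
        self.global_cache = global_cache
        self.local: dict[str, float] = {}

    def handle(self, throttle_key: str, batch: float, now: float) -> bool:
        """Return True if the request is processed, False if throttled."""
        if self.local.get(throttle_key, 0.0) < 1:
            # Local bucket is insufficient: ask the global cache for more.
            granted = self.global_cache[throttle_key].dispense(batch, now)
            self.local[throttle_key] = self.local.get(throttle_key, 0.0) + granted
        if self.local.get(throttle_key, 0.0) < 1:
            return False      # still insufficient: throttle the request
        self.local[throttle_key] -= 1
        return True
```

In such a sketch, a host could pass `weighted_request_size(recent_counts)` as the `batch` argument so that the number of tokens it requests tracks its recent load, which is one possible reading of claims 11 and 19 rather than the only one.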