In some embodiments, parallel zeroing uses “unused” physical CPU capacity on each node, which is especially useful because the guest OS typically does not use all of its VCPUs as it starts up. Furthermore, in chip architectures such as the Intel Xeon, an individual processor core cannot zero memory at full memory speed, because the design must allow for potential conflicts among multiple cores trying to write to the same bank of memory. Typically, between two and four cores on a chip are required to saturate the memory write bandwidth of a processor. Thus, memory initialization within a node can proceed in parallel with operating system initialization. In addition to multiple processors on a node being used to initialize memory, processors on other nodes proceed in parallel as well, such that there is parallel processing in two dimensions. Further details regarding parallel memory initialization are described below.
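As a purely illustrative sketch of the within-node dimension of parallel zeroing, the following Python fragment splits a buffer into contiguous ranges and zeroes each range on a separate worker. The names parallel_zero and zero_range, and the default of four workers, are assumptions made for this example and do not appear in the embodiments described herein. Python threads do not model real memory bandwidth, so the sketch shows only how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def zero_range(mem: memoryview, start: int, end: int) -> int:
    """Zero the bytes in mem[start:end] and return how many were zeroed."""
    mem[start:end] = bytes(end - start)
    return end - start


def parallel_zero(buf: bytearray, workers: int = 4) -> int:
    """Zero buf using up to `workers` threads (hypothetical helper).

    Each worker receives one contiguous, non-overlapping range, so no two
    workers write to the same bytes. Returns the total bytes zeroed.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    size = len(buf)
    # Ceiling division; max() keeps the step non-zero for an empty buffer.
    chunk = max(1, -(-size // workers))
    ranges = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]
    with memoryview(buf) as mem:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(lambda r: zero_range(mem, *r), ranges))
```

In a multi-node arrangement, each node would run its own instance of such a routine over its own physical memory, which corresponds to the second dimension of parallelism described above.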
Additional Details Regarding Dormant Pages
In some embodiments, a special type of entry is created in the second-level page tables that corresponds to pages that have not been made manifest (i.e., pages that do not actually exist). The pages represented by these page table entries are referred to herein as “dormant” pages. Because no memory is actually allocated behind these pages, the memory initialization process is faster.
Dormant pages may be referenced by an operating system and application software in the same way as normal pages. Once referenced, however, a dormant page may be converted to a real page of memory via the stall process described above, which is the process used to detect references to pages that are not local to the processor on the node making the reference.
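The lazy conversion of a dormant page can be sketched as follows. This is a minimal model under stated assumptions, not the mechanism of any particular embodiment: the names PageTableEntry, GuestMemory, and PAGE_SIZE are hypothetical, the 4096-byte page size is assumed, and the stall is modeled as an ordinary method call rather than a hardware-detected event.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size for this illustration


@dataclass
class PageTableEntry:
    """Hypothetical second-level entry; dormant means no backing memory."""
    dormant: bool = True
    frame: Optional[bytearray] = None


class GuestMemory:
    """Toy guest address space in which every page starts out dormant."""

    def __init__(self, num_pages: int) -> None:
        # Creating entries allocates no page frames, so setup is cheap.
        self.entries = [PageTableEntry() for _ in range(num_pages)]
        self.stalls = 0

    def _resolve(self, page_no: int) -> bytearray:
        entry = self.entries[page_no]
        if entry.dormant:
            # Stand-in for the stall: the first reference makes the page
            # manifest by allocating a zeroed frame behind it.
            self.stalls += 1
            entry.frame = bytearray(PAGE_SIZE)
            entry.dormant = False
        return entry.frame

    def read(self, addr: int) -> int:
        page_no, offset = divmod(addr, PAGE_SIZE)
        return self._resolve(page_no)[offset]

    def write(self, addr: int, value: int) -> None:
        page_no, offset = divmod(addr, PAGE_SIZE)
        self._resolve(page_no)[offset] = value

    def resident_pages(self) -> int:
        """Number of pages that have been converted to real pages."""
        return sum(not e.dormant for e in self.entries)
```

In this sketch, callers use read and write identically for dormant and real pages, and only pages that are actually referenced ever consume backing memory.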