memory - What's the difference between physical and virtual cache?

2014-07
  • Fionn

    I am having trouble understanding what virtual cache actually is. I understand virtual memory.

    If the CPU wants to access memory, as far as I understand, it sends a virtual address to the MMU which, using page tables, figures out the physical memory address.

    Now, as well as this, the CPU sends a different address (just the lower bits of the virtual address), consisting of a set number, a tag, and an offset, to the cache, which then works out whether the data resides in the cache.

    How does virtual cache differ from this?
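The decomposition described above can be sketched in a few lines. The geometry here (32 KiB, 8-way, 64-byte lines, so 64 sets) is a made-up example, not taken from any particular CPU:

```python
# Hypothetical cache geometry: 32 KiB, 8-way set associative,
# 64-byte lines -> 32 KiB / (8 ways * 64 B) = 64 sets.
LINE_SIZE = 64
NUM_SETS  = 64

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 6 bits of byte offset
INDEX_BITS  = NUM_SETS.bit_length() - 1    # 6 bits of set index

def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
# tag = 0x12345, index = 25, offset = 56
```

Whether `addr` here is the virtual or the physical address is exactly what distinguishes the cache organizations discussed in the answer below.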


  • Answers
  • Paul A. Clayton

    There are four ways to address a cache depending on whether virtual or physical address bits are used for indexing and/or for tagging.

    Because indexing the cache is the most time critical (since all the ways in a set can be read in parallel and the appropriate way selected based on a tag comparison), caches are typically indexed with the virtual address, allowing the indexing to begin before address translation is completed. However, if only bits within the page offset are used for indexing (e.g., with each way being no larger than the page size and simple modulo of the way size for indexing1), then this indexing is actually using the physical address. It is not uncommon for L1 associativity to be increased primarily to allow a larger cache to be indexed by the physical address.

    While indexing based on physical address is possible with ways larger than the page size (e.g., by predicting the more significant bits or a fast translation mechanism providing those bits using the delay of indexing with the known physical address bits to hide translation latency), it is not commonly done.

    Using virtual addresses for tagging allows a cache hit to be determined before translation has been done. Permissions still need to be checked before the access can be committed, but for loads the data can be forwarded to the execution units and computation using the data begun and for stores the data can be sent to a buffer to allow delayed commitment of state. A permission exception would flush the pipeline, so this does not add design complexity.

    (The vhints used by the Pentium 4 data cache provided this latency advantage by using a subset of the virtual address bits that are available early to speculatively select the way.)

    (In the days of optional external MMUs, virtual address tags could be particularly attractive in pushing the translation almost entirely outside of the cache design.)

    Although virtually indexed and tagged caches can have significant latency advantages, they also introduce the potential for aliasing, where the same virtual address maps to different physical addresses (homonyms) or the same physical address maps to different virtual addresses (synonyms). Indexing and tagging with physical addresses avoids aliasing.

    The homonym problem is relatively easily solved by using address space identifiers (ASIDs). (Flushing the cache when changing address spaces will also guarantee no homonyms, but such is relatively expensive. At least partial flushing would be needed when an ASID is reused for a different address space, but an 8-bit ASID can avoid flushes on most address space changes.) Typically ASIDs would be managed by the operating system, but some systems provided hardware checks for ASID reuse based on the page table base address.

    The synonym problem is more difficult to solve. On a cache miss, the physical addresses of any possible aliases must be checked to determine if an alias is present in the cache. If aliasing is avoided in the indexing, either by indexing with the physical address or by the operating system guaranteeing that aliases have the same bits in the index (page coloring), then only the one set needs to be probed. By relocating any detected synonym to the set indicated by the more recently used virtual address, the alias is avoided in the future (until a different mapping of the same physical address occurs).
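Page coloring reduces to an invariant on a few address bits. In this sketch the geometry (4 KiB pages, two index bits above the page offset) is assumed for illustration:

```python
PAGE_SHIFT = 12   # 4 KiB pages
COLOR_BITS = 2    # index bits that lie above the page offset (assumed)

def color(addr):
    """Cache 'color': the index bits above the page offset."""
    return (addr >> PAGE_SHIFT) & ((1 << COLOR_BITS) - 1)

def coloring_ok(vaddr, paddr):
    """An OS enforcing page coloring only maps a virtual page to a
    physical frame of the same color, so every alias of a physical
    line falls in the same cache set and a miss probes just one set."""
    return color(vaddr) == color(paddr)
```

For example, mapping virtual page `0x5000` to physical frame `0x9000` preserves the color (both are color 1), while mapping it to `0xA000` (color 2) would let two synonyms land in different sets.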

    In a direct mapped virtually tagged cache without index aliasing, a further simplification is possible. Since the potential synonym will conflict with the request and be evicted, either any necessary writeback of a dirty line can be done before the cache miss is handled (so a synonym would be in memory or a physically addressed higher level cache) or a physically addressed writeback buffer can be probed before the cache line fetched from memory (or higher level cache) is installed. An unmodified alias need not be checked since the memory contents will be the same as those in the cache, merely doing unnecessary miss handling. This avoids the need for additional, physical tags for the whole cache and allows translation to be relatively slow.

    If there is no guaranteed avoidance of aliasing in the index, then even a physically tagged cache would need to check the other sets that might contain aliases. (For one non-physical bit of index, a second probing of the cache in the single alternative set may be acceptable. This would be similar to pseudo-associativity.)

    For a virtually tagged cache, an extra set of physical address tags can be provided. These tags would only be accessed on misses and can be used for I/O and multiprocessor cache coherence. (Since both misses and coherence requests are relatively rare, this sharing is not typically problematic.)

    AMD's Athlon, which used physical tagging with virtual indexing, provided a separate set of tags for coherence probes and alias detection. Since three virtual-only address bits are used for indexing, seven alternative sets had to be probed for possible aliases on a miss. Since this could be done while waiting for a response from the L2 cache, this did not add latency and the extra set of tags could also be used for coherence requests which were more frequent given the exclusivity of the L2 cache.
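The Athlon arithmetic above follows directly: with three virtual-only index bits there are 2^3 - 1 = 7 alternative sets where a synonym could hide. A small helper (a hypothetical sketch, not AMD's mechanism) enumerates them by flipping every combination of the untranslated index bits:

```python
def alias_sets(index, vmask):
    """Set indices other than `index` that could hold a synonym:
    every combination of the index bits drawn from virtual-only
    address bits (`vmask` marks those bit positions)."""
    positions = [b for b in range(vmask.bit_length()) if (vmask >> b) & 1]
    out = []
    for combo in range(1 << len(positions)):
        candidate = index & ~vmask      # keep the physical index bits
        for i, b in enumerate(positions):
            if (combo >> i) & 1:
                candidate |= 1 << b     # try this virtual-bit pattern
        if candidate != index:
            out.append(candidate)
    return out

# Three virtual index bits -> 7 alternative sets to probe on a miss,
# matching the Athlon example above.
```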

    For a large virtually indexed L1 cache, an alternative to probing many additional sets would be to provide a physical to virtual translation cache. On a miss (or coherence probe) the physical address would be translated to the virtual address that might be used in the cache. Since providing a translation cache entry for each cache line would be impractical, a means would be needed to invalidate cache lines when a translation is evicted.

    If aliasing (at least of writable addresses) is guaranteed not to occur, e.g., in a typical single address space operating system, then the only disadvantage of a virtually addressed cache is the extra tag overhead from the fact that virtual addresses in such systems are larger than physical addresses. Hardware designed for a single address space OS could use a permission lookaside buffer instead of a translation lookaside buffer, delaying translation until a last level cache miss.


    1 Skewed associativity indexes different ways of the cache with different hashes based on more bits than necessary for modulo indexing of the same size ways. This is useful for reducing conflict misses. This can introduce aliasing problems that would not be present in a modulo-indexed cache of the same size and associativity.


  • Related Question

    computer architecture - What is the maximum addressable memory?
  • Questioner

    I just started learning assembly.

    My laptop specification says:

    Microprocessor: Intel Core Duo processor T2300

    Microprocessor Cache: 2MB L2 Cache

    Memory Max: 2048MB

    Memory: 1024MB 667MHz DDR2 System Memory (2 Dimm)

    "Intel Core Duo processor T2300" specification says:

    instruction Set : 32-bit

    I think now I can assume that the data bus is also at least 32 bit. So minimum addressable memory should be 4GB.

    Moreover CPU specification also mention the Memory Specifications

    Physical Address Extensions 32-bit

    which as I understand means it can address 64GB of memory

    Would that mean that, given only 2 memory slots on motherboard, my laptop can support 2x 2GB memory sticks == (4GB) memory?

    I guess the laptop guys assumed that there won't be 2GB sticks, so they might have mentioned Memory Max: 2048MB


  • Related Answers
  • Kevin Montrose

    Addressable memory for a 32-bit system is 4GB; physical memory is whatever's installed. Your operating system manages the latter to give running programs the illusion of the former. It's a good deal more complicated than that, but that's the gist of it.

    PAE increases the amount of physical memory that a machine can use, not the addressable memory. Pointers remain 32-bit, so addressable memory is still restricted to 4GB.
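The arithmetic behind both figures is worth spelling out. The questioner's 64GB number corresponds to the 36-bit physical addresses classic PAE provides, while 32-bit pointers cap the virtual address space regardless:

```python
GiB = 1 << 30

virtual_space = 1 << 32   # 32-bit pointers: per-process addressable memory
pae_physical  = 1 << 36   # classic PAE widens physical addresses to 36 bits

assert virtual_space // GiB == 4    # still 4 GiB of addressable memory
assert pae_physical // GiB == 64    # up to 64 GiB of physical memory
```

So PAE lets the OS place more total RAM behind the page tables, but no single 32-bit pointer can ever reach beyond 4GB of virtual space.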

    Memory capacity on a machine is dictated by more than just what the CPU is capable of. Don't assume your machine can support 4GB.

  • Paul Tomblin

    There are many factors beyond the memory address space that control how much memory a computer can address. For instance, my wife's MacBook Pro can only support 3GB, and if you put in two 2GB memory sticks, it still only addresses 3GB of it.

    By the way, 32 bits means the chip can theoretically address 4GB of memory. Where did you get that 64GB number?

  • Brian Rasmussen

    There are a number of factors in play here. With a 32 bit architecture the OS is able to address 4 GB of memory. However, not all of this may be available for applications. For instance, a Windows machine with 4 GB of memory will usually not be able to use more than roughly 3.5 GB for OS and applications, as some of the address space is used to map hardware.

    Also, Windows splits the 32 bit address space into two: 2 GB for kernel memory used by the OS and 2 GB for user space applications. That is, by default each application will only be able to access 2 GB of memory. Windows can be configured to use 1 GB for the kernel and 3 GB for user space instead.

    On top of that the actual hardware may set certain limitations. When your laptop specification says maximum memory is 2 GB it is most likely because that is the maximum the motherboard will support. It doesn't matter how much the OS is able to address. If the hardware will only recognize 2 GB then that is the limit on physical memory for the machine.

  • 8088

    No, I completely disagree: a 32-bit processor does not mean that the addressable memory is 4GB. Strictly speaking, a 32-bit processor means that the ALU is 32 bits wide, i.e. it can perform operations on 32-bit data at a time. Note: a 32-bit CPU says nothing about the size of the data bus. Because the CPU is 32-bit it can manipulate 32-bit data (which can be an address), so it is faster in operation.

    What it actually depends on is the size of your address bus. If the address bus is 32 bits wide, there are 2^32 locations available for your CPU to communicate with, running from 0H to FFFFFFFFH. Imagine that your CPU is 32-bit but your address bus is only 8 bits wide. How many locations are available then? Only 2^8 = 256. Since each location is 8 bits = 1 byte, your CPU can only address up to 256 bytes of memory.
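The bus-width arithmetic in that answer can be captured in one helper (a sketch; real machines add further limits, as the other answers note):

```python
def addressable_bytes(address_bus_bits, location_size=1):
    """Bytes reachable over an address bus of the given width, assuming
    each addressable location holds `location_size` bytes."""
    return (1 << address_bus_bits) * location_size

assert addressable_bytes(8) == 256             # the 8-bit bus example above
assert addressable_bytes(32) == 4 * 1024**3    # 4 GB for a 32-bit bus
assert addressable_bytes(36) == 64 * 1024**3   # 64 GB with 36-bit (PAE) addresses
```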