Read Queue
Only a few entries are needed for the queue to be effective: 2-4 is typical. Reads fetch a whole cache line by bursting multiple reads onto the main memory bus. However, the requested word could be the last in the cache line. If the line were always read in order, the CPU would sit blocked for an intolerable delay while the preceding words were transferred. The SIU must therefore ensure that the requested word is read first and passed to the cache (and on to the read request in the main pipeline), followed by the remainder of the cache line. So a burst transaction for 8 words (32 bytes) on a 64-bit bus, which transfers two 32-bit words per bus cycle, will not always read the words in order
(0 and 1, 2 and 3, 4 and 5, 6 and 7)
but may read them out of order. For example, if word 5 is requested, the order is
(4 and 5, 6 and 7, 0 and 1, 2 and 3)
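The re-ordering above can be sketched in Python. This is a minimal illustration, assuming the wrap-around order shown in the example: the beat holding the requested word goes first, then the remaining beats in ascending order, wrapping back to word 0. `burst_order` and its parameters are hypothetical names, and other buses may use a different critical-word-first sequence.

```python
def burst_order(requested_word, words_per_line=8, words_per_beat=2):
    """Return the bus beats for a cache-line fill, critical word first.

    Each beat is a tuple of the word indices transferred together in one
    bus cycle (2 x 32-bit words on a 64-bit bus).
    """
    beats_per_line = words_per_line // words_per_beat
    first_beat = requested_word // words_per_beat  # beat holding the requested word
    order = []
    for i in range(beats_per_line):
        beat = (first_beat + i) % beats_per_line   # wrap around past the last beat
        start = beat * words_per_beat
        order.append(tuple(range(start, start + words_per_beat)))
    return order


print(burst_order(5))  # [(4, 5), (6, 7), (0, 1), (2, 3)]
```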
Bus clocks (typically 30-60 MHz) are much slower than processor clocks (typically 200-600 MHz), so each bus cycle costs roughly 3 to 20 processor cycles and the penalty for not re-ordering the bus transactions is extreme!
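To put a number on the penalty, the hypothetical helper below counts how many processor clocks pass before the requested word arrives. It assumes one bus clock per 64-bit beat and ignores the address phase and memory access latency, so it only captures the part of the delay that re-ordering removes.

```python
def critical_word_delay(requested_word, reorder, cpu_mhz, bus_mhz,
                        words_per_beat=2):
    """Processor clocks spent waiting for the beat carrying requested_word.

    Assumption: one bus clock per data beat; address and DRAM access
    latency are not modelled.
    """
    if reorder:
        beats_needed = 1  # the requested word's beat is sent first
    else:
        beats_needed = requested_word // words_per_beat + 1  # in-order burst
    cpu_clocks_per_bus_clock = cpu_mhz / bus_mhz
    return beats_needed * cpu_clocks_per_bus_clock
```

For word 7 with a 600 MHz processor and a 30 MHz bus, an in-order burst costs 80 processor clocks before the word arrives, against 20 when its beat is sent first.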
Write Queue
As with the read queue, a small number of entries suffices: 2-3 is typical. Transactions in the write queue are only selected for execution if the read queue is empty. This is a key function of the SIU: giving priority to read transactions, which could block the CPU.
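The read-priority rule can be sketched as a simple arbiter. `BusArbiter`, its methods and the default queue depths (the upper ends of the typical sizes quoted above) are illustrative assumptions, not a description of any particular SIU.

```python
from collections import deque


class BusArbiter:
    """Hypothetical SIU arbiter: a write is issued only when no read waits."""

    def __init__(self, read_depth=4, write_depth=3):
        self.reads = deque()   # pending read addresses, oldest first
        self.writes = deque()  # pending (address, value) pairs, oldest first
        self.read_depth = read_depth
        self.write_depth = write_depth

    def enqueue_read(self, addr):
        if len(self.reads) >= self.read_depth:
            return False  # queue full: the requester must stall
        self.reads.append(addr)
        return True

    def enqueue_write(self, addr, value):
        if len(self.writes) >= self.write_depth:
            return False  # queue full: the requester must stall
        self.writes.append((addr, value))
        return True

    def next_transaction(self):
        """Choose the next bus transaction; reads always take priority."""
        if self.reads:
            return ("read", self.reads.popleft())
        if self.writes:
            return ("write", *self.writes.popleft())
        return None  # bus idle
```

Strict priority like this can starve writes under a continuous stream of reads; a real design might bound how long a write waits, or drain the write queue once it fills.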
Note that read requests must be checked against pending transactions in the write queue: the write queue may hold the latest value written to a memory location (main memory has not yet been updated, a coherence problem), and it is this value that should be supplied to the read operation.
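A minimal sketch of this check, assuming word-granularity addresses and a dictionary standing in for main memory. `WriteQueue`, `lookup` and `read` are hypothetical names. On a hit, the newest pending value for the address is forwarded to the read; on a miss, the read goes to memory.

```python
from collections import deque


class WriteQueue:
    """Hypothetical write queue whose pending entries are checked by reads."""

    def __init__(self):
        self.pending = deque()  # (address, value) pairs, oldest first

    def post(self, addr, value):
        self.pending.append((addr, value))

    def lookup(self, addr):
        """Return the newest pending value for addr, or None if none is pending."""
        for a, v in reversed(self.pending):  # search newest entry first
            if a == addr:
                return v
        return None


def read(addr, memory, wq):
    """Serve a read: forward from the write queue on a hit, else read memory."""
    value = wq.lookup(addr)
    return value if value is not None else memory[addr]
```

Forwarding is one option; an SIU could instead stall the read until the matching write has drained to memory. A real cache-line fill would also have to merge any pending writes into the fetched line, which this word-level sketch does not model.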