9. Database Buffer Manager#

In Altibase, data objects in the disk tablespace must be loaded from disk into memory in order for them to be accessed or updated. Memory that is used temporarily in this way is referred to as a "buffer", and in Altibase this memory is collectively called the "buffer pool".

If all of the data on disk were loaded into the buffer pool, it would be possible to access any data quickly without incurring any disk I/O expense. However, because the amount of memory is limited, it is only possible to load some of the data that exist on disk into the buffer pool. When data that have been loaded into the buffer pool are removed to make way for other data, this is called data replacement. Because this has such a strong impact on system performance, an efficient algorithm must be used to determine which data are to be saved in the buffer pool and how long to save the data in the buffer pool.

In Altibase, the entity that manages the buffer pool is known as the buffer manager. The main role of the buffer manager is to save more frequently accessed data in the buffer longer, and to manage the buffer efficiently.

This chapter describes the structure and function of the buffer manager, how to manage the buffer pool, related properties, and so on.

Structure of the Buffer Manager#

Components#

The components of the buffer manager include the buffer area, the buffer pool, buffer frames, and Buffer Control Blocks (BCBs). The buffers in the buffer pool are organized as follows: an LRU list, a prepare list, a flush list, a checkpoint list and a hash table. This section describes the components of the buffer manager.

Buffer Area#

The buffer area is the reserved memory space which is assigned to the buffer pool. The size of the buffer managed by the buffer manager depends on the size of the buffer area.

Buffer Pool#

The buffer pool is the key element of the buffer manager, and is the implementation of the buffer replacement policy. It loads the requested data pages into the buffer and returns the memory address of the area into which the pages were loaded. Internally, the buffer pool manages BCBs using the hash table, LRU list, prepare list, and flush lists.

Buffer Frame#

A buffer frame is a space allocated for loading one page in memory. One buffer frame is the same size as one page. A set of buffer frames are assembled to form a buffer fool. A buffer pool is configured a gathering of buffer frames.

BCB (Buffer Control Block)#

Buffer Control Blocks (BCBs) contain information about buffer frames. One BCB corresponds to a single buffer frame. The buffer manager uses BCBs to manage information about all of the pages loaded into buffers, whereas buffer frames are merely space into which pages can be loaded. Each BCB maintains an address for the corresponding buffer frame.

The following figure and table describe the structure and information of buffer control blocks.

[Table 9-1] BCB Information

Property	Description
BCB Status	This is the current status of the buffer frame. Possible values: FREE/CLEAN/DIRTY - FREE: no page is loaded in the buffer frame - CLEAN: page is loaded in buffer frame but not updated - DIRTY: page is loaded in buffer frame and updated but not written to disk
Buffer Frame Address	Buffer Frame Address The address of the buffer frame corresponding to the BCB
Space ID	The identifier of the tablespace containing the page
Page ID	The unique identifier of the page in the tablespace
Page Owner Lock	In order to access a page, it is first necessary to acquire a lock. Read, Write, and Fix mode locks can be acquired. A page in the buffer can be accessed after a lock on a BCB corresponding to the page is acquired in a particular mode.
Modified LSN	The next time the contents of the disk buffer are flushed to disk, the portion of the disk buffer that will be flushed to disk is the portion corresponding to all changes up to and including the change corresponding to this LSN.
Fix Count	This is the number of transactions that are simultaneously accessing a page. If this value is 1 or higher, the page can't be replaced, whereas if it is 0, the page can be replaced.
Touch Count	This is the number of transactions that have accessed a page since it was loaded into a buffer. This value is used to determine whether the page is hot or cold.

Hash Table#

When the Altibase server receives a request for a page, it searches for the BCB of the page in the hash table in order to check whether the page has already been loaded into the buffer. The BCBs of all pages that are loaded into the buffer are registered in the hash table.

LRU (Least Recently Used) List#

This is used to determine which buffers have not been accessed for a long time so that they can be replaced first.

In Altibase, the LRU list is separated into hot and cold zones, and can thus be called a "hot-cold LRU list". Buffers that are accessed frequently are placed in the hot zone, whereas those that are not accessed frequently are placed in the cold zone. When a buffer needs to be replaced, only the cold zone is searched, meaning that hot buffers are not considered as replacement candidates.

When a page is first loaded into a buffer, it is inserted at the mid-point (LRU cold first) of the LRU list. When allocating a buffer to the new data page, if there are no free buffers in the prepare list, the end (LRU cold last) of this list is searched first, and a cold buffer is then replaced. The buffer that is replaced is called a "victim".

Buffers that are read frequently are moved to the "LRU hot first" position in the hot zone. Meanwhile, dirty buffers (buffers containing pages that have been updated but haven't been flushed to disk) are moved to the flush list. Additionally, clean buffers (buffers containing pages that haven't been updated) are designated as replacement buffers as long as they are not in the hot zone.

The relative size of the hot zone can be set using the HOT_LIST_PCT property. The default is 50, which means that half of the LRU list is used as the hot zone.

Flush List#

If dirty buffers are found while an LRU list is searched for buffers to replace, they are moved to the flush list. The flush list is a collection of buffers containing pages that have been updated but that hasn't been written to disk yet. However, not all dirty buffers are on the flush list. This is because they are moved to the flush list, not at the time point at which they are updated, but when the LRU list is searched for buffers to replace.

When a replacement flush occurs, the updated pages on this list are written to disk, and clean buffers are thus obtained.

Prepare List#

All buffers on the flush list that have been written to disk are moved to the prepare list. That is to say, the prepare list consists of clean buffers that have been flushed

When the buffer manager searches for buffers to replace, it first searches the prepare list. If it can't find suitable buffers in the prepare list, it then searches the LRU list. However, even if a buffer is on the prepare list, it isn't necessarily clean. This is because the contents of the buffer can be updated.

Checkpoint List#

Since the LRU list, flush list, and prepare list are mutually exclusive, so a buffer can't exist on two or more lists. However, because the checkpoint list is managed independently of the other lists, buffers on any of the other 3 lists may also be found on the checkpoint list.

Dirty buffers, that is, updated buffers, are present on the checkpoint list, and all buffers on the checkpoint list are dirty. The buffers on this list are assigned LSNs corresponding to the time points of the first update, and are sorted and managed based on this. When the checkpoint list is flushed, the buffers on the checkpoint list having the lowest LSNs are flushed first.

[Figure 9-3 Buffer Pool] shows that all of the buffers on the LRU list, flush list, and prepare list can be accessed using a hash table.

List Multiplexing#

The LRU list, flush list, prepare list, and checkpoint list can each be multiplexed. List multiplexing prevents multiple databases clients from simultaneously causing list lock contention when requesting services.

The number of each kind of list can be specified using the BUFFER_LRU_LIST_CNT, BUFFER_PREPARE_LIST_CNT, BUFFER_FLUSH_LIST_CNT, and BUFFER_CHECKPOINT_LIST_CNT properties, and can be checked by querying the LRU_LIST_COUNT, PREPARE_LIST_COUNT, FLUSH_LIST_COUNT, and CHECKPOINT_LIST_ COUNT columns of the V$BUFFPOOL_STAT performance view. However, these values cannot be changed while the server is running.

BCB State Transition#

Each BCB always has one of three states: FREE, CLEAN, or DIRTY.

FREE#

This status indicates that no pages are loaded in the buffer. When the system first starts up, most of its buffers will be free. In addition, when a tablespace is dropped or taken offline, all buffers corresponding to pages belonging to the tablespace are freed.

CLEAN#

This status indicates that pages are loaded into the buffer but haven't been updated. In this status, the contents of buffer pages is the same as the contents of pages on disk. The buffer can be accessed using a hash table. The buffer can be found on one of the LRU list, flush list, and prepare list, but cannot be present on the checkpoint list.

DIRTY#

This status indicates that the pages in the buffer have been updated, but have not yet been written to disk. Dirty buffers are present on the checkpoint list, and also on one of the LRU lists, flush list and prepare list. The dirty buffer is changed to CLEAN state after being flushed.

Flusher#

The flusher performs flushing, which means writing to disk the contents of all buffers that are to be replaced. In Altibase, there are two types of flushes: replacement flush and checkpoint flush.

Replacement Flushing#

In replacement flushing, updated buffers that have not been accessed for a long time are flushed so that they can be replaced.
Checkpoint Flushing#

In checkpoint flushing, buffers that were first updated a long time previously are flushed to decrease the amount of time taken to perform checkpointing.

The replacement flush is forcibly executed either periodically or when no replacement buffer is found. Checkpoint flushing can also occur periodically, but can be executed by the user with the ALTER SYSTEM CHECKPOINT command.

The flusher performs periodic checkpoint flushing only in the case where it was first performed replacement flushing and then has no pending tasks. That is, if the flusher finds buffers to be replaced after waiting for some specified time, it writes them to disk. The flusher waits for an amount of time that is greater than that specified using DEFAULT_FLUSHER_WAIT_SEC and less than that specified using MAX_FLUSHER_WAIT_SEC. After performing replacement flushing, the flusher again waits for the specified amount of time, and if there are no buffers to be replaced at that time, it decides whether or not to perform checkpoint flushing or to continue waiting.

The conditions for checkpoint flushing are as follows:

The amount of time specified in CHECKPOINT_FLUSH_MAX_WAIT_SEC has passed since the flusher most recently performed flushing.
The number of logs pertaining to pages that must be recovered exceeds some specified number when the system is restarted. That is, the lowest LSN of the LSNs of pages that have been updated and haven't been flushed to disk equals the value of CHECKPOINT_FLUSH_MAX_GAP.

However, even if the flusher is instructed to wait for a long time, if the number of buffers to be replaced becomes high as the result of frequent transactions, forced flushing may occur.

The number of buffer frames that can be flushed in one flusher cycle can be specified for checkpoint flushing using the CHECKPOINT_FLUSH_COUNT property.

Additionally, multiple flushers can be specified using the BUFFER_FLUSHER_CNT property. However, this property cannot be changed while the server is running. Each flusher can be started or paused using the ALTER SYSTEM START/STOP FLUSHER statement.

Please refer to the SQL Reference for more detailed information about SQL.

Managing Database Buffers#

Access Modes#

A lock must be acquired in order to access a page. Access modes are categorized, according to the kind of authority that is granted, into read, write and fix modes, which are outlined below:

[Table 9-2] Buffer Access Mode

Access mode	Description
Read	This access mode is only for reading pages that have been loaded into the buffer. Multiple transactions can access the buffer at the same time.
Write	This access mode is for writing to the pages that have been loaded into the buffer. In this mode, only one transaction can access a page at a time.
Fix	In this access mode, after the page has been uploaded to the buffer, it is guaranteed that it will not be replaced by the buffer manager. If a transaction accesses and reads a page in fix mode, the accuracy of the data cannot be guaranteed. In order to be sure that the data being read are correct, the page must be accessed in read mode.

The permission relationship between access modes is as follows

[Table 9-3] Permissions and Access Modes

Permissions	Read	Write	Fix
Read	O	X	O
Write	X	X	O
Fix	O	O	O

As shown in [Table 9-3], when a transaction requests write mode, if another transaction is already acquiring read or write, the request will either be queued or fail. Meanwhile, if some other transaction has already acquired a read mode lock, a request for access in read mode will succeed, but a request for access in write mode will fail.

Standby Modes#

The standby mode determines whether to wait or to return an error immediately when access cannot be granted in the requested mode because the requested page is being used by another transaction.

[Table 9‑4] Standby Modes

Access mode	Description
Wait	If another transaction has already acquired a lock and access in the requested mode cannot be granted, wait until the lock is released after the other transaction finishes its task.
No-Wait	If another transaction has already acquired a lock and the access in the requested mode cannot be granted, return an error without waiting for the other transaction to finish its task.

Page Request Procedure#

1. Search the Hash Table#

The buffer manager receives a page request, that includes information about the page ID, access mode, and wait mode.

When the buffer manager receives a request, it first checks the hash table for the BCB. If the requested page has already been loaded into a buffer, the BCB for that page will be found in the hash table. If the BCB cannot be found in the hash table, this means that the page has not been loaded into a buffer, and the page must be read from disk and loaded into a buffer.

2. Acquire a Lock#

In Altibase, in order to guarantee that data are accurately read, pages that are being read from disk may not be accessed by other transactions until they have been completely read.

This is accomplished by requiring that write privileges be acquired when reading from disk. This means that if a transaction is able to acquire read or writes privileges for a page, the data were not being read from disk at that point in time.

As shown in the following table, whether a lock is granted differs depending on the access mode, on whether pages are read from disk, and on the standby mode.

[Table 9‑5] Acquire a Lock

Access Mode	Read Disk	Standby Mode	Result
Fix	O	Not applicable	Allow after waiting for reading to finish
Fix	X	Not applicable	Allow
Read	O	Wait	Allow when the access mode is permitted; stand by if access fails
Read	O	No-Wait	Allow when the access mode is permitted; stand by if access fails
Read	X	Wait	Allow when the access mode is permitted; stand by if access fails
Read	X	No-Wait	Allow when the access mode is permitted; return an error if access fails
Write	O	Wait	Allow when the access mode is permitted; stand by if access fails
Write	O	No-Wait	Allow when the access mode is permitted; stand by if access fails
Write	X	Wait	Allow when the access mode is permitted; stand by if access fails
Write	X	No-Wait	Allow when the access mode is permitted; return an error if access fails

As shown in the above table, if access is requested in fix mode, it will be granted immediately regardless of the privileges with which other locks are set for the current page. However, if the page is being read from disk, the request must stand by until the page has been completely read from disk.

Moreover, when a request for access with read or write privileges is made in No-Wait mode, if the page is being read from disk, the request must stand by until the page has been completely read from disk.

Reading Pages from Disk#

If a page requested by the buffer manager has not been loaded into a buffer, the page is read from disk according to the following procedure.

1. An Available BCB is Acquired#

When a page has not been loaded into a buffer, the Altibase server must first acquire a BCB in order to load the page into a buffer. To find a buffer into which to load the page, the prepare list is searched first. Finding a free buffer on the prepare list is the easiest way to load a page.

However, if no free buffer can be obtained in this way, the next step is to find a buffer to replace. When the system is initially started up, there are a lot of free buffers, but after that the likelihood of finding a free buffer in the buffer pool is low unless a tablespace is dropped or taken offline. Additionally, no buffers are newly freed after flushing, because buffers are used to store the contents of pages even after they have been flushed.

2. A Buffer is Replaced#

The prepare list is checked first when searching for buffers to replace. This is because the prepare list contains clean buffers, which were flushed and moved from the flush list.

If buffers suitable for replacement can't be found on the prepare list, the LRU list is searched. However, if no clean buffers can be found even when the LRU list is searched, another prepare list is checked. This process is repeated until a clean buffer is found or until all prepare lists have been searched.

However, if no clean buffers can be found even after all prepare lists have been searched, the flushers are instructed to operate, and the search for a buffer pauses to wait for a buffer to be added to a prepare list. This waiting time is specified in the BUFFER_VICTIM_SEARCH_INTERVAL property. If no buffers can be found during this waiting time, the search proceeds to the next prepare list. In Altibase, this phenomenon is called "victim search warp".

If the value of VICTIM_SEARCH_WARP, which can be observed in the V$BUFFPOOL_STAT performance view, is high, this indicates long waits for buffer replacement. If this problem persists, increase either the size of buffers or the number of flush threads, and restart the system.

In order for buffers on the LRU list to be replaced, they must meet the following conditions:

They have never been fixed (buffers whose fix count = 0).
They have pages loaded into them, but those pages have not been modified (clean buffers).
They are not hot (buffers whose touch count < HOT_TOUCH_CNT).

If buffers suitable for replacement are found, they are removed from the hash table, and the next task is performed.

3. The Page is Verified after Being Read#

After a BCB has been acquired, a page is loaded from the disk to the buffer. However, a portion of the page on the disk may have been lost or corrupted due to some unforeseeable circumstances such as a hard disk error or a power failure. If such a problem goes unnoticed, users might be presented with invalid data, so it is important that the Altibase server be aware of such problems. Therefore the Altibase server checks the integrity of each page immediately after the page has been loaded from disk to the buffer.

Flushing#

1. Selecting Buffers to Flush#

Pages that have been modified since they were loaded into buffers are flushed either when they are selected as victims or during checkpointing. Pages that have been loaded into buffers must meet the following conditions in order to be flushed:

They have been updated at least once.
They have never been fixed (fix count = 0).

Pages can be read by other transactions while being flushed. Therefore, in order to perform flushing, read mode privileges are required.

2. Copying Pages to the I/O Buffer#

Once pages to be flushed have been chosen, they are first copied into I/O buffer memory before being written to disk. They are written to disk after being copied there because I/O tasks are expensive and time-consuming compared to memory operations.

When buffer pages have been copied to the I/O buffer, this buffer can be read and updated by other transactions. However, if the I/O buffer were not used, it would be impossible to read or update these pages while they are being written to disk (that is, while the I/O task is being conducted).

3. Log Flushing#

Before a modified page is written to disk, a log of the changes must be written to disk first. This is called the WAL (Write-Ahead Logging) protocol.

Additionally, a checksum value is calculated and written to check whether the page has been corrupted when a page is loaded from disk into a buffer. Every time the page is read from disk, the checksum value is calculated and compared with the checksum for the page to verify the integrity of the page.

4. Writing Pages in the I/O Buffer to Disk#

When the pages in the I/O buffer are written to disk, the entire contents of the buffer are first written all at once to a so-called "double-write file", and then each page is written to a data file.

The reason for writing pages to disk twice in this way is that it is impossible to recover partially written pages, such as those that are generated if a system failure occurs while pages are being written to disk. Such inconsistent pages cannot be recovered even using redo logs.

In Altibase, the directory in which the double write file is saved is specified using the DOUBLE_WRITE_DIRECTORY property. This file is used to verify the consistency of data files when the system is started and to perform system recovery. If this file doesn't exist, these tasks are not conducted.

To use the buffer manager, the properties in the altibase.properties file must be suitably configured set. The properties related to buffers are listed below. For detailed information about each of these properties, please refer to the General Reference > Chapter 2. Altibase Properties.

BUFFER_AREA_CHUNK_SIZE
BUFFER_AREA_SIZE
BUFFER_CHECKPOINT_LIST_CNT
BUFFER_FLUSH_LIST_CNT
BUFFER_FLUSHER_CNT
BUFFER_HASH_BUCKET_DENSITY
BUFFER_HASH_CHAIN_LATCH_DENSITY
BUFFER_LRU_LIST_CNT
BUFFER_PINNING_COUNT
BUFFER_PINNING_HISTORY_COUNT
BUFFER_PREPARE_LIST_CNT
BUFFER_VICTIM_SEARCH_INTERVAL
BUFFER_VICTIM_SEARCH_PCT
BULKIO_PAGE_COUNT_FOR_DIRECT_PATH_INSERT
CHECKPOINT_FLUSH_COUNT
CHECKPOINT_FLUSH_MAX_GAP
CHECKPOINT_FLUSH_MAX_WAIT_SEC
CM_BUFFER_MAX_PENDING_LIST
DEFAULT_FLUSHER_WAIT_SEC
DIRECT_PATH_BUFFER_PAGE_COUNT
FAST_UNLOCK_LOG_ALLOC_MUTEX
HIGH_FLUSH_PCT
HOT_LIST_PCT
HOT_TOUCH_CNT
LOG_BUFFER_TYPE
LOG_FILE_SIZE
LOW_FLUSH_PCT
LOW_PREPARE_PCT
MAX_FLUSHER_WAIT_SEC
REPLICATION_LOG_BUFFER_SIZE
SECONDARY_BUFFER_ENABLE
SECONDARY_BUFFER_FILE_DIRECTORY
SECONDARY_BUFFER_FLUSHER_CNT
SECONDARY_BUFFER_SIZE
SECONDARY_BUFFER_TYPE
SMALL_TABLE_THRESHOLD
TABLE_BACKUP_FILE_BUFFER_SIZE
TOUCH_TIME_INTERVAL

Statistics for Buffer Management#

Statistical information about the buffer manager can be checked using the performance views provided with Altibase. Please refer to the Altibase Data Dictionary for detailed descriptions of the available performance views.

Information related to the buffer pool can be viewed using V$BUFFERPOOL_STAT, and information related to the flusher can be checked using V$FLUSHINFO and V$FLUSHER. Statistical information related to the buffer frames managed by the buffer manager can be viewed using V$BUFFPAGEINFO, and statistical information related to the buffer pool of undo tablespace can be viewed using V$UNDO_BUFF_STAT.

Because the statistical information accumulates from the time the server is started, to obtain statistics for a particular period, a calculation must be performed for all columns in this way: (present value - value at the start of the period of interest).

Calculating Hit Ratio#

The accumulated hit ratio of the buffer pool can be checked in the HIT_RATIO column of the V$BUFFPOOL_STAT performance view.

Exmple

SELECT HIT_RATIO FROM V$BUFFPOOL_STAT;

The Hit Ratio can be calculated using the following formula:

Hit Ratio = (GET_PAGES + FIX_PAGES - READ_PAGES) / (GET_PAGES + FIX_PAGES)