16. DB Diagnostic Monitoring#

This chapter explains how to check and analyze the operational status of an Altibase database.

Monitoring Database Servers#

Meta tables and performance views are used to check the operation status of Altibase database. For more detailed information on meta tables and performance views available in Altibase, please refer to the Altibase Data Dictionary.

The major entities to be monitored are as follows:

Session and Statements#

Information on currently connected sessions can be checked using performance views while Altibase is running. Multiple statements can be assigned to a single session¹. Session properties can be set differently for each session.

The following performance views store information on sessions and statements.

V$SESSION

Information on sessions created in an Altibase server responding to a client
V$STATEMENT

Information on the executed statements for each currently connected session in the Altibase server

Database Information (Tables and Indexes)#

Information about the entire database and each tablespace, table and index, can be checked using meta tables and performance views.

V$DATABASE

Internal information about memory database space
V$TABLESPACES

Information about tablespaces
SYS_TABLES_

Information on tables
SYS_INDICES_

Information on indexes

Memory Usage#

Information on the memory areas used by Altibase while it is running can also be checked using performance views. This includes information about the amount of memory used for memory tablespace data (including old versions of records) storage, index storage space, temporary areas for use in processing queries, session information storage space, the memory buffer pool, and the like

Information on the memory usage of the Altibase server can be checked using the following performance views.

V$MEMSTAT

Statistical information about the memory usage for each internal module by Altibase processes
V$MEMTBL_INFO

Information about memory tables
V$BUFFPOOL_STAT

Statistical information related to buffer pools

Replication Status#

Finally, the status of replication can also be checked using performance views. This includes the status of threads related to replication, particularly the Sender and Receiver threads, as well as the status of replication data transmission.

V$REPRECEIVER, V$REPRECEIVER_PARALLEL

Information about the replication Receiver, and replication Receiver threads working in parallel
V$REPSENDER, V$REPSENDER_PARALLEL

Information about the replication Sender, and replication Sender threads working in parallel

Troubleshooting Procedures#

This section describes how to check and analyze various problem situations that may occur while Altibase is running.

Because it is impossible to foresee all of the wide variety of problems that can occur in an actual operating environment, the problems you experience will probably not be exactly the same as those described here. Nevertheless, this section classifies the sorts of problems that fall within the range of what can reasonably be expected, and provides detailed information on how to respond to them.

The problems that are typically experienced can generally be thought of as falling into one of the following categories:

16. DB Diagnostic Monitoring
Monitoring Database Servers
Troubleshooting Procedures

The general troubleshooting procedure is as follows:

[Figure 16-1] General Troubleshooting Procedure

The Altibase Administrator Logs are text logs that are created and maintained in the $ALTIBASE_HOME/trc/ directory with the *.log filename extension. This directory contains the following trace log files:

altibase_boot.log
altibase_id.log
altibase_mt.log
altibase_qp.log
altibase_rp.log
altibase_sm.log

Diagnosing Abnormal Server Termination and Restart Failure#

Abnormal Termination#

The following are some of the possible causes for the Altibase to shut down abnormally:

Insufficient memory
System OS in panic status

In the event of an abnormal shutdown, check the error messages left in the Administrator logs and make a judgment about the cause on the basis of those error messages. Unless the shutdown was caused by insufficient memory, consult with a professional systems engineer for advice.

If the shutdown was caused by insufficient memory, a system call error message related to memory allocation, such as Memory allocation failed or Unable to invoke the shmget() system function will have been recorded in one or more of the Administrator Logs.

If this is the case, check the amount of memory that is currently in use to determine whether any needlessly large areas of memory are being used. If this is the case, release the memory and identify the cause of the excessive memory usage to prevent the reoccurrence of the problem. If there does not seem to be any specific cause of excessive memory usage, consider upgrading the system memory.

Memory-related problems will be discussed further in greater detail in the next section, Diagnosing Problems related to Memory Usage.

Altibase Restart Failure#

Restart failures may be caused by any of the following:

Another Altibase process exists, and is already using the same service port (specified using the PORT_NO property).
Files required for startup or recovery are missing, or alternatively, files could not be accessed, attributable either to file permissions or to a problem with the file system.

If there is a file access error message in the admin log, verify the presence of all files, including all log files, log anchor files, and data files, to determine whether the relevant file or files exist. If the error occurred even though the file exists and can be accessed, the file might be corrupted, in which case the database will need to be created again or recovery procedures will need to be performed.
Insufficient system resources

If system startup failed due to insufficient system resources, identify which resource is lacking, check the actual amount of that resource that has been loaded and is available on the system, check the system kernel settings, and fix the area having the problem.

Most system resource-related problems that cause startup to fail are memory- or semaphore-related.

For memory-related problems, check the amount of memory that is available to a single process, the maximum size of a segment and the like in the system kernel settings.

Diagnosing Poor Server Responsiveness#

If the Altibase server is actually processing a query but the response time is very slow, it is easy for a user to conclude that the server has become nonresponsive erroneously.

In the case of a slow response when requesting a query, the cause is most likely either because an entire table is being scanned, or because memory swapping is taking place due to insufficient memory. In this case, check the system information managed by the operating system and the information about the session in which the query was executed to determine whether the query is actually being processed. If swapping occurs due to the excessive CPU usage or insufficient memory, it is likely that the query is being processed.

A more detailed description will be provided below in the section entitled Diagnosing Problems related to Application and Query Execution.

Another cause of nonresponsiveness is insufficient disk space. This possibility should be considered in cases where there is no response after data are changed or entered. In such cases, an error message indicating insufficient disk space is recorded in one or more of the Administrator Logs, and the nonresponsive status will persist until sufficient disk space has been acquired. A detailed description of how to solve problems related to insufficient disk space will be provided below in the section entitled Diagnosing Problems related to Disk Usage.

If there are other problems that leave the Altibase unresponsive, consult a professional system engineer.

Insufficient Disk Space#

If the amount of disk space becomes insufficient while Altibase is running, the system will stop operating without any further data changes. In such cases, the first task is to determine which of the file systems is lacking in disk space

While Altibase is running, it uses disk space in the following ways:

For log file storage
For tablespace file storage

Active logs and archive logs are continuously generated while Altibase is running. Active log files are saved in the directory designated using the LOG_DIR property in the Altibase configuration file, altibase.properties. Additionally, when the database is operating in archive log mode, archive log files are automatically saved in the directory designated using the ARCHIVE_DIR property.

If, due to insufficient disk space, Altibase becomes unable to save any more active log files, Altibase stops operation. In this case, because deleting log files or log anchor files would make a recovery impossible, it is necessary to either extend the size of the file system or delete unnecessary files.

In the case of archive log files, if the ARCHIVE_FULL_ACTION property in the altibase.properties file is set to 0, when the system runs out of space for saving archive logs, it will continue to operate without saving them. If this property is set to 1, however, the system will stop operating until additional space is made available.

If the number of log files saved in the directory specified using the LOG_DIR property increases, resulting in a shortage of space for saving log files, first check the Administrator Log files to determine whether checkpointing is executing normally, and verify that the CHECKPOINT_INTERVAL_IN_SEC and CHECKPOINT_INTERVAL_IN_LOG properties in the Altibase configuration file are properly set. If checkpointing is executing normally but there are still log files that have been deleted or archived remaining in the directory specified using the LOG_DIR property, check the status of replication transmission. If replication transmission is slow, or if replication logs cannot be sent, log files will not be archived or deleted, and will thus accumulate in the directory specified using the LOG_DIR property, which can lead to a shortage of log storage space.

A more detailed description will be provided below in the section entitled Diagnosing Problems related to Replication.

Memory and system tablespaces are saved in the directory specified using the MEM_DB_DIR property in the altibase.properties file. If the shortage in disk space is tablespace-related, check this property and the amount of space available for storing user-created tablespace files. The file system that stores the tablespace must have at least as much free space as the tablespace increased.

Because the storage space for all memory tablespaces is used for checkpointing, the amount of free space must be greater than the size of all memory tablespaces existing in memory.

If Altibase runs out of memory while it is running, the response time can greatly increase, and Altibase might shut down abnormally. In this case, check the amount of memory that Altibase is using to determine whether it is reasonable, and eliminate any causes of unnecessary memory space usage, if any. If there do not seem to be any factors causing unnecessary memory space usage, consider providing additional memory.

The memory space used by Altibase during operation can be broadly classified as follows:

Memory tablespaces
Temporary memory spaces
Memory buffers.

Both actual (current) memory table records and previous versions of those records, which are required in order to support MVCC, are stored in memory tablespaces.

Temporary memory space is used to store indexes for memory tables, space for sorting records when querying memory tables, information about sessions, and the like.

The memory buffer is used to sort disk table records and perform other kinds of operations on disk tables.

If excessive use of memory is suspected, for a set period monitor the number of statements that are created, the amount of temporary memory space that is used, memory tablespace usage, the size of memory table indexes and the like to determine whether they have increased, and use system monitoring utilities such as ps and top in Unix to determine whether the size of a process is increasing continuously.

When Altibase is initially started up, memory usage may be higher than normal for a period of time, due to temporary memory area allocations, multiple previous versions of records are reconstructed, and the amount of session information increases, which is normal.

However, if memory usage continues to rise after a period of operation, a memory leak or similar problem may be the culprit. In this case, consult with a professional system engineer.

Diagnosing Excessive CPU Usage#

When Altibase exhibits sudden surges in CPU usage, the following scenarios should be suspected as possible causes:

A query of a memory table was processed without using an index.
Excessive disk I/O occurred when a query of a disk table was processed.
Swapping occurred due to insufficient memory.

Use a system performance monitoring tool to check the amount of memory that is being used. If memory swapping is occurring due to insufficient memory, it may be necessary to install additional memory. A more detailed description is provided in the section entitled Diagnosing Problems related to Memory Usage.

If the problem was caused not by insufficient memory but by the full scan of a memory table or excessive disk usage relating to a query of a disk table, the problem can be solved by tuning the corresponding query or modifying the table.

A detailed description will be provided below in the section entitled Diagnosing Problems related to Application and Query Execution.

The following types of replication-related problems may occur:

Failure to start replication
Slow progress of replication
Inconsistent number of records in the tables being replicated

If a replication transmission problem occurs, log files accumulate in the log storage space (which can decrease the amount of free space to the point where it is insufficient), ultimately resulting in a service interruption.

If a replication-related problem occurs, check the administrator log files for any replication-related error messages, and pass this information on to a specialized engineer.

If one of the systems in a replication environment develops a fault and remains faulty for a long time, it will be impossible to transmit replication data, decreasing the amount of available space in the log storage directory. Therefore, if it is taking a long time to solve the problem with the system in which the fault occurred, consider pausing replication and deleting the replication object to prevent problems with the system that is still running. In such cases, once the problem has been solved, it will subsequently be necessary to perform data recovery on the system that developed the fault. The data can be recovered either using the iLoader tool or by running replication in SYNC mode.

If it is difficult to delete the replication object, it will be necessary to continuously monitor the amount of available space in the log storage directory and add additional space if it becomes insufficient

Similar measures should be taken in the event of a problem with the replication network connection, or in the event that replication connection is unsuccessful for a long time due to some replication-related error.

The problems that may occur while executing applications and queries fall generally into the following two cases:

An application fails to connect to Altibase.
After an application submits a query processing request to Altibase, the system stops operating, or the query times out

Altibase Access Failure#

If an application fails to access Altibase, determine whether it is possible to connect to Altibase using the iSQL Altibase utility. Because Altibase supports four different access methods (TCP/IP, Unix domain socket, IPC, IPCDA), test the connection using the access method that is used by the application.

The access method can be configured by setting the ISQL_CONNECTION environment variable to one of TCP, UNIX, IPC, or IPCDA.

If it is possible to connect to Altibase using iSQL, the problem is related to internal access settings within the application, and can be solved by adjusting these settings. If connection via iSQL fails, either of the following scenarios may be the reason:

The number of connected sessions exceeds the value specified using the MAX_CLIENT property in the altibase.properties file.
If the access protocol is IPC, the number of sessions connected via IPC exceeds the value specified using the IPC_CHANNEL_COUNT property.

Nonresponsiveness or Session Disconnected due to Timeout#

If no response to a query processing request is received due to slow query processing, any of the following situations may be the cause:

Swapping is taking place because the amount of memory is insufficient.
Performance is decreased due to a scan of an entire table.
The system is waiting to acquire a lock on a table.

First, use a system performance monitoring tool and check the CPU usage and memory usage to identify whether the problem is caused by insufficient memory. If this is the case, please refer to the Diagnosing Problems related to Memory Usage section above.

If there is no memory shortage problem, but CPU usage is high, check all of the queries that are currently executing to determine whether any of them is performing a full scan of a table.

To view a list of the queries that are currently being processed, connect using iSQL and check the QUERY column in the V$STATEMENT performance view.

Determine which of the queries that are currently executing is likely to be causing the problem, check the execution plan for that query, and tune the query if a problem is found

For a detailed explanation of how to tune queries, please refer to Performance Tuning Guide.

If the problem does not seem to have been caused by either of the above two reasons, it is likely that the system is waiting to acquire a lock on a resource. Check the information on currently held locks in the V$LOCK and V$LOCK_WAIT performance views to verify whether any unnecessary locks are being continuously held in any sessions. If this is the case, forcibly terminating the session will solve the problem.

A so-called statement in this context is an internally used object for processing a single SQL statement on a one-to-one basis. ↩

Administrator

Reference

Developer

Tools

External Tools

16. DB Diagnostic Monitoring#

Monitoring Database Servers#

Session and Statements#

Database Information (Tables and Indexes)#

Memory Usage#

Replication Status#

Troubleshooting Procedures#

Diagnosing Abnormal Server Termination and Restart Failure#

Abnormal Termination#

Altibase Restart Failure#

Diagnosing Poor Server Responsiveness#

Insufficient Disk Space#

Diagnosing Excessive CPU Usage#

Altibase Access Failure#

Nonresponsiveness or Session Disconnected due to Timeout#

16. DB Diagnostic Monitoring#

Monitoring Database Servers#

Session and Statements#

Database Information (Tables and Indexes)#

Memory Usage#

Replication Status#

Troubleshooting Procedures#

Diagnosing Abnormal Server Termination and Restart Failure#

Abnormal Termination#

Altibase Restart Failure#

Diagnosing Poor Server Responsiveness#

Diagnosing Problems Related to Disk Usage#

Insufficient Disk Space#

Diagnosing Problems related to Memory Usage#

Diagnosing Excessive CPU Usage#

Diagnosing Problems related to Replication#

Diagnosing Problems related to Application and Query Execution#

Altibase Access Failure#

Nonresponsiveness or Session Disconnected due to Timeout#