Oracle Database Health Check up Checklist

November 12, 2016 by S4

Filed under Oracle

Last modified November 12, 2016

Database Health Check up Checklist

1. Basic Server Status

  • Basic Server related configuration collection
  • CPU Speed
  • Physical Memory, Swap size
  • Disk layouts
  • Log storage
  • Number of database running on the server & in what mode e. (Open or Mount) mode.

 

  • Database up/down
    • Whether the database shutdown / started up
    • Any abnormalities noted during the startup or shutdown i.e. any Oracle background process terminated while starting.
    • Check the database initialization parameter in the init<sid>.ora file.
    • Check the /etc/system file for allocation of semaphores & shared memory parameters?
  • Check the Listener process.
  • Check the respective alert/trace files

(command- tail –f alert<sid>.log) or by (grep “ORA-“ alert<sid>.log >> oracle_errors.txt|tail –f oracle_error.txt or mail the oracle_errors to the concerned monitoring authority.

Please Note :- Alert log files needs to be monitored continuously and not only after periodical interval. This would help us in solving the errors promptly.

  • Check backup daemons checked, if any.
  • Any error messages observed in the alert files?
  • If so what follow-up action taken for rectification of the errors.
  • Generate Database map. Table structure
  • Generate scripts for creating Control file output in text format. ( SQL> alter database backup controlfile to trace. Check the trace file generated in the “$ORACLE_HOME/admin/<instance name>/udump” directory )

2. CPU utilization & IO stats checks

  • Review the core files through use of “strings” or “dbx” command
    • List the actions taken
  • Has the CPU been utilized more than 90%
  • If so, What were the processes running during that time.
  • Follow up action taken for rectification
  • How many sessions are active for loading process?
    • Concurrent users connected.
  • I/O statistics ratios (# iostat –xtcn 10 10)
  • Virtual Memory Statistics, Disk IO (Amount of Swapping)

(vmstat 10 10). (If OS is IBM then  use $ topas).

  • Executed the “fsck” command to check the OS level block check, if recorded. (Please note: – This would result downtime in terms of database & application shutdown).

 

Archive file generation status (Applicable if database is running in Archivelog mode)

  • Archive file status
  • How many archive files generated

from _____  to ____ time.

  • Start / End archive number

Size of transaction recorded from _____ to _____ time.

  • Interval of archival file generation
  • Percentage of archive destinations

3.

Database Space Management

  • Archive log files deleted or not?
  • Have you observed any abnormal disk growth?
  • Any follow up actions taken by way of intimating users / deleting unwanted files?
  • Has any volume touched 100% disk space of all file systems? Reason for the same?
  • Recurrence of such instance in a particular volume needs to be intimated to the concerned parties.
  • Are the trace files moved to secondary device after monitoring the same?
  • Are the core & log files moved to secondary device
  • Truncate alert<sid>.log (If its size is high.) (Need to shutdown the database. Move the file to other device & Start the database)
  • Truncate listener.log (Need to stop the listener. Move the file to other device & Start the listener)

 

Tablespace / Datafiles / Redolog files availability

  • Is the tablespace / datafile report generated (By querying dba_data_files & dba_tablespaces)
  • Check the free / utilized space available in the datafile associated with respective tablespace. (If requires resizing or addition of files check the OS level space availability)
  • Is there any abnormal growth of space seen in Tablespaces?
  • Which are the tablespaces having space utilisation more than 80%?
  • Are the tablespaces coalesced after relevant interval week / month or during deletion of massive data?
  • Are indexes rebuilded at regular intervals or after deletion of massive data?
  • Analyze for any existence of temporary tables, indexes and stored procedures in the tablespaces? (Check the same by querying “tab” of the respective schemas).
  • Check the status of redo log files members & log groups. (Query the v$log file & check the status (If the status is “stale” execute

     SQL > alter system switch logfile.

    If the status is “invalid”, this indicates the file is not accessible for write or read operation & need to be logically dropped & recreated by specifying the logfile sizes.

Ensure the status is not “current” while dropping the log file member.

 

4. Schema / Database changes

  • Whether any changes made to the database like addition / deletion of data / redo /control files?
  • Latest control file structure (Hard & soft copy to be maintained.)
  • Proper naming convention followed for creating existing / new file structure.
  • Additions done in the database structure captured in in the backup list / script.
  • Any resizing of datafiles?
  • Are all rollback/undo segments available online?

5. Unauthorized Logins

  • Have you checked the /var/adm/messages file for failed logins (“su”)
  • Has the /var/adm/sulog file has been checked for failed logins
  • Has the terminal from which the attempted logins has originated, been identified.
  • Is the database complying with security policy?
  • Check the password enquiry (based on security policy)
  • Change “root” password.
  • Change “oracle” password.

 

 Disk / Server Errors

  • Have you checked the “/var/adm/messages” file for any OS / Hardware related errors.
  • Any OS related core files generated oftenly.
  • Generate the Explorer output.

6.

Database security checks & Object level Permissions

  • Has any user been granted new permissions on any schema during the day? Please specify the schema name and the grants provided.
  • Preserve “system” & “sys” password in sealed envelope.
    • Check “execute” permission for “svrmgrl”, “sqlplus”, “lsnrctl”, “exp”, “imp”, “agentctl” file.
  • If new permissions granted, has an appropriate approval being obtained in writing.
  • Are the permissions permanent or on temporary basis? If temporary, when is the corresponding revocation due? Please specify time or event.
  • Any new table/index created today? Why? Mention the column name?
  • Are all views, procedures & functions valid? (If using Oracle 8i onwards, use “utlrp.sql” in $ORACLE_HOME/rdbms/admin directory.
  • Are respective triggers enabled ?
  • “tnsnames.ora” should not have IP address (only hostname).
  • “listener.ora”, “tnsnames.ora” should be open for editing only for “oracle” or “dba” related users.
  • “/etc/hosts”,“/etc/system”, “/etc/passwd” files should have read, write accessible only to “root” or “root equivalent”. “oracle” owner should have only “read & execute”. Others should not have access to read the files.
  • Access to create temporary file in /oracle binary should be revoked from all users except the “Oracle owner”. Only “Oracle” owner should have write permissions.
  • Check the permissions of OS executable files permissions. (/usr/bin, /usr/ccs/bin, /usr/sbin)
  • Configuration of basic Oracle security related alerts done, if using Oracle Entriprise Manager (OEM).
  • Normal database users should not have “exp_full_database” & “imp_full_database”.
  • Passwords of “system” & “sys” users should not be written in any script file. If so, normal users should not have read permissions to those files.

Check permissions of the Application related files.

 

7.

BACKUPS CHECK

 

Backup export

  • Export dump taken
    • Has export dump taken? Give start time and end time.
    • Has the export dumps taken in “compress” mode?
    • Has the consistent parameter is set to “y” while taking export backup? Check for sufficient space & size of the online rollback segments before considering this parameter for “y”.
    • Mention the location of export dumps.
    • Has any export dump (compressed) taken to tape?
    • Any export dumps deleted after backup?
    • Whether adequate space made available for next export dump?
  • Monitor the export log for any errors.
  • Monitor backup logs generated by Backup Software.
    • Check for non-execution of backups.
    • Analyze the reasons for non-execution of backups.
    • Were any actions taken on the same and after rectifying the same resubmitted the backup etc.

Backup Archive

 

  • Have you checked for missing archive files? Ensure that no archive files are missed (*)
  • Mention the starting and ending archive file sequence number backed up
  • How many archive files are backed
  • Cross-check the above two information
  • Whether the details of archive backup updated in backup register.
  • Has the archive files moved to any other destination other than default destination or archbackup?

If so, mention the destination and the reasons

 

Cold Backup

 

  • Whether Cold backup taken?
  • Whether logs in /var/adm/messages verified for backup errors or any errors in “Veritas Net Backup”.
  • Whether the all the respective volumes are updated in the backup script with the volumes of datafiles / tempfiles / redo logfiles / controlfiles in oracle database. SQL> alter database backup controlfile to ‘<directory>/controlbkp.ctl’.

 

RMAN backups (Applicable for

  • Database is backed up using catalogue database or through control file.
  • If configured through catalogue database, please check whether the catalogue database is backed regularly.
  • What is the database retention policy?
  • Check the version of database of primary & catalogue database matching?

Updation of backup register

  • Has the backup register updated with relevant information?
  • Whether cold backup details updated in backup register?
  • Checked the validity of RMAN / hot backups and incorporate any problems / solution in the backup register?

8.

Corruption & Recovery checks

 

Corruption Checks

  • Are all datafiles available online? (Query the v$datafile_header & v$datafile views). Check for any status with recover or offline status.
  • Are there any ORA-0600 & 07445 errors reported? Any core dump generated due to flash of these errors, as these could be serious pointers towards database corruption or misbehaving of application code or serious memory leakages.
  • Any ORA-4031 error reported for memory leakages (while connected)?
  • Are the archive files taken to check logfiles corruption today?
    • Alter system dump logfile ‘____.arc’;
  • Is “dbv” command fired to check datafile block corruption?
  • Any event captured in the init<sid>.ora file to keep track of the process termination.

 

For Data Header Block Corruption Check & Rectification execute the following commands.

 

SQL> analyze table <table name> validate structure cascade;

SQL> analyze index <index name> validate structure cascade;

(Ensure no process in progress on the specified table / index checked for validation)

 

Recovery verification

  • Are any datafile restored? If so, reason for the same.
  • Control file recreated ?
  • Entire database recovery done?
  • Tablespace recovery?
  • Tables recovered?

9.

Performance Rating Parameters

 

SGA Related Parameters

  • Buffer cache hit ratio (Should be always > 95%)
  • Library cache ratio (Should be always > 85%)
  • Data dictionary hit ratio (should be always > 90%)
  • Calculate for row hit ratio (Query v$rowcache view)
  • Top 10 SQL statements utilization.
  • Load on database writer process per CPU. (Should be in sync with the number of CPUs installed in the machine)
  • Check the sort usage per processes. (i.e. Count of Disk sorts vis-à-vis Memory sorts)
  • Check for waits generated by background processes. (Query the v$bgprocess)
  • Check for redo log space requests waits

 

Chaning & Fragmentation check

  • Check for existance of chained rows in tables? (SQL> analyze table <table_name> list chained rows; Further, query the “chained_rows” table for existance of chained rows. If so, needs to take an export of the chained table (compress=n), drop the same, import the table, Validate the associated functions, procedures, triggers)
  • Alternatively, Check the query time vis-à-vis output of no. of records. If query time is more than have should be then check for fragmentation. If fragmentation exists,
  • Analyze the table dependencies
  • Export the table
  • Drop the table
  • Import the table
  • Validate all the invalid procedures, functions, triggers
  • Analyze the storage based parameters analyzed i.e. (pctfree, pctused, freelist, freelist groups)
  • Analyze the block level parameters (inittrans,maxtrans) parameters based on the concurrent access of the tables by the users.
  • Check for high water mark (dba_extents, dba_segments)

 

 

Locks & Latch contention

  • Locks, waits for process per table. (Check the v$lock)
  • Check for occurrences of deadlock? Check the query in the trace files
  • Check for latch contention, by querying (v$waitstat & v$latch) view.
  • Check the type of latch waits.
  • Check for “free buffer busy waits”, “redo allocation wait”, “redo copy waits”.
  • Check the spin counts. (Should be changed based on the CPU load)

 

Analysis of SQL Statements

  • Any of database objects kept in permanent cache.
  • Are the “dbms” packages kept in cache as mentioned in Annexure 1.
  • Are indexes used optimally on these queries (Check the cost of full table scan vis-à-vis index table scan)
  • Are any optimizer based parameters viz. optimizer_index_cost_adj set in the init<sid>.ora file (Supported from 8.04 onwards).
  • Execution time analysation of “SQL statements” (Check in the memory parameters i.e. persistant , runtime memory in v$sql & v$sqlarea views)
  • Existance of Rollback Contention (Details in v$rollstat. Check the number of shrinks, wraps in the rollback segment.
  • Check disk I/O contention (Details in v$filestat, v$waitstat)
  • Check log switch occurrences in the “v$log_history” (Should vary from 5 minutes to 1 hour. Should not be less than 5 minutes neither more than 1 hour)
  • Check for incomplete “Checkpoint” errors in the “alert<sid>.log”.
  • Compute table statistics & index statistics at regular intervals or after massive deletion of data?
  • Take a Count of segments for pctincrease parameter having > 0. (Query the dba_segments table)
  • Check the sort usage for the respective queries (by executing the following query v$sort & v$sort_usage)

Leave a Comment