In a two-node RAC environment (22.214.171.124) at one of my customers, the file system holding the Oracle Grid binaries kept growing. After some diagnosis, I found that the RAC Cluster Health Monitor repository file had grown to 74 GB.
Table of Contents:
- What is the problem?
- What is the solution?
1. What is the problem:
While checking which files were using the space, I found this:
$ ls -lh /u01/app/11.2.0/grid/crf/db/XXXXX-db01/crfclust.bdb
-rw-r----- 1 root root 74G 6 août 14:58 /u01/app/11.2.0/grid/crf/db/XXXXX-db01/crfclust.bdb
This file belongs to the RAC Cluster Health Monitor (CHM).
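One way to track down which files are consuming the space is to list the largest files under the Grid home. The following is only a minimal sketch, assuming GNU find is available; the function name largest_files is made up for illustration and is not part of any Oracle tool.

```shell
# List the N largest regular files under a directory, biggest first.
# Output format: size in bytes, a tab, then the path.
# Assumption: GNU find (the -printf option is not POSIX).
largest_files() {
    dir="$1"
    count="${2:-5}"   # default to the top 5
    find "$dir" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, running `largest_files /u01/app/11.2.0/grid 5` as root would be expected to surface crfclust.bdb in a situation like this one.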
Note 1: The RAC Cluster Health Monitor detects and analyzes operating system and cluster resource-related degradation and failures.
Note 2: Check the CHM repository size:
$ oclumon manage -get repsize
CHM Repository Size = 1094795585
Note 3: The reported CHM repository size is 1094795585 seconds (about 12671 days). This abnormal value is caused by an Oracle bug.
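The conversion from seconds to days can be checked with shell integer arithmetic. A small sketch; secs_to_days is a hypothetical helper name used only here.

```shell
# Convert a retention value in seconds to whole days (86400 seconds per day).
# Integer division, so any partial day is dropped.
secs_to_days() {
    echo $(( $1 / 86400 ))
}

secs_to_days 1094795585   # the value reported by oclumon above
secs_to_days 259200       # three days
```

The first call gives 12671 and the second gives 3, matching the figures quoted in this article.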
2. What is the solution:
Change the CHM repository retention from 1094795585 seconds (12671 days) to 259200 seconds (three days).
Note: The value for RETENTION_TIME must be between 3600 (one hour) and 259200 (three days). If you enlarge the CHM repository size, you must ensure that enough local space is available for the repository size you select on each node of the cluster.
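A retention value can be sanity-checked before it is passed to oclumon. The sketch below is illustrative only: valid_retention is a hypothetical helper, and treating both bounds as inclusive is an assumption (the resize command in this article does accept 259200).

```shell
# Return 0 if the argument is an integer within the documented retention range.
# Assumption: both bounds (3600 and 259200 seconds) are treated as inclusive.
valid_retention() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 3600 ] && [ "$1" -le 259200 ]
}

if valid_retention 259200; then
    echo "259200 is within range"
fi
```

With this check, the buggy value 1094795585 would be rejected, while 259200 passes.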
Step 1: Stop the CHM resource:
$ crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'XXXXX-db01'
CRS-2677: Stop of 'ora.crf' on 'XXXXX-db01' succeeded
Step 2: Remove the CHM repository file:
# rm -rf /u01/app/11.2.0/grid/crf/db/XXXXX-db01/crfclust.bdb
Step 3: Resize the CHM repository to 3 days:
$ oclumon manage -repos resize 259200
XXXXX-db01 --> retention check successful
XXXXX-db02 --> retention check successful
New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
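The resize output reports how much disk space the new retention will use (4524595200 bytes here). It may be worth comparing that figure with the free space on the file system holding the Grid home on each node. A rough sketch follows; bytes_to_mib and has_free_space are hypothetical helper names, and the free-space check relies on the POSIX df -Pk output format.

```shell
# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Return 0 if the file system containing DIR has at least BYTES available.
# Usage: has_free_space DIR BYTES
has_free_space() {
    dir="$1"
    need_kib=$(( ($2 + 1023) / 1024 ))                        # round up to KiB
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')   # "Available" column
    [ "$avail_kib" -ge "$need_kib" ]
}

bytes_to_mib 4524595200   # size announced by oclumon, in MiB
```

On a cluster node, something like `has_free_space /u01/app/11.2.0/grid 4524595200` could then be run on each node before resizing.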
Step 4: Start the CHM resource:
$ crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'XXXXX-db01'
CRS-2676: Start of 'ora.crf' on 'XXXXX-db01' succeeded
Step 5: Check the size of the CHM repository file:
$ ll -h /u01/app/11.2.0/grid/crf/db/XXXXX-db01/crfclust.bdb
-rw-r----- 1 root root 5,2M 6 août 15:06 /u01/app/11.2.0/grid/crf/db/XXXXX-db01/crfclust.bdb
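The five steps above can be collected into one small script. This is a sketch rather than a tested procedure: the resize_chm and run helpers, the DRY_RUN switch, and the GRID_HOME default are assumptions added for illustration, while the crsctl, rm and oclumon commands are the ones shown in the steps. By default it only prints the commands.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: resize_chm NODE_NAME RETENTION_SECONDS
# Assumption: the repository file lives under $GRID_HOME/crf/db/NODE_NAME/.
resize_chm() {
    node="$1"
    retention="$2"
    bdb="${GRID_HOME:-/u01/app/11.2.0/grid}/crf/db/$node/crfclust.bdb"

    run crsctl stop res ora.crf -init &&                 # Step 1: stop CHM
    run rm -f "$bdb" &&                                  # Step 2: remove the file
    run oclumon manage -repos resize "$retention" &&     # Step 3: resize
    run crsctl start res ora.crf -init &&                # Step 4: start CHM
    run ls -lh "$bdb"                                    # Step 5: check the size
}
```

Calling `resize_chm XXXXX-db01 259200` prints the five commands for review; setting DRY_RUN=0 and running as root would execute them on the node.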
This article explained how to resize the RAC Cluster Health Monitor repository.