The vCenter Database (VCDB) is a lean, mean, data collection machine and it’s what makes your vCenter YOUR vCenter. The VCDB stores things like events, statistics, licensing, and configuration information for your environment. Under normal conditions your VCDB should only be a few GB depending on the size of your environment, but occasionally we see some bloating. A little bloating is okay and in fact is normal, same as with people (especially around the holidays). But what happens when your VCDB becomes so gluttonous that it grows over 50GB or even 100GB? You’ll start to notice vCenter becoming extremely sluggish and can take a long time for your inventory to show up after a reboot. This is a scary sight at first…
In this example we’ll take a look at a VCDB that grew to nearly 200GB in vCenter 6! I’m going to show you how to troubleshoot this extreme growth and bring your VCDB back to a more manageable size.
Identifying VCDB bloating:
First of all, we need to identify the problem. In this example the vCenter’s OS disk has nearly filled up and we need to find out why. WinDirStat and TreeSize are excellent utilities for showing where all that free space has gone. Just make sure you run either utility as an administrator or else you may not be getting the full picture.
We’ve just identified the VCDB as the reason this vCenter’s OS disk is nearly full. This is why it’s recommended that you keep any SQL DB’s on their own volume or at least not on the OS volume.
Finding the bloat:
SQL Server Management Studio has a number of useful canned reports that we can use to identify what’s causing the VCDB to grow so large. First, let’s take a look at the database’s disk usage.
Open SQL Server Management Studio and connect to your VCDB instance. The server name will typically be VC1\VCENTER where VC1 is the name of your vCenter server.
If your SQL Instance isn’t configured for Windows Authentication, try SQL Authentication using your vpxuser credentials that you set when you installed vCenter.
Expand your instance and the Databases folder. Right click on your VCDB and select the Disk Usage report.
This report basically confirms what we saw in WinDirStat. There are only 2.2GB of log files (VCDB.ldf) and the database is 193GB!
Let’s take a look at the tables to see which ones are the largest. Right click your VCDB and run the Disk Usage by Top Tables report.
Wow!! 280,600,000 records in the events table!! Better take a look at what’s being logged. Before you decide to get fancy and query the VPX_EVENT_ARG table, just remember that according to William Lam a kitten dies every time you query the VCDB! William is a smart guy so we’d better listen to him! Seriously, there’s no reason to query the table. The events are stored in XML and there are much more friendly interfaces such as the vSphere client or PowerCLI (using the Get-VIEvent command) to get this information.
Looking at the events in the vSphere client, there appears to be an issue with one of our hosts which is causing dozens of events to be logged every second.
Thankfully, this issue was resolved with a simple host reboot.
Looking back at our disk usage by table report, we see a lot of statistical information being stored as well. Let’s take a look at our logging levels…
This vCenter is configured for Level 3 statistics logging. Look at the estimated space required!
That’s more like it! Level 1 is the default statistics logging level and requires much less space.
Cleaning up the VCDB:
We have two options now that the offending host has been dealt with… We could leave the VCDB alone and let it purge itself over time, or we could clean it up. Since you’re not reading this to learn to do nothing, let’s clean it up!
WARNING! Before performing ANY maintenance on the VCDB, take a FULL BACKUP of the database!!
Cleaning up the events table:
Configure vCenter’s task and event retention to one day. You can either configure this in the vSphere client…
Or follow the instructions in KB1025914 to edit the event.maxAge and task.maxAge values.
Note: If you haven’t already stopped the vCenter Server services, do so now.
From here, the easiest thing to do is to wait for a day and let the VCDB purge the events on its own. However, we’ll proceed to follow the steps in the aforementioned KB to clean things up now.
In your SQL Server Management Studio console, navigate to Databases -> VCDB -> Programmability -> Stored Procedures. Right click on dbo.cleanup_events_tasks_proc and click Execute Stored Procedure.
Hold tight! This may take some time to complete. It took an hour to run for this example. This would be a great time to go grab a coffee!
After the stored procedure has been executed, let’s see what it did by running our reports again.
Not too shabby. We freed up about 50GB or 26% of the overall DB.
We also only cleaned up 77 million event records. That’s a lot but we still have 204 million records remaining. Don’t fret. Remember, we only cleaned out all but one day’s records. Since we just corrected our issue it will take another 24 hours to purge the remaining data.
Cleaning up the historical statistics:
This is optional as it will delete your historical statistics. Unlike the tasks and events, statistical data will take longer to purge on its own because of the retention rules but we’ll speed things up here. KB1007453 tells us how to clean up the stats tables, however it’s a bit cumbersome so I’ll show you how to put SQL to work for you!
We need to truncate the statistics tables. This was simpler back in the 5.0 days, but in 5.1 VMware added hundreds of tables that need to be truncated, depending on how much data is being stored. The KB implies each table should be truncated one by one, but there’s an easier way.
In SQL Server Management Studio, right click VCDB and select New Query. Paste the following into the query and hit Execute (F5).
This query will truncate all the VPX_HIST_STAT tables, which in this example there are 545!
DECLARE @intTbl INT DECLARE @intHSNum INT DECLARE @intTblqty INT DECLARE @tablename varchar(15) DECLARE @trunctable varchar(100) SET @intHSNum = 1 TRUNCATE TABLE VPX_SAMPLE_TIME1 TRUNCATE TABLE VPX_SAMPLE_TIME2 TRUNCATE TABLE VPX_SAMPLE_TIME3 TRUNCATE TABLE VPX_SAMPLE_TIME4 WHILE (@intHSNum <5) BEGIN SET @intTbl = 1 SET @tablename = 'VPX_HIST_STAT' + CAST(@intHSNum AS VARCHAR) + '%' SET @intTblqty = (SELECT COUNT(*) from information_schema.TABLES WHERE TABLE_NAME LIKE @tablename WHILE (@intTbl <@intTblqtySET ) BEGIN SET @trunctable = 'TRUNCATE TABLE VPX_HIST_STAT' +CAST(@intHSNum AS VARCHAR) + '_' + CAST(@intTbl AS VARCHAR) PRINT 'Truncating ' + @tablename EXEC (@trunktable) SET @intTbl = @intTbl + 1 END SET @intHSNum = @intHSNum + 1 END GO
We’ve now freed up another 10GB!
The last step in the KB is to run the rollup jobs. Right click Past Day stats rollup and click Start Job at Step. Now do the same for the month, and week rollups.
After running the rollup jobs, let vCenter sit overnight to purge the remaining events and tasks. We’ve freed up a lot of space already so there’s no reason to rush things at this point. Don’t forget to start the vCenter Server services! Your backup admin will thank you!
Now for the moment of truth…
Take a look at all that beautiful whitespace!!!
From here you can decide whether you want to shrink the VCDB or not. Shrinking causes fragmentation, but I really don’t want to leave the VCDB holding on to 200GB if it doesn’t need it. To shrink the database, right click the VCDB, select Tasks, Shrink, Database.
The database shrink can take some time to execute. You can shrink the database while the vCenter Server Services are running, but it’s not a bad idea to stop them if you can afford to.
Once the shrink is done, so are we. Take a look at the database size now! Wow!
If you stopped the vCenter Server services, go ahead and start them again. You’ll notice now that the inventory populates much faster now! Don’t forget to set your task and event retention back to their original values.
–Corrected the SQL script:
DECLARE @intTbl INT
DECLARE @intHSNum INT
DECLARE @intTblqty INT
DECLARE @tablename varchar(15)
DECLARE @trunctable varchar(100)
SET @intHSNum = 1
TRUNCATE TABLE VPX_SAMPLE_TIME1
TRUNCATE TABLE VPX_SAMPLE_TIME2
TRUNCATE TABLE VPX_SAMPLE_TIME3
TRUNCATE TABLE VPX_SAMPLE_TIME4
WHILE (@intHSNum <5)
BEGIN
SET @intTbl = 1
SET @tablename = 'VPX_HIST_STAT' + CAST(@intHSNum AS VARCHAR) + '%'
SET @intTblqty = (SELECT COUNT(*) from information_schema.TABLES WHERE TABLE_NAME LIKE @tablename)
WHILE (@intTbl <@intTblqty )
BEGIN
SET @trunctable = 'TRUNCATE TABLE VPX_HIST_STAT' +CAST(@intHSNum AS VARCHAR) + '_' + CAST(@intTbl AS VARCHAR)
PRINT 'Truncating ' + @tablename
EXEC (@trunctable)
SET @intTbl = @intTbl + 1
END
SET @intHSNum = @intHSNum + 1
END
GO
Great article and also a +1 for above edit, added a ) to the ‘SET @intTblqty’ line.
This article saved my arse. Thank you!
Thank you for this great article, it saved me lots of time. We have a 2 hosts ESXi (essential plus) environment and it uses SQL Express (under 5 hosts / 50 vm’s) and it locks up at 10GB. Offcourse it happend to me while i was on holiday, the dBase gets full.
With this article it saved me some time and i’ve managed to decrease my dBase from 10GB to 3,7GB.