Operational vROPs Part 4: Tuning Alerts with Symptoms

In Part 3 of this series, we covered the use of separate Policies as a reasonably targeted method of suppressing Alerts. But as was mentioned in that article, two potential use cases were not addressed with that method:

  1. We only used the method to suppress alerts, rather than tune Alerts.
  2. We conceded that the method described in Part 3 did not offer a practical solution to granular tuning of Alerts.

So, consider the previous article as a basis for the next level of Alert tuning: Editing Alert Symptoms.

This article will explore two use cases for tuning Alerts by editing Symptoms:

  • Threshold Tuning
  • Selective Alert Suppression using Property Symptoms

Before we begin, I recommend users who are not familiar with vROPs Alert structures review the critical relationship between Alerts and Symptoms (they are NOT the same thing). See Chapter 4 of the vRealize Operations Manager Customization and Administration Guide for details.

But first, let’s review some more real-world examples.

Real World Example of the Need for Symptom Tuning

In the previous post, we cited two common examples for Alerts that operators may wish to suppress under certain conditions:

  • Stress Alerts
  • Disk Space Alerts (for disks known to be at or near full capacity by design)

These can also be examples of alerts requiring symptom tuning, or – as we will see in this article – more granular suppression by the use of symptoms.

But let’s consider another example from the real world: Several weeks ago, I had been paying a series of scheduled visits to a Health Care customer in Illinois, discussing many of these topics. Between two of those visits – as happens too often – the customer had a serious outage in the overnight hours.

The root cause of this outage was an “oldie but goodie”: SCSI conflicts between LUNs in their SAN. This caused intermittent and sporadic availability issues for VMs being able to access storage.

This customer was only operating vROPs with the vSphere Management Pack installed, which would NOT detect the root cause of this outage. The “smoking gun” in this situation would have been in the ESXi logs, which is a good illustration of how vROPs and vRealize Log Insight could have worked together to quicken the resolution of a serious problem.

Figure 1: Typical SCSI Reservation Error from ESXi 5.x

The customer understood the explanation of the vROPs vSphere Management Pack, but asked a very valid question: Even if vROPs, as configured, would not have detected the root cause, shouldn’t the out-of-the-box symptoms (i.e., the “effect”) have raised an alert? In this case, shouldn’t vROPs have detected out-of-the-ordinary I/O latency for the affected resources?

This was an excellent question, and it caused us to engage in a bit of sleuthing. We reviewed the data within vROPs during the relevant time frame, and sure enough, in both the Alerts View and the Timeline View, we saw several latency Alerts from vROPs for objects affected by this outage.

But this only illustrated a deeper issue: With the out-of-the-box vROPs settings and most user environments, I/O latency Alerts (for Read and Write) are very common. This is because the out-of-the-box Policies in the vSphere Management Pack and the pre-configured Alerts are based on vSphere Best Practices, which prescribe a tolerance for Latency up to 20-30 milliseconds. The fact is, latency at or above that level is not uncommon in customer environments. (See Figure 2 for a typical example.)

Figure 2: vROPs Typical Latency Alert

So, these alert criteria DID result in Alerts related to the SCSI outage, but the Alerts were lost in a sea of other similar Alerts. What was different about the latencies caused by the SCSI issue was the scale of the latency: in this case up to and beyond 1500 milliseconds.

Most virtualization and storage admins would agree there is a huge difference between latency issues of 20-30 ms (barely above tolerable) and latency up to 500, 1000, or 1500 ms.

NOTE: There is an argument in this example for the value of the Anomalies analytics in vROPs. Over time, even latencies of 20-30 ms that trigger hard threshold alerts would eventually be marked as “Normal” by vROPs Analytics. The Anomalies badge would then fire when truly abnormal latencies were observed. This, too, was noted in this example, but the customer had not yet incorporated Anomaly behaviors into their troubleshooting efforts. See Figure 3 for a similar, albeit much less severe, example.

Figure 3: Mild I/O Anomalies for a vSphere Datastore

Anomalies aside, this experience illustrated a need for this customer to adjust vROPs for an entirely different kind of Alert notification than what was configured at installation. I recommended a new variant of the Latency Alerts, called “Super Critical Disk Latency”, that would only fire when latency values rose significantly above the 20-30 ms level that occurs all too often. I arbitrarily recommended a threshold for this alert of about 500 ms. Such occurrences should be very rare, but would be critical in almost any instance in which they occurred.
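To make the two-tier idea concrete, here is a minimal Python sketch. It is not vROPs code: the function name, the tier labels, the sample values and the choice of ">=" as the comparison are all assumptions made for illustration. The 15 ms and 500 ms values are the Warning and new Critical thresholds used in the steps of this article.

```python
# Illustrative only: this is not vROPs code.
WARNING_MS = 15          # threshold of the existing Warning latency Symptom
SUPER_CRITICAL_MS = 500  # arbitrary threshold suggested for the new Alert


def classify_latency(latency_ms):
    """Return the highest tier that a single latency sample reaches."""
    if latency_ms >= SUPER_CRITICAL_MS:
        return "super-critical"
    if latency_ms >= WARNING_MS:
        return "warning"
    return "ok"


# Hypothetical samples: everyday noise plus one outage-scale spike.
samples_ms = [8, 22, 27, 31, 1500, 19]
tiers = [classify_latency(s) for s in samples_ms]

print("warning samples:", tiers.count("warning"))
print("super-critical samples:", tiers.count("super-critical"))
```

With samples like these, the Warning tier is crowded while the higher tier isolates the single outage-scale value, which is the behavior we want from the new Alert.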

So in addition to the suppression examples from our previous article, we will use the above example as our template for threshold tuning:

Use Case 1: Threshold Tuning in Symptoms

Before you begin, watch the following video which provides a good overview of the basics of vROPs Alerts and Symptoms.

vROPs Alert and Symptoms Overview Video (Duration: 5 Minutes)

We will use the Latency issue above as an example for Hard Threshold tuning in Alert Symptoms. Note that we will NOT create this alert from scratch. We will use the existing Latency Alert definition as the basis for the newly-tuned “Super Critical Latency” Alert definition.

  • First Task: Copy and Edit the New Alert Definition in the Alerts Section
  • Second Task: Add the New Alert Definition to the Appropriate Policies

First Task: Copy and Edit New Alert Definition

In this task we will use the Out of the Box Alert for Latency as the basis for an entirely new alert that will detect and notify for much more severe latency conditions:

1. Make sure you are logged into vROPs as an Admin user.

2. Click on the Content Quicklink at the top of the Navigation Pane:

Figure 4: Content Quicklink

a. If you reviewed the brief video linked above, you will know that Alerts are made up of one or more Symptoms. In our use case, dealing with Latency, we wish to use an existing Symptom as the basis for a new, more critical performance Symptom. So that is where we will start.

3. Click on the “Symptom Definitions” Section

a. This will take you to the Library of All Existing Symptoms in the vROPs Alert Library. Note the categories (Adapter Type and Object Type). These categorizations will help you understand the scope of the Symptoms you create or edit. This is important, since there are similar alerts for different resource types. In this case, we want to capture severe Latency for Virtual Machines.

b. There are several Symptom types. For this case, make sure you have selected “Metric/Supermetric Symptom Definitions.”

Figure 5: VM Latency Symptom Definitions

4. To find Latency Symptoms, type “latency” into the Symptoms filter.

a. You will likely see several Symptom definitions related to Latency. You may need to scroll through the examples, to find the two that are related to Virtual Machines. (One each for Read Latency and Write Latency — See Figure 5).

5. Click on the Symptom: “Virtual Machine has high read latency” to select it.

6. Click on the “Clone” icon at the top of the screen to create a copy of the Symptom.

7. This will open the Symptom Editor.

Figure 6: Symptom Editor for Read Latency Symptom

a. For this example, we will change three items in the Symptom:

  • Description (The new symptom should be self-explanatory)
  • Severity (The level of Alert Severity is much higher than the source Alert)
  • Threshold (The metric condition must be higher to match the level of Severity)

b. Configure the following new values for this new Symptom:

  • Description: “Virtual Machine has CRITICALLY HIGH read latency”
  • Severity: Critical (Changed from Warning)
  • Threshold: 500 (Changed from 15)

c. Note that the metric for the symptom (Virtual Machine: Aggregate of all instances|Read Latency (ms)) does not need to be specified, because it was already chosen as part of the cloned Symptom. When you create symptoms from scratch, the metric you select can be any metric collected by vROPs.

d. The value of the Threshold (500) is a bit arbitrary. The goal is to set it high enough that when the Symptom and Alert fire, the condition warrants immediate action and is not dismissed as a “false positive”, but not so high that it misses actual conditions that would be deemed critical. Keep in mind that, in this exercise, we are not deleting the original Warning Alert that fires for Latency conditions of 15 ms. The ultimate value of the threshold is a matter of personal judgement.

e. Do not change the Wait Cycle or the Cancel Cycle values at this time.

Figure 7: Completed New Symptom for CRITICALLY HIGH Read Latency

8. Click the “Save” Button to save this symptom.

9. Repeat Steps 1-8, but in this instance, clone the Symptom: “Virtual Machine has high write latency”, and configure the cloned Symptom with the values below, so that it appears as in Figure 8:

  • Description: “Virtual Machine has CRITICALLY HIGH write latency”
  • Severity: Critical (Changed from Warning)
  • Threshold: 500 (Changed from 15)

Figure 8: Completed New Symptom for CRITICALLY HIGH Write Latency
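We left the Wait Cycle and Cancel Cycle values alone, but it helps to understand what they do. The sketch below is a rough Python model, assuming that the Wait Cycle is the number of consecutive collection cycles the condition must be true before the Symptom triggers, and that the Cancel Cycle is the number of consecutive cycles it must be false before the Symptom cancels. The function name, the "greater than" comparison and the sample values are hypothetical, and this is not how vROPs is implemented internally.

```python
def symptom_states(values_ms, threshold_ms=500, wait_cycles=1, cancel_cycles=1):
    """Return one True/False per collection cycle: is the Symptom active?"""
    active = False
    run_true = 0   # consecutive cycles with the condition met
    run_false = 0  # consecutive cycles with the condition not met
    states = []
    for value in values_ms:
        if value > threshold_ms:
            run_true += 1
            run_false = 0
        else:
            run_false += 1
            run_true = 0
        if not active and run_true >= wait_cycles:
            active = True
        elif active and run_false >= cancel_cycles:
            active = False
        states.append(active)
    return states


# Hypothetical read latency samples (ms), one per collection cycle.
latency_ms = [20, 900, 30, 1200, 1400, 40, 35]

print(symptom_states(latency_ms))                                  # cycles of 1
print(symptom_states(latency_ms, wait_cycles=2, cancel_cycles=2))  # cycles of 2
```

In this model, higher cycle values ignore a single isolated spike but also delay both the trigger and the cancellation, which is the trade-off to weigh before changing them.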

Now that we have our Symptoms, it’s time to package them into new Alerts. Before we proceed, let’s first review the basic parts of Alerts:

  • Symptoms
  • Impact
  • Recommendations
  • Actions

We have just completed our new Symptoms and, as we will see in a moment, we will make some changes in our new Alerts as well.

To create our first new Alert (for CRITICALLY HIGH Read Latency), complete the following steps:

1. Make sure you are logged in to vROPs as an Admin User.

2. Navigate to the “Content” Quicklink.

Figure 9: Content Quicklink

3. Click on “Alert Definitions”.

a. This will bring us to the Library of existing Alerts.

b. Just as we already had latency Symptoms we could clone as the basis for our new Symptoms, we also have existing latency Alerts we can use for the same purpose.

4. In the Alert Definitions window, type “latency” into the filter to find existing Alert definitions for latency.

Figure 10: Alert Definitions filtered to display latency Alerts

a. Note the two existing Disk I/O latency alerts for Virtual Machines. These are based on the original symptoms with the lower latency thresholds that we cloned earlier. We will clone these Alerts using a similar method.

5. Select the Alert “Virtual Machine has Disk I/O read latency problem”.

6. Click on the “Clone” icon at the top of the screen to create a copy of the Alert.

a. You should note that the Alert Definition Workspace has 5 steps. We will only need to change a few of them to create our new alert. See Figure 11:

Figure 11: Cloned Read Latency Alert (Unedited)

7. Make sure you have selected “Step 1: Name and Description” of the Alert Definition Workspace. Edit the Name of the Alert to read “Virtual Machine has CRITICAL Disk I/O read latency problem”.

a. OPTIONAL: Edit the Description of the Alert to read something similar.

8. Click on “Step 4: Add Symptom Definitions” of the Alert Definition Workspace.

a. Note the existing Symptoms of this Alert. As you can see, this is an example of an Alert that requires more than one symptom to fire the Alert. There is an existing read latency symptom we will replace. But there are additional symptoms for Co-Stop and CPU swap wait. For the purposes of this exercise, it is up to you whether you would prefer to leave these as criteria for our new alert, or remove them, leaving only our new symptom.

9. Click on the “X” next to the Symptom called “Virtual Machine has high read latency” to remove it from the Alert Definition.

10. In the list of “Symptom Definitions” at the left of the Alert Definition Workspace, type the word “CRITICALLY” to help find the new symptoms we created in the previous task.

11. When you find the Symptom “Virtual Machine has CRITICALLY HIGH read latency”, drag it to the “Symptoms” area of the Workspace.

12. Click on “Step 5: Add Recommendations” of the Alert Definition Workspace.

a. You will note that the original Alert Definition contains several possible recommendations for remediation of this condition. Since our new, more critical Alert is more severe, we may wish to add an additional recommendation to this Alert Definition. This is where your own IT Operational Procedures can help. You and your staff may have procedures in place to address issues of this type. This is your opportunity to add your own best practices to an Alert.

13. Click the plus sign “+” to add a new Recommendation.

14. For my example, I simply added a Recommendation to “NOTIFY THE SAN RAPID RESPONSE TEAM IMMEDIATELY” to the Alert. You can add something similar or something else more in line with your own procedures.

a. When complete, your Alert Definition should look something similar to Figure 12 (changes noted in red):

Figure 12: Cloned and Edited CRITICAL read latency Alert

15. Click the “Save” button to save the Alert.

So, now, we have our CRITICAL Read Latency Alert. You also can create your new Write latency alert, by repeating Steps 1 through 15, to clone the existing Alert “Virtual Machine has Disk I/O write latency problem”. Edit it to appear similar to Figure 13:

Figure 13: Cloned and Edited CRITICAL write latency Alert

FINAL STEP: Putting the New Alerts into Action.

Alerts do not become active until they are added to a Monitoring Policy and that Policy is assigned to the Groups for which you want the Alert to apply. We covered the basics of enabling and disabling Alerts within Policies in Part 3 of this series. Review those steps and make sure these two new Alerts are enabled for the resources for which you want them to apply. See Figure 14 for an example of what this should look like:

Figure 14: Newly created Alerts are ENABLED in correct Monitoring Policy

Testing the New Alert

If you wish to verify the appropriate behavior of these new Alerts, you should be able to do so using a tool to generate I/O behavior such as IOMETER.

Use Case 2: Selective Suppression of Alerts Using Property Symptoms

So, now we know how to tune Alerts using Symptoms and adjusting thresholds. The next technique is to build on the Alert suppression method we described in Part 3 of this series, but doing so in a much more granular fashion.

In the next series of steps, we will demonstrate how to suppress specific Alerts on a resource by resource basis. We will do this using two key pieces of data:

  • A specific feature of vROPs Alerts known as Property Symptoms
  • vSphere Tags in vCenter

As we just demonstrated, Alerts in vROPs are based upon one or more Symptoms. Most of those symptoms are based upon performance metrics. But those are not the only kinds of symptoms that vROPs Alerts can recognize. In fact there are several additional symptom categories. Here is the complete list:

  • Metric/Supermetric Symptoms
  • Property Symptoms
  • Message Event Symptoms
  • Fault Symptoms
  • Metric Event Symptoms

For the purposes of this exercise, the key Symptom type is Property Symptoms.

Our use case here is to manage Alerts by exception. We want to have a simple, straightforward method to view common alerts as they occur, and based on our needs, exempt or allow specific resources (VMs, Hosts, Datastores, etc) to continue or stop receiving events of this type.

For this example, we will return to the Stress Alerts we covered in Part 3 of this series.

Instead of suppressing Stress alerts for entire Clusters or other Groups of systems as we demonstrated in that article, we may wish to make realtime decisions to suppress the Stress Alert for individual VMs.

This is where Property Symptoms can help us. We can add some simple logic to the Stress Alerts to suppress them based on specific properties of the VM.

So, let’s take a look by making this simple change:

1. Make sure you are logged into vROPs as an Admin user.

2. In the Navigation Pane, click on the “Content” Quicklink.

Figure 15: Content Quicklink

3. Click on the “Alert Definitions” link. You will be returned to the existing Alert definitions.

4. We are going to make an edit to the Stress Alerts, so in the Alert Definitions filter, type “stress”.

Figure 15: Stress Alerts

5. Select either of the Stress Alert definitions (CPU or Memory).

6. Click on the “Edit” icon.

a. The Alert Definitions Workspace opens.

7. Click on “Step 4: Add Symptom Definitions”.

8. Note the Drop Down Menu “Symptom Definition Type”. From that menu, select “Property”.

Figure 16: Select “Property” from “Symptom Definition Type” drop-down menu

9. Click on the plus sign to create a new Symptom.

a. This will display all of the Property Types that vROPs collects for Virtual Machines from vCenter.

b. You will note there are many choices here: OS, VM Name, CPU and Memory configurations, etc. It might be our first instinct to choose VM Name as a method of exempting specific VMs from Alerts, but this would be inefficient from an administrative perspective. It would require you to visit the Alert editor for every exception.

10. The Property we are looking for here is “vSphere Tag”. Select “vSphere Tag” and the Symptom Criteria will appear. Complete the Criteria so it appears as it does in Figure 17:

Figure 17: Property Symptom to Suppress Alert if specific tag found for VM in vSphere

a. To be clear about what we are trying to accomplish, let’s explain the logic in this Symptom: we are adding a condition to the Alert so that the Alert is allowed to fire only if a specific tag is NOT found on the VM in vCenter. So by default, if you change nothing in vCenter, this Alert behaves as configured. If you add the tag “Exempt from Stress” to a VM in vCenter, however, any Alert with this Symptom will not fire for that VM.

11. Click Save to save the Property Symptom.

12. Now we need to add the Property Symptom to the Stress Alert(s). The new Symptom should appear in your list of Property Symptoms. Simply drag it to the appropriate Symptoms area of the Stress Alert as shown in Figure 18.

Figure 18: New Property Symptom added to Stress Alert

13. Now, Save the Stress Alert Definition.

14. If you wish, you can repeat this for the other Stress Alert, or any other Alerts you would like to suppress using this vSphere Tag.
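The logic of the steps above can be sketched in a few lines of Python. This is only a model of the intended behavior, assuming the Alert requires all of its Symptoms to be true and that the Property Symptom is true when the tag is absent. The function name, VM names and data layout are hypothetical, and the exact operator configured in Figure 17 is not reproduced here.

```python
EXEMPT_TAG = "Exempt from Stress"  # the tag name used in this article


def stress_alert_fires(stress_symptom_active, vsphere_tags):
    """Fire only when the stress Symptom is active AND the exemption tag is absent."""
    tag_absent = EXEMPT_TAG not in vsphere_tags  # models the Property Symptom
    return stress_symptom_active and tag_absent


# Hypothetical VMs: (stress Symptom active?, tags applied in vCenter)
vms = {
    "app01": (True, set()),
    "batch02": (True, {"Exempt from Stress"}),
    "web03": (False, set()),
}

alerting = [name for name, (stressed, tags) in vms.items()
            if stress_alert_fires(stressed, tags)]
print(alerting)
```

The stressed VM carrying the tag is left out, and no per-VM change to the Alert definition was needed, which is why a tag is a better choice than VM Name for this purpose.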

So, that’s it for configuration in vROPs.

To tag a Virtual Machine for Alert Suppression for this alert, you simply open the VM in the vSphere Web Client. The good news is that you can do that from within vROPs itself!!

To suppress Stress Alert for a specific VM:

1. Find the target VM in the vROPs UI.

2. From the Actions Menu, select “Open Virtual Machine in vSphere Client”.

Figure 19: Opening a VM in vCenter Directly from vROPs

3. Click on the “Manage” tab, then click on “Tags”.

4. Click on the “New Tag” Icon.

5. Add the Tag “Exempt from Stress” and a Category.

Figure 20: Tag added to vSphere that will exempt VM from Stress Alerts

6. That’s it. With this Tag applied to any VM, it will never receive Alerts with the Property Symptom assigned.

There are many different ways this can be applied. Of course, having a handful of tags already configured in vSphere can make this even easier. Then you can go into the vROPs Alert Definitions and add the appropriate Symptoms to the Alerts you wish to suppress on an exception basis.

Advantages vs. Disadvantages of this Method

This method of Alert Suppression has several advantages:

  1. It requires NO CHANGES to Monitoring Policies or creation of Custom Groups, although combining this method with those simply adds another level of administrative control.
  2. It is ideal for one-off exceptions.
  3. Once the preparation of Alerts (i.e., adding the appropriate Property Symptoms) is complete, this method is relatively easy and intuitive.
  4. If we are fortunate, we will soon have tools available to automate the application of vSphere Tags to many resources at once.

There are only a few disadvantages:

  1. There is some administrative overhead at the beginning (adding the Property Symptoms to the Alerts you want to suppress). However, you are not likely to need this method for every Alert type; it is best suited to Alerts that appear often but that you don’t wish to suppress altogether.
  2. The method will not work for Datastores unless you are running vROPs 6.0.2 or later. Earlier versions of vROPs did not recognize vSphere tags for Datastores. Fortunately, this has been addressed in the 6.0.2 release.

Conclusion:

Hopefully, by now, you are building a powerful set of tools and methods to tune and configure vROPs to work most effectively in your environment. For most customers, the methods described in this series thus far will likely be sufficient. But we can take this even farther. So, stay tuned!

COMING UP NEXT: “Zero Alert Baseline” Using vROPs Property Inheritance

Operational vROPs Part 3: Basic Alert Tuning Using Policies

So far, in this series, we have been covering basic concepts and foundational administrative tasks to allow you to tune vROPs 6 to manage your environment to suit your business needs. In Part 2 of this series, we covered why creating Custom Groups is an essential first step to making vROPs an Operational tool.

In this article and the next several to come, we will begin the actual process of Alert Tuning, starting first with some very basic methods, then progressing incrementally to more advanced methods.

Before we proceed, let’s look at some common, real world examples of Alerts that appear in vROPs 6.x that may cause you to decide you need to start the process of Alert Tuning…

vROPS CUSTOMER: “WHY DO I KEEP SEEING THIS ALERT?”

I have been asked this question by dozens of customers. The answer is that a) the out-of-the-box settings are based on Global Best Practices, and b) Global Best Practices are not optimized for individual environments. In other words “one size fits none.”

Most alerts in vROPs 6.x will fire very rarely, if ever. This is, of course, by design. By definition, Alerts are meant to be notifications of something unusual that you should attend to in a given time frame. (See Part 1 of this Series for more about vROPs Alert Priorities and recommended response times).

But experience has shown us that certain alerts appear more often than others. Frankly, depending on your organization’s needs and priorities, certain alerts may appear too often.

Here are a couple of examples of specific Alerts raised by vROPs where customers often want to tweak the Alert behavior or suppress it altogether.

  • “Virtual machine has chronic high CPU (or Memory) workload leading to CPU (or Memory) stress”
  • “Virtual Machine (or Datastore) is running out of disk space”

Before we discuss how to tune for these kinds of alerts, let’s use these examples to explore why we would want to tune each one.

Stress Alerts in The Risk Category

In my experience with vROPs 6.x, few alert examples have raised more questions than this specific alert:

Figure 1: CPU Stress Alert for Virtual Machine

I will not go into a detailed definition of the Stress Algorithm in this article. For a detailed explanation of the Stress algorithm and Badge score, I recommend this video:

vRealize Operations Stress Badge Explained

Hopefully, if you have viewed the video, you agree that the Stress algorithm can be a very valuable analytical tool. But you still may not wish for it to raise alerts for your entire environment.

Also, remember the following caveats about Stress alerts:

  • It is a Risk Alert, therefore classified as a Trending Issue, and has a less urgent priority.
  • It is not a point in time analysis. Unlike Workload and other utilization metrics found under the Health Alert category, Stress is calculated over time…not instantaneously.

The final point is why the Stress alert may require tuning or suppression. Most alerts, if they are classified by a user as “noise”, can simply be cancelled. But Stress is different. Stress is NOT a measure of utilization at the current instant of time, such as Workload. A Workload alert may be raised, and cancelled, and you may never see it again, unless the utilization of that VM repeats the same utilization spike.

Stress is a more persistent state for a VM. It is calculated over time and will only change over a similar span of time. So you may cancel a Stress Alert in the vROPs UI, only to see it immediately return, because the same Stress condition still exists.
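The difference between a point-in-time metric and one calculated over time can be illustrated with a toy Python example. To be clear, this is NOT the vROPs Stress algorithm, which this article does not define; the rolling-window fraction, the window size, the 70% "busy" level and the sample data are all hypothetical stand-ins chosen only to show why a windowed measure changes slowly.

```python
from collections import deque


def windowed_pressure(workload_pct, window=6, busy_pct=70):
    """For each sample, the fraction of the last `window` samples above busy_pct."""
    recent = deque(maxlen=window)
    fractions = []
    for value in workload_pct:
        recent.append(value > busy_pct)
        fractions.append(sum(recent) / len(recent))
    return fractions


# Hypothetical workload: six busy samples followed by two quiet ones.
workload = [90, 95, 92, 96, 91, 94, 20, 25]
pressure = windowed_pressure(workload)

print("latest workload sample:", workload[-1])
print("latest windowed value:", round(pressure[-1], 2))
```

The latest workload sample is already low, yet the windowed value is still dominated by the earlier busy period. An alert driven by a measure like this would simply come back if it were cancelled, because the underlying condition has not yet aged out of the window.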

So, while Stress analysis is a valuable tool, you may not need to be alerted on it all the time. So it is a good candidate for tuning.

Disk Space Alerts

Here is a typical Disk Space alert for the virtual disk on a VM. Similar Disk Space alerts may appear for Datastores.

Figure 2: Disk Space Alerts for Virtual Machine

Intuitively, most users would want to be alerted when physical disks are reaching full capacity. The exception is when the disk space utilization is well-known, understood and/or by design.

It is not uncommon, in some use cases, for operators to intentionally fill disk storage at or near physical capacity, because they know there is little or no chance the used disk space will grow suddenly.

Unfortunately, without tuning, vROPs has no way of “knowing” about these use cases. But this is not an Alert most users should want to suppress entirely. For use cases where disk usage can grow and cause problems, the same alert definition can be very valuable. This is a classic example of the need for Alert tuning.

So, using these two examples — and a few more — in the next few articles, we will explore several Alert Suppression and Tuning Methods, starting with the simplest method: Using Separate Policies.

Managing Alerts With Policies

Since Alert Definitions are contained in Policies, the only way to alter them is through Policies. So if your basic task is to suppress or tune an alert, you must edit the Policy where that Alert is defined.

Detailed instructions regarding Policy editing are beyond the scope of this article. Refer to Chapter Four of the vRealize Operations Manager Customization and Administration Guide for details.

Here, we will simply use Policies, and Groups (which was covered in Part 2 of this series) to suppress the Stress and Disk Space alert examples discussed earlier for “Test and Dev” vs. “Production” environments.

I have already created Custom Groups in my lab for the two vSphere Clusters for which I intend to tune alerts.

  • SDDC Production
  • SDDC Test

In this example, my desired outcome is to suppress both the Disk Space and Stress alerts for the Cluster/Group “SDDC Test” and to allow both alert types for the Cluster/Group “SDDC Production”. You may have alternative needs for alerting in Test and Production environments, but that is not the point of this exercise. We are simply demonstrating a method.

Step One: Creating Separate Policies for Production and Test

These steps are also documented elsewhere, but since they are simple, we will walk through them briefly here. Initially, both of my Clusters are assigned the same policy: The Default Policy that is configured out-of-the-box when vROPs is installed.

It is not recommended that you alter any of the pre-configured policies that ship with vROPs. Instead, you should clone them as starting points and edit the cloned policies to suit your needs. Since we anticipate the future need to tune alerts for both Production and Test, we will clone the Default Policy once for each Cluster/Group. In the end, neither will be managed by the Default Policy, but a new customized policy for each environment.*

* At the end of this article we will show a slight variation of this method.

Creating New Child Policies, Based on Default Policy Settings

To “clone” the Default Policy for the SDDC Test Cluster/Group, we will execute the following steps:

1. Make sure you are logged in to the vROPs UI as an Admin user.

2. Navigate to the Policy Editor in the vROPs Navigation Pane as depicted in the figure below.

a. Click on the Administration Quick Link

b. Click on the Policies Option

Figure 3: Navigate to Policy Editor

3. The Policy Management Screen has two sections: a window with Active Policies and a “Policy Library” section. Click on the “Policy Library” tab. Click on the (+) Sign to expand the “Base Settings” to view the Policy hierarchy. Policy inheritance is beyond the scope of this article. Just know that by using an existing Policy as your starting point, you are not starting from scratch. So for now, we will focus on a first-level Policy: the “Default” policy we need to “clone”.

Figure 4: The Policy Library

4. Click on the Green (+) Sign at the Top of the Policy Library (next to the Pencil Icon) to Create a New Policy.

a. Again, we will NOT be creating a policy from scratch. Instead, we will be creating two new policies, based on the Default Policy.

b. The Policy Editor will Appear as depicted in Figure 5.

Figure 5: Blank Policy Editor

5. This will be our “Production Policy”. So in the Name field, type “SDDC Production”

6. Optional: Type a Description for your Production Policy

7. IMPORTANT: Under the Label “Start With”, there is a drop-down Menu. Open that Menu and Select the “Default Policy” as the basis for this New Policy, in effect “cloning” the Default Policy. (See Figure 6)

Figure 6: Production Policy named and will be based on Default Policy

8.  Since you are not suppressing any Alerts in Production at this time, you will not need to edit any Policy details at this time. You will make some edits in the next Policy we create for Test. But do not Save the Policy yet!!! It is not yet assigned to the correct Group.

9. To Assign this Policy to the right Group, skip ahead to Step 6: “Assign Policy to Group”. In my case, I am using a Group called “SDDC Production”. In your environment use the Custom Group that is most appropriate. (See Figure 7)

Figure 7: Policy is assigned to SDDC Production Group/Cluster in Final Step of Editing.

10. Click the “Save” Button to Save the Policy.

Now, I have a brand new Policy assigned to the Production Group named “SDDC Production”. But since the Policy I assigned was based on the Default Policy and I made no further changes, I have made no effective changes to how my Production Group is managed. I am simply prepared to do so in the future, but without affecting other objects that are not part of the “Production” Cluster/Group.
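The reason nothing has changed can be shown with a small Python model. It assumes a Policy keeps only the settings it overrides itself ("Local") and takes everything else from the Policy it was based on ("Inherited"), which are the two sources the Policy Editor displays. The class, its attribute names and the shortened Alert name are hypothetical, and this is a sketch of the concept rather than of how vROPs stores Policies.

```python
class Policy:
    """Toy model of a vROPs Policy holding Alert enabled/disabled states."""

    def __init__(self, name, base=None):
        self.name = name
        self.base = base        # the Policy chosen under "Start With"
        self.local_states = {}  # alert name -> True (enabled) / False (disabled)

    def alert_state(self, alert_name):
        """Return (enabled, source); source is "Local" or "Inherited"."""
        if alert_name in self.local_states:
            return self.local_states[alert_name], "Local"
        policy = self.base
        while policy is not None:  # walk up to the Policy that defines the state
            if alert_name in policy.local_states:
                return policy.local_states[alert_name], "Inherited"
            policy = policy.base
        raise KeyError(alert_name)


default = Policy("Default Policy")
default.local_states["VM CPU stress"] = True  # hypothetical short Alert name

production = Policy("SDDC Production", base=default)
print(production.alert_state("VM CPU stress"))

# A local override in a second child Policy leaves Production untouched.
test = Policy("SDDC Test", base=default)
test.local_states["VM CPU stress"] = False
print(test.alert_state("VM CPU stress"))
print(production.alert_state("VM CPU stress"))
```

In this model the Production Policy reports exactly what the Default Policy defines until a local change is made, and a change made in one child Policy never leaks into the other.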

We will now complete the same steps for our “Test” Cluster/Group, but this time we will actually make some changes to the new Policy to alter Alert behavior.

Creating the SDDC Test Policy and Suppressing Stress and Disk Space Alerts

1. We will repeat steps 1-7 from the above procedure, but with the following differences:

2. The initial screen in the Policy Editor for the Test Policy will look similar to Figure 8 below.

SDDC Test Policy Screen One
Figure 8: Policy Editor Screen to Name Test Policy. Also based on Default Policy

3.  After naming the Policy, you are finally ready to edit Alert Settings. This is done in Step 5 of the Policy Editor. Click on Step 5 to view the Inherited Alert Settings from the Default Policy. (See Figure 9)

Policy Editor Inherited Alerts
Figure 9: Inherited Alert Settings in Policy Editor

4. Finally, we are where we want to be: tweaking Alert settings. First, this screen requires a bit of explanation:

a. For the purposes of this exercise, we will ignore Symptoms (depicted in the bottom half of Figure 9). We will get into Symptom changes in a later article in this series. For now, we simply want to get an understanding of Enabling or Disabling specific Alerts.

b. The upper pane of Figure 9 shows us all of the Alert Settings available in this Policy and their Enabled/Disabled state. Alerts that are Enabled have a Check box in the “State” Column. Most of the Alerts you will see in this example are Enabled.

c. Also in the State column, we see the source of the Alert State. There are two possible sources: “Local” or “Inherited”. Since this is a new Policy based on the Default Policy, all the Alert States are Inherited. If we make a change here, the source of the Alert State will be Local. Policy Inheritance is an advanced topic and we will be getting into more detail on this in later articles.

d. Let’s make some Alert State changes to this policy.

5. First we must find the two Alerts we want to Suppress for the Test Cluster Policy:

a. The first Alert is the Stress Alert. For this exercise, we want to Suppress Stress Alerts for our Test Cluster. So let’s find the Stress Alert Definitions:

b. At the top of the Alert Definitions list, you will find a filter. Type the word “stress” into that filter. (See Figure 10)

Stress Alerts Blocked Locally
Figure 10: Changing Local Status of an Alert in Local Policy

6. You will see two Stress Alert definitions: One for Memory Stress and another for CPU Stress. Since we want to suppress both for our Test cluster, use the Drop Down menu for each to change the Alert Status from “Enabled/Inherited” to “Disabled/Local”. (The Red Circle with a line through it).

7. Since we also want to suppress Disk Space alerts for this Policy, we can continue from the same page. After you have edited the Stress Alert status, you can switch to the Disk Space Alerts without moving from this page. Simply type “disk space” into the filter criteria as seen in Figure 11.

Suppressing Disk Space Alerts for VMs
Figure 11: Finding and disabling selected Disk Space Alerts

a. As the figure depicts, this is a bit more complicated than the previous example, since there are several Disk Space Alert Definitions. This illustrates a critical concept when changing any alert definition. Make sure you are changing the Alert you intend to change!

b. To keep this consistent, we will change only the Disk Space alerts for Virtual Machines. If you wish to disable Disk Space for Datastores instead, simply make sure you note the Object Type that is affected by the Alert.

8. Once you have made your changes, you can move to Step 6 of the Policy Wizard.

a. Assign the Policy to the “SDDC Test” Group.

b. Save the Policy.

Assign Policy to Test and Save
Figure 12: Assigning the Test Policy and Saving
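
To make the "change only the Alert you intend to change" point concrete, here is a minimal sketch of the filter-then-disable logic from the steps above. It is a plain Python model, not the vROPs API: the `AlertDefinition` class, the helper functions and the alert names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertDefinition:
    name: str
    object_type: str
    enabled: bool = True
    source: str = "Inherited"  # "Inherited" or "Local"


def find_alerts(definitions, keyword, object_type=None):
    """Case-insensitive name filter, optionally narrowed to one object type."""
    keyword = keyword.lower()
    return [
        d for d in definitions
        if keyword in d.name.lower()
        and (object_type is None or d.object_type == object_type)
    ]


def disable_locally(alerts):
    """Mark each alert as Disabled/Local, mirroring the drop-down in the UI."""
    for alert in alerts:
        alert.enabled = False
        alert.source = "Local"


# Hypothetical alert definitions; real names vary by Management Pack.
definitions = [
    AlertDefinition("Memory stress", "Virtual Machine"),
    AlertDefinition("CPU stress", "Virtual Machine"),
    AlertDefinition("Disk space running low", "Virtual Machine"),
    AlertDefinition("Disk space running low", "Datastore"),
]

# Suppress both Stress alerts, but only the VM flavor of the Disk Space alert.
disable_locally(find_alerts(definitions, "stress"))
disable_locally(find_alerts(definitions, "disk space", object_type="Virtual Machine"))

for d in definitions:
    state = "Enabled" if d.enabled else "Disabled"
    print(f"{d.name} ({d.object_type}): {state}/{d.source}")
```

Filtering on the name alone would have matched the Datastore alert as well. Narrowing by Object Type is what keeps the Datastore definition Enabled/Inherited in this sketch.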

Conclusion

You have now seen two examples of using Policies to modify Alert settings (in this case suppressing the alert for a selected part of the environment). However, be aware, this is only one of several Alert suppression methods. We will cover others in future articles. So, let’s review the advantages vs. disadvantages of this method:

Advantages:

The primary advantage for this method is its simplicity.

  • Relatively simple to implement
  • Relatively simple to understand
  • Can be implemented quickly

Disadvantages:

The primary disadvantage of this method is its lack of granularity. When you use separate policies to split the behavior of Alerts, you are not only changing Alert behavior. You are also “forking” the Policy Settings for other settings, which can lead to less administrative flexibility. Other Policy Settings affected by this method are:

  • Badge Score States
  • Capacity Settings
  • Attribute Settings (Metric Assignments to Objects)
  • Super Metrics

NOTE: One method to overcome this disadvantage is to introduce an intermediate layer of Policies. This intermediate level can contain settings you wish your Groups to maintain in common, while child Policies can contain only the settings you wish to change between Groups. See the following illustrations for Policy structure examples:

Both examples presume that you will not change the Out-of-the-box Root Policy (Default Policy) settings, in accordance with Best Practices.

Policy Structure A
Figure A

Figure A: Since you should not change the Default Policy, you cannot make Global Policy changes to Test or Production.

Policy Structure B
Figure B

Figure B is more flexible. You start with the Root out-of-the-box Default Policies. You can make Global changes to Capacity, Badges, Attributes and Super Metrics at the intermediate Policy, “Global SDDC Settings” (which are then inherited at the lower level). You can then use the third-level child policies only for Alerts.
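
The three-level structure in Figure B can be sketched as a simple parent chain. This is only an illustrative model of the Local vs. Inherited idea, assuming that a child Policy's local value overrides whatever its parents define. The `Policy` class and the setting names (`cpu_stress_alert`, `capacity_buffer_pct`) are hypothetical and are not vROPs identifiers.

```python
class Policy:
    """A policy holds only its local overrides and a link to its parent."""

    def __init__(self, name, parent=None, **local_settings):
        self.name = name
        self.parent = parent
        self.local = dict(local_settings)

    def resolve(self, key):
        """Return (value, source), where source is 'Local' or 'Inherited'."""
        if key in self.local:
            return self.local[key], "Local"
        policy = self.parent
        while policy is not None:
            if key in policy.local:
                return policy.local[key], "Inherited"
            policy = policy.parent
        raise KeyError(key)


# Root policy: left untouched, per the Best Practice noted above.
default = Policy("Default Policy", cpu_stress_alert=True, capacity_buffer_pct=0)

# Intermediate layer: settings all SDDC groups should share.
global_sddc = Policy("Global SDDC Settings", parent=default, capacity_buffer_pct=10)

# Third level: only Alert differences live here.
production = Policy("SDDC Production", parent=global_sddc)
test = Policy("SDDC Test", parent=global_sddc, cpu_stress_alert=False)

for policy in (production, test):
    for key in ("cpu_stress_alert", "capacity_buffer_pct"):
        value, source = policy.resolve(key)
        print(f"{policy.name}: {key} = {value} ({source})")
```

In this model a change made once at “Global SDDC Settings” reaches both child Policies, while the Stress Alert override stays confined to “SDDC Test”.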

A final disadvantage to this method is that while you can take it to extremes and create “one-off” exceptions where Custom Groups have a single or handful of resources, this is not recommended. One-off exceptions can be handled in other ways. We will cover this in future articles.

Stay tuned for Part 4 of this series: Alert Tuning Using vSphere Tags

Operational vROPs Part 2: Creating Custom Groups on Day 1

In my previous post, I introduced a discussion about the relative importance and relative time-sensitivity of the three major alert categories in vROPs 6.x. The three major alert categories, Health, Risk and Efficiency are intended to provide a general prioritization framework for Alert responses.

vROPs 6 Recommendations
Figure 1: vROPs 6.x Alerts On the Recommendations Page

Having established this, users will obviously see the need to further refine the Alert Definitions that result from the Out-of-the-box settings in vROPs Management Packs. This applies to all Management Packs, not only the vSphere Management Pack, which is where the vast majority of vROPs users start.

So “Alert Tuning” is frequently one of the first topics to come up with new vROPs installations. (The other common topic deals with data interpretation, a topic that will be covered at length in a future series of posts on this blog).

Unfortunately, while Alert Tuning is at the top of the list of priorities in vROPs, few customers are prepared – from an administrative standpoint – to begin to address it. To be more precise, there are some basic steps you need to take in the very early days of your vROPs implementation to handle Alert Tuning. They are not complex, but do take a moderate investment in time. So it is best to address these basic steps within the first days of your implementation.

HOW VROPS ALERTS ARE MANAGED

Unfortunately, while it would be nice to directly edit alert definitions in vROPs by directly editing alert symptoms from an Active Alert, this capability does not yet exist. (It is planned for an upcoming release).

Alerts are defined within vROPs Management Policies. So to adjust or tune any alert (or suppress it entirely), you must edit the relevant Policy assigned to an object in vROPs or create a new policy.

We will not cover the basics of Policy and Alert definition here. To understand the basics of Alert Management (Symptoms, Alerts, Recommendations and Actions), refer to Chapter 4 in the vRealize Operations Manager Customization and Administration Guide.

Instead we will focus on the administrative basics, and why the immediate creation of Custom Groups is the first – and frankly, most essential – step to prepare you to manage and tune alerts.

THE IMPORTANCE OF CUSTOM GROUPS

Since vROPs Alerts are managed within vROPs Management Policies, it makes sense that you will want multiple policies for your environment, unless you intend to maintain identical alert definitions for your entire environment, which is unlikely for most environments.

This brings us to Groups. If you intend to use multiple Management Policies (which you inevitably will), you will also need to create Groups of objects within vROPs.

The reason for this is found within the vROPs Policy Editor itself. The last page of the Policy Editing wizard reveals that the only Associations you can assign to different policies are Custom Groups.

Assign Policy to Group in Policy Editor
Figure 2: Assign Policy to Custom Group in Policy Editor

Therefore:

  1. If you have not created Custom Groups, ….
  2. …You have nothing to which to assign Custom Policies, so…
  3. …You are not prepared to do Alert Tuning

So, it makes sense to create Custom Groups as your very first Administrative Project in vROPs.

HOW SHOULD I GROUP MY vSPHERE OBJECTS?

There are many criteria in vROPs you can choose to create custom groups. A few examples are:

  • Object Name
  • Object Metrics
  • Object Properties
  • Object Relationship

The key criteria in this example are the Name of an Object and its relationships. This is because the most obvious way to initially create Groups in vROPs is by your vSphere Clusters.

The steps for creating Custom Groups in vROPs are also documented in Chapter 2 of the vRealize Operations Manager Customization and Administration Guide. But let’s walk through the basic steps of creating a Custom Group based on Clusters:

First, you must understand that to create the group that will define your clusters the way most vSphere Admins would expect, you will need three Group Criteria. (Maybe only three, but we will get to that in a moment.)

CREATING A vROPs CUSTOM GROUP BASED ON vSPHERE CLUSTERS

  1. Make sure you are logged into vROPs as an Admin-level user.
  2. Click on the Environments Quicklink.
  3. Click on the “Custom Groups” Option. This will bring you to a screen displaying the Custom Groups Type and existing Groups.

    Figure 3: Custom Groups Screen in vROPs 6.x
  4. Select the Group Type (I use “Function” for this).
  5. Click on the “+” Icon to create a New Custom Group (See illustration above).
  6. Type in a Name for Your Group (I recommend the exact same name for your Cluster as seen in vCenter)
    1. Now, you are ready to define the Group Membership, which requires some simple, but precise logic.
    2. You will see two constructs in this page of the Wizard: Criteria, and Criteria Sets. For this exercise, we will need multiple Criteria Sets which will combine Multiple Criteria.
  7. In your first criteria set, select the following options:
    1. Select the Object Type that matches all of the following criteria: “Cluster Compute Resource”
    2. First Drop Down Menu in 2nd Line: “Object Name”
    3. Operator Drop Down Menu: “is” (or “contains”)
    4. Type in Criteria:
    5. See figure for example

      Custom Group First Criteria
      Figure 4: Custom Group Criteria to Capture Cluster Object Only
  8. Now click the Preview Button at the bottom of the page.

    Custom Group Preview with Cluster Object Only
    Figure 5: Custom Group Preview with Cluster Object Only

    You will see that this ONLY captures the Cluster object itself…not its members. To capture those, you will need additional Criteria Sets.

  9. To include the Members of your Clusters, you will need to add one or more additional Criteria Sets…basically one for each Object Type that exists within your Cluster.
  10. To Add a new Criteria, click the link labeled “Add”…
    1. CAUTION: For this use case, DO NOT use “Add  Criteria” within the same grey box. We want the Criteria we are adding to this definition to use OR logic. Adding a new Criteria within a Criteria Set creates AND logic, which counterintuitively restricts membership.
  11. In your second criteria set, select the following options. This criteria set will capture the Virtual Machines in your target Cluster.
    1. Select the Object Type that matches all of the following criteria: “Virtual Machine”
    2. First Drop Down Menu in 2nd Line: “Relationship”
    3. Relationship Drop Down Menu: “Descendant of”
    4. Operator Drop Down Menu: “Contains”
    5. Type in Criteria:
  12. Repeat Steps 10 and 11 to capture the Hosts in your target cluster, this time selecting the Host object type instead of “Virtual Machine”.

    Custom Group for Cluster Defined
    Figure 6: Custom Group Definition That Captures Cluster, and its VMs and Hosts
  13. OPTIONAL: Repeat Steps 10 and 11 to capture the Datastores in your target cluster.
    1. Why is this “Optional”? Whether you should include your Datastores in these Custom Group definitions depends upon how you intend to manage your Datastores in vROPs, which depends, in large part on how you present storage to Clusters in vSphere.
    2. Some customers present the same Datastores to multiple clusters, thus sharing storage between clusters.
    3. Other customers may dedicate storage on a cluster by cluster basis.
    4. For the purposes of vROPs Management, the key factor is whether you intend to manage Storage as a separate resource from your Compute Resources or as part of the Clusters they serve. If you choose the former, you may find it better to create Custom Groups dedicated only to Datastores themselves. If you choose the latter, you may find it useful to include Datastores in your Cluster-based Custom Groups in vROPs.
  14. When you have finished these Criteria Set Definitions, click the “Preview” button to confirm your Groups have the proper membership, then click “OK” to Save the Group.
    Custom Group Preview
    Figure 7: Custom Group Preview Displaying Cluster, VMs and Hosts (Datastores Not Included in this Example).

    As a final step in creating your Custom Group, you may wish to assign a specific Policy to the new Group. For now, I recommend you stick with the original Policy assignments (which may be identical for all Groups/Clusters at this time).

  15.  The exercise described in this post is a preparatory step for the future, when you will inevitably want to begin assigning different policies to different Groups in your Environment.
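
The membership logic in the steps above (OR between Criteria Sets, AND between Criteria inside one set) can be sketched in a few lines of Python. This is a toy model rather than the vROPs group engine: the `Obj` and `CriteriaSet` classes, the helper functions and the inventory names such as “Test-Cluster” are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    object_type: str
    ancestors: tuple = ()  # names of every object above this one


def name_is(value):
    """Criterion: Object Name 'is' value."""
    return lambda obj: obj.name == value


def descendant_of(value):
    """Criterion: Relationship 'Descendant of' 'Contains' value."""
    return lambda obj: any(value in ancestor for ancestor in obj.ancestors)


@dataclass
class CriteriaSet:
    object_type: str
    criteria: list

    def matches(self, obj):
        # Inside one Criteria Set, every criterion must hold (AND).
        return obj.object_type == self.object_type and all(
            criterion(obj) for criterion in self.criteria
        )


def group_members(objects, criteria_sets):
    # Across Criteria Sets, any single match is enough (OR).
    return [o for o in objects if any(cs.matches(o) for cs in criteria_sets)]


inventory = [
    Obj("Test-Cluster", "Cluster Compute Resource"),
    Obj("esx01", "Host System", ("Test-Cluster",)),
    Obj("vm-a", "Virtual Machine", ("Test-Cluster", "esx01")),
    Obj("vm-b", "Virtual Machine", ("Prod-Cluster", "esx09")),
]

cluster_only = [CriteriaSet("Cluster Compute Resource", [name_is("Test-Cluster")])]
full_group = cluster_only + [
    CriteriaSet("Virtual Machine", [descendant_of("Test-Cluster")]),
    CriteriaSet("Host System", [descendant_of("Test-Cluster")]),
]
# The mistake the CAUTION warns about: both criteria in ONE set (AND logic).
and_mistake = [
    CriteriaSet(
        "Cluster Compute Resource",
        [name_is("Test-Cluster"), descendant_of("Test-Cluster")],
    )
]

print([o.name for o in group_members(inventory, cluster_only)])
print([o.name for o in group_members(inventory, full_group)])
print([o.name for o in group_members(inventory, and_mistake)])
```

The first preview contains only the Cluster object, the second adds its Host and VM, and the third comes back empty because no single object satisfies both criteria at once.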

So, that’s one Cluster. To create groups for each cluster, you will need to complete this exercise for each cluster you want to manage. Of course, this can be a significant amount of work if you are managing a very large environment. But in large part, it is something you are likely to need to do only once. Also, there are some ways to reduce even this initial Administrative workload:

  1. Use the “Clone Group” feature. You can use the same criteria framework for each new Cluster-Based Group. Copying and pasting the name of the Cluster into each cloned group reduces much – if not all – of the tedium.
  2. You may not need a separate Group for each Cluster. For example, you may only have very coarse Groupings, like “Test”, “Staging”, and “Production” that span multiple Clusters. So combining Clusters into these larger groups will save much of the repetitive definitions.
  3. At VMware, we are working on developing some workflows in vRealize Orchestrator that can automate this entire process. Look for news on that effort on this blog, hopefully very soon.

SO WHAT HAVE I REALLY ACCOMPLISHED HERE?

Admittedly, these tasks are not the most exciting activities you will be asked to perform. Until the automation methods are available, this is something that will take a bit of effort. But it should not be days of work. Even in the largest environments, you can accomplish some very useful partitioning of your environment in a couple hours.

But the crucial point is that until you engage in an exercise like this, you are not ready to move to the next steps. So take these recommendations as general suggestions and take some time to consider how you can do this most effectively in your environment. Ask your VMware SE or TAM for additional suggestions. Then find the time to complete these initial tasks. As you will see in future articles, this is work that will certainly pay off, and very early.

Coming Up Next: Operational vROPs Part 3: Basic Alert Tuning Using Policies

Operational vROPs Part 1: All Alerts Are Not Created Equal

For most new users of vRealize Operations (vROPs), the first thing they see is the “Recommendations” landing page after initial login.

vROPs 6 Recommendations

Figure 1: vROPs 6.x Recommendations Screen

In vROPs 6.x, you may notice that vROPs is very Alert-Centric. This is by design. Users of vROPs’ predecessor product, vCOPs 5.x, provided VMware with very clear feedback that with a large number of objects to manage, they needed very simple indicators of performance state in easily understood Red/Yellow/Green formats.

This is the basis of the Major Badge scores seen on the initial vROPs Recommendations screen.

vROPs 6.0 Alerts Are Designed to be Self-Explanatory

All vROPs 6.x alerts are intended to be self-explanatory. For the most part, based on initial user feedback, it would appear that this specific goal has been met. All alerts in vROPs 6.x have symptoms, impact descriptions, recommendations and even automated actions that most users can interpret without much help. If help is needed, vROPs comes with some very handy explanatory videos embedded in many screens in the UI.

But what is also happening in many cases – especially with newer users – is that the alerts are often being lumped together as issues of equal priority. This is actually not how vROPs (or vCOPs in the past) intends for these issues to be interpreted.

“OK, I Understand the Content of The Alert, But….”

Of course, if an alert is raised by any management product, very basic and fundamental questions will be asked beyond the content within the alert:

  1. How important is this issue?
  2. How long do I have to address this issue?
  3. Is this issue higher or lower in priority to other issues?
  4. If I see alerts in this category, what is my typical next step?

These are just a few examples and surely there are others. But to help sort this out, we must first remind users about the three classes of MAJOR ALERTS in vROPs and how they are intended to be used:

“What Are the Major Alert Categories in vROPs?”

The three major alert classes in vROPs are:

  • Health
  • Risk
  • Efficiency

As a reminder: All monitored objects in the vROPs inventory have an alert indicator for each of the above categories.

These are not new concepts in vROPs 6.x. These categories existed in the vCOPs 5.x releases as well. What is different – as mentioned earlier – is the mechanism regarding how these three “alert states” are determined. In vCOPs 5.x, the alert states for Health, Risk and Efficiency were based on numerical scores calculated by vCOPs.

In vROPs 6.x, these scores have been deprecated. Instead, each object’s alert state is based solely on the most severe individual active alert for that object. For example, if an object has at least one “Critical” (Red) Alert, the alert state for that object will be Red. This change in vROPs 6.x was implemented to significantly reduce the Alert volume.
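
The "most severe active alert wins" rule can be sketched as follows. This is only an illustration of the rule as described above. The severity names and their ordering are an assumption about how criticality levels rank, and `alert_state` is a hypothetical helper, not a vROPs function.

```python
# Assumed ranking of criticality levels, lowest to highest.
SEVERITY_RANK = {"Info": 0, "Warning": 1, "Immediate": 2, "Critical": 3}


def alert_state(active_alert_severities):
    """Return the most severe active severity, or None if nothing is active."""
    if not active_alert_severities:
        return None
    return max(active_alert_severities, key=SEVERITY_RANK.__getitem__)


print(alert_state(["Warning", "Critical", "Info"]))
print(alert_state(["Warning", "Warning"]))
print(alert_state([]))
```

Ten Warning alerts and one Critical alert yield the same state as a single Critical alert. The count of alerts does not enter into the result, which is the key difference from a calculated score.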

However, in medium to large environments, even with this reduced volume, customers are still often presented with many alerts and may struggle with prioritizing them.

The process of Alert Prioritization is actually built into the concept of Health, Risk and Efficiency categories. Here is a summary of how you may choose to:

  1. Prioritize Alerts.
  2. Determine Proper Response Times.
  3. Choose the Right vROPs Feature to Investigate Issue Detail.

Recommended Response Framework for vROPs Alerts

Alert Type | Health | Risk | Efficiency
Issue Description | Immediate Performance and/or Outage | Trending Issues related to Capacity, Compliance or Cyclical Behaviors | Methods and Practices affecting optimal resource utilization
Response Time | Minutes-Hours | Days-Weeks | Periodic Review
Best Fit vROPs Tool or Feature | Alerts | Reports | Reports
 | Dashboards | Views | Views
 | Management Packs (per User) | Alerts (Low to Medium Priority) | Dashboards:
Role-based Access Control Features for: NOC, War Room, Specialists

Let’s look into these recommendations in a bit more detail….

“What Does Each Type of Alert Category Really Mean?”

It may help to better define what Health, Risk and Efficiency Alerts are trying to say, in the general context of IT priorities. Much of this is also explained in the videos within the product, but we will revisit them here:

Health: Issues in this category are Immediate Problems. Objects in a Critical Health State are likely experiencing issues NOW.

Operational vROPs Sample Health Alert

Figure 2: A Typical Health Alert

Risk: Issues in this category should be thought of as Trending Issues. These are typically related to Capacity, Configuration, Compliance and other Cyclical Behaviors that may not cause immediate problems, but could develop into Health issues soon if the underlying conditions are not addressed.

Operational vROPs Sample Risk Alert

Figure 3: A Typical Risk Alert

Efficiency: Issues in this category are the least urgent of the three categories. These are generally related to practices in provisioning and sizing that offer opportunities for optimization.

Operational vROPs Reclamation Opps

Figure 4: Reclamation Opportunity Details

“How Long Do I Have to Address Each Alert Category?”

If we accept the above descriptions of these alerts, we should intuitively be able to prioritize them by the categories themselves. But for the sake of completeness, let’s assign some general time frames to how urgently you should respond to each alert:

  • Health Alerts: As “Immediate Issues” these alerts are intended to be resolved in Minutes or Hours.
  • Risk Alerts: As “Trending Issues”, these issues should be addressed in a time frame of Days or Weeks.
  • Efficiency Alerts: Efficiency conditions are more opportunities than problems. You won’t receive a large volume of efficiency alerts. And by their very nature, Efficiency issues generally do not have a ticking clock. So the recommendation for Efficiency issues is to address them in regularly scheduled reviews.

These time frames are only notional. In the end, you should address each type of issue, based on your own IT policies, or ultimately in accordance with your Service Level Agreements.
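
As a rough sketch, the response framework above can be expressed as a sort order for an alert queue. The alert names below are hypothetical, the severity ranking is an assumption, and sorting by category before severity is a design choice made here to mirror the Health, Risk, Efficiency ordering. Your own SLAs may call for a different rule.

```python
CATEGORY_PRIORITY = {"Health": 0, "Risk": 1, "Efficiency": 2}
RESPONSE_WINDOW = {
    "Health": "Minutes-Hours",
    "Risk": "Days-Weeks",
    "Efficiency": "Periodic Review",
}
# Assumed ranking of criticality levels, lowest to highest.
SEVERITY_RANK = {"Info": 0, "Warning": 1, "Immediate": 2, "Critical": 3}


def triage(alerts):
    """Order alerts by category first, then by descending severity."""
    return sorted(
        alerts,
        key=lambda a: (CATEGORY_PRIORITY[a["category"]], -SEVERITY_RANK[a["severity"]]),
    )


# Hypothetical alert queue.
queue = [
    {"name": "Oversized VM", "category": "Efficiency", "severity": "Warning"},
    {"name": "Capacity trending low", "category": "Risk", "severity": "Critical"},
    {"name": "Storage latency high", "category": "Health", "severity": "Warning"},
    {"name": "VM unresponsive", "category": "Health", "severity": "Critical"},
]

for alert in triage(queue):
    window = RESPONSE_WINDOW[alert["category"]]
    print(f'{alert["category"]:<10} {alert["severity"]:<9} {alert["name"]} -> {window}')
```

Note that under this ordering a Warning-level Health alert is handled before a Critical-level Risk alert, which reflects the article's point that the category itself carries the urgency.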

“So I Am Prioritizing My Responses to Each Alert Type: Now What?”

As was stated earlier, vROPs 6.x is very feature rich. So much so, in fact, that it can be doubly challenging to understand which features are best suited to which types of issues.

Just as you have a tool-box in your garage to fix various issues around your house or your vehicle, the various screens in vROPs 6.x are tools in an IT toolbox. Some tools fit best into certain situations.

The ultimate selection of a tool for a given situation is a matter of personal choice. But if you are just starting out with vROPs 6.x some guidance can be very helpful.

Below are some recommendations for which vROPs Features would be best suited for use to address the three alert categories within the product. For details on how each feature is used, consult your vROPs 6.x product documentation.

Health Issues:

As “Immediate Issues”, the Alerts themselves are your primary tool. As such, Health Alerts need to be the most precise and relevant of the Alert types. So proper tuning of Alerts for Health is critical. The concept of Alert tuning in vROPs 6.x is a topic we will address in detail in upcoming articles in this series.

Other tools to help you with Health Issues include:

  • Detailed Dashboards: Each Dashboard has deeper detail into many issues. vROPs 6.x ships with several useful dashboards out of the box, and each Management Pack you install adds many more. You can, of course, create custom dashboards to help diagnose different types of problems related to storage, networking, compute resources, applications and so on.

Operational vROps Sample Dashboard

Figure 5: Sample vROPs 6.x Dashboard

  • The Analysis & Troubleshooting Tabs: Every object in vROPs has multiple detailed screens in the Analysis and Troubleshooting section. More precise metric details, timelines and raw data can be explored. In addition, no object in vROPs operates in a vacuum, so these features of the product can help you sort out the “criminals” vs. “victims” questions so critical to addressing the complexities of shared environments.
    Operational vROPs Analysis Page
    Figure 6: Sample Analysis Tab for a VM in vROPs 6.x.

Risk Issues:

With “Trending Issues”, you have more time, so more detail can be explored. This is where vROPs Detailed “Views” come into play. Think of each View as a live spreadsheet that can be analyzed in realtime. There are hundreds of OOB Views and you can easily create your own customized Views (a new feature for vROPs 6). Views can be published on a schedule in the form of Reports that can be saved in PDF or CSV format, and can be e-mailed.

Do not discount the deeper Analysis Screens for Risk Alerts. You can plot Capacity and Usage trends on dedicated screens in the Analysis Tab.

And finally, the Projects Tab (another new feature in vROPs 6.x) allows you to model present and future capacity actions interactively.

Efficiency Issues:

Since these issues are “Opportunities” you can address them reflectively. Here again, you will find Views and Reports as your most useful tools.

Hopefully, this has been a useful introduction to vROPs Alerts. Keep in mind, these are the basics. We are simply laying the groundwork for more sophisticated activities in the future. Stay tuned for upcoming articles, where we will move on to more precise tuning of vROPs.

Coming Up Next: “Operational vROPs Part 2: Why You Should Create Custom Groups on Day 1”

Operational vROPs: Introduction

In December 2014, VMware introduced the most significant release of vRealize Operations (vROPs) since the initial introduction of vCenter Operations (vROPs’ prior name). This release has provided an entirely new platform for managing the Software Defined Enterprise.

Existing users of vCenter Operations are already “realizing” the benefits of this new platform, and the ever-growing ecosystem of add-ons in the form of management packs for the vSphere Virtual Infrastructure and much more of the evolving Software Defined Enterprise.

For more information on these solutions visit:

http://solutionexchange.vmware.com

In addition, VMware and others have changed the game considerably with more effective training, and there are even an increasing number of publications on the topic of managing vRealize Operations. Two recently released books can be found here:

http://www.amazon.com/vRealize-Operations-Performance-Capacity-Management-ebook/dp/B00RP13CF2/ref=sr_1_1?s=books&ie=UTF8&qid=1433191125&sr=1-1&keywords=vrealize+operations+manager

http://www.amazon.com/Mastering-vRealize-Operations-Manager-Norris-ebook/dp/B00XJRN940/ref=sr_1_2?s=books&ie=UTF8&qid=1433191125&sr=1-2&keywords=vrealize+operations+manager

But even with these resources, as I visit customers who are implementing vROPs, they are still asking very simple questions about the data they are seeing in vROPs and where to start to make vROPs more “Operational”.

To address this, it may be helpful to first define “Operational”. There are many possible ways to define this, but to me the simplest definition is that vROPs is Operational if it is being used on a daily basis by multiple users in your organization.

vROPs is very feature rich. And it will offer many tools to address day to day Performance and Capacity issues. But which tools to use in which situations is a topic that can be overwhelming at first.

So in the initial series of articles on this site, we will explore several of these issues in “Baby Steps” in the Multi-Part series “Getting Operational with vROPs”.

The attempt in this series is not to overwhelm anybody. The idea is to address how you will use vROPs in a very practical manner, using very simple steps.

So read on….. I hope this proves helpful.

Welcome to vROPerational: A Discussion Forum for VMware’s vRealize Solutions

I am happy to introduce this new forum for discussing Tips, Tricks, Trends and any other topics relevant to VMware’s vRealize line of products and solutions. I have been an employee at VMware for over seven years, and the growth of the SDDC and VMware’s management solutions has been phenomenal, both in terms of customer adoption and the capabilities of the solutions.

In the upcoming weeks and months, check here, for articles, how-to’s and other musings on the vRealize Product Suite. And please feel free to offer feedback as well.

Thanks and stay tuned for your first series of articles.

Mark S. Monce

Sr. Systems Engineer – Solutions Engineering and Technology

mmonce@vmware.com