Archive

Archive for the ‘Tips’ Category

Help mom stay safer online

April 24th, 2012 No comments

Maybe she wants flowers, but a more practical gift this Mother’s Day might be to make sure your mom knows some basic rules about keeping her computer updated and creating strong passwords.

Read our six basic online safety tips.

You can also download the tips in an easy-to-print card for mom.

Since these tips are free, consider springing for the flowers too. Or how about a new Windows Phone?

Categories: online safety, passwords, security, Tips Tags:

Auditing Changes to Audit Policy

July 16th, 2010 No comments

Mitsuru, one of our support engineers in Japan, recently did some excellent research into exactly how Windows behaves when auditing changes to audit policy, and I wanted to share that with you.

In Windows, we’ve always had auditing for changes to security policy.  Audit policy has always been one aspect of that policy.

However, it’s not so clear how to audit changes to audit policy, because the change itself might affect whether or not the audit event is generated.  Usually in Windows, we generate the audit event after the operation that we are auditing is performed.  When we generate an audit event, we always check audit policy to see if we need to generate it.

So what would happen if you turned off the setting “audit changes to audit policy”?  Well, if we implemented it the way we generally implement audit policy, nothing would happen: no event.  As described above, if we checked audit policy after we disabled it, the effective policy would say “don’t generate audit”.

But consider the case where a malicious auditor or system administrator wants to cover their tracks.  One thing such a person might do, to leave less of a trace, is to disable audit policy before they do the bad thing and re-enable it afterwards.  If we implemented audit normally, there would be no trace of this.

To avoid this undesirable case, we changed the instrumentation a little so that we always generate audit events for certain audit policy changes.  This means that you might not get EXACTLY what you configured, but it also ensures that you can always find the significant events when someone disables audit policy.
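
To make the ordering problem concrete, here is a toy Python model.  It is only a sketch of the reasoning above, not how Windows implements auditing; the class, the method, and the `always_audit_policy_changes` flag are all invented for illustration.

```python
class ToyAuditSystem:
    """Toy model of 'check policy after the operation'.  Not Windows code."""

    def __init__(self, always_audit_policy_changes):
        # Hypothetical flag: True models the special-cased instrumentation.
        self.always_audit_policy_changes = always_audit_policy_changes
        # Stands in for the "Audit Policy Change" setting.
        self.policy_change_auditing = True
        self.log = []

    def set_policy_change_auditing(self, enabled):
        # 1. Perform the operation first.
        self.policy_change_auditing = enabled
        # 2. Then decide whether to log, consulting the policy as it is now.
        if self.always_audit_policy_changes or self.policy_change_auditing:
            self.log.append(f"audit policy changed: enabled={enabled}")


naive = ToyAuditSystem(always_audit_policy_changes=False)
naive.set_policy_change_auditing(False)
print(naive.log)     # [] : disabling auditing erased its own trace

hardened = ToyAuditSystem(always_audit_policy_changes=True)
hardened.set_policy_change_auditing(False)
print(hardened.log)  # ['audit policy changed: enabled=False']
```

In the naive variant the policy check runs after the setting has already been turned off, so the change that matters most is the one that goes unrecorded.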

Anyway, to sum up, the following events are always audited in Windows Vista and later, even when the “Audit Policy Change” subcategory is disabled:

4715 The audit policy (SACL) on an object was changed.
4719 System audit policy was changed.
4906 The CrashOnAuditFail value has changed.
4908 Special Groups Logon table modified.
4912 Per User Audit Policy was changed.

The following events are only audited when success auditing is enabled for the “Audit Policy Change” subcategory:
4902 The Per-user audit policy table was created.
4904 An attempt was made to register a security event source.
4905 An attempt was made to unregister a security event source.
4907 Auditing settings on object were changed.
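
If you want to carry these two lists into a log-review script, a minimal Python sketch might look like the following.  The event IDs come from the lists above; the function name and its parameter are made up for this example.

```python
# Event IDs taken from the two lists above.
ALWAYS_AUDITED = {4715, 4719, 4906, 4908, 4912}
POLICY_DEPENDENT = {4902, 4904, 4905, 4907}


def expected_in_log(event_id, policy_change_success_enabled):
    """Return True if this policy-change event should appear in the log.

    policy_change_success_enabled is a hypothetical parameter standing in
    for "success auditing is enabled for the Audit Policy Change subcategory".
    """
    if event_id in ALWAYS_AUDITED:
        return True
    if event_id in POLICY_DEPENDENT:
        return policy_change_success_enabled
    raise ValueError(f"{event_id} is not one of the events listed in this post")


print(expected_in_log(4719, policy_change_success_enabled=False))  # True
print(expected_in_log(4907, policy_change_success_enabled=False))  # False
```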

Special thanks to Mitsuru for documenting this.

Categories: Descriptions, HowTo, Tips Tags:

XPath to generate a list of NTLM authentications on Windows Vista or Later

May 13th, 2010 Comments off

Hi Everyone,

Sas sent me an email complaining that I am not posting as often as I should; sorry about that.  I am working on a different project now, but I am still in close touch with the auditing team and I’ll try to do better.

Anyway, a question that I hear regularly is, “How do I find all the NTLM authentications on my network?”

Other than running a network trace, the best way I have found (OK, invented 🙂) to do this is to look at the logon events in the audit log.

One of the changes we made to the logon events in Windows Vista (and therefore subsequent releases of Windows) was to include the NTLM protocol level in the logon events, if the NTLM auth package was used.

Now, with the new EventLog ecosystem, it’s easy to generate some XPath to find just these events.


Here’s the query:

*[System
   [Provider
     [@Name='Microsoft-Windows-Security-Auditing']
       and Task = 12544
       and (band(Keywords,9007199254740992))
       and (EventID=4624)
   ]
   and
   EventData
     [Data
       [@Name='LmPackageName'] != '-'
     ]
 ]


To use this in Event Viewer:



  1. Find the Security log under Windows Logs in the tree pane.

  2. Right-click the Security log, and choose “Filter Current Log…”

  3. Select the “XML” tab.

  4. Check the “Edit query manually” box.

  5. Replace the default query (“*”, or everything in the “<Select>” element), with the text in the box above.  I’ve formatted it for readability.

  6. Click OK
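
For reference, the XML tab expects the XPath to sit inside a `<Select>` element within a `<QueryList>`.  The Python sketch below builds that wrapper around the query from this post, so you can see roughly what the whole box should contain after step 5.  The function name is invented for this example, and the XPath is collapsed onto one line with plain straight quotes.

```python
import xml.etree.ElementTree as ET

# The query from this post, collapsed to a single line.
NTLM_LOGON_XPATH = (
    "*[System[Provider[@Name='Microsoft-Windows-Security-Auditing']"
    " and Task = 12544"
    " and (band(Keywords,9007199254740992))"
    " and (EventID=4624)]"
    " and EventData[Data[@Name='LmPackageName'] != '-']]"
)


def build_query_list(xpath, log="Security"):
    """Wrap an XPath expression in a QueryList document for one log."""
    root = ET.Element("QueryList")
    query = ET.SubElement(root, "Query", {"Id": "0", "Path": log})
    select = ET.SubElement(query, "Select", {"Path": log})
    select.text = xpath  # ElementTree escapes &, < and > if they ever appear
    return ET.tostring(root, encoding="unicode")


print(build_query_list(NTLM_LOGON_XPATH))
```

Building the XML with a library rather than by string concatenation means a later query that uses `<` or `>` comparisons gets escaped correctly.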

The event view will now be filtered and you’ll only see NTLM logon events.  Additionally, each filtered event will contain a “Detailed Authentication Information” section containing the protocol level (e.g. LM, NTLM, NTLM V2) in the “Package Name” field, and the session key length, if one was negotiated.

Detailed Authentication Information:
            Logon Process: NtLmSsp
            Authentication Package: NTLM
            Transited Services: -
            Package Name (NTLM only): NTLM V2
            Key Length: 128
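
Once you have the filtered events, you will probably want a tally of protocol levels rather than a long list.  Here is a small Python sketch that counts the “Package Name (NTLM only)” values, assuming you have already exported each event’s message text as a string in the layout shown above.  The function name and the sample messages are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Package Name (NTLM only)" line of the event message text.
PACKAGE_LINE = re.compile(
    r"^\s*Package Name \(NTLM only\):\s*(.+?)\s*$", re.MULTILINE
)


def count_ntlm_levels(messages):
    """Count events per NTLM protocol level, skipping '-' (not NTLM)."""
    counts = Counter()
    for message in messages:
        match = PACKAGE_LINE.search(message)
        if match and match.group(1) != "-":
            counts[match.group(1)] += 1
    return counts


sample = """Detailed Authentication Information:
            Logon Process: NtLmSsp
            Authentication Package: NTLM
            Transited Services: -
            Package Name (NTLM only): NTLM V2
            Key Length: 128
"""

print(count_ntlm_levels([sample, sample.replace("NTLM V2", "LM")]))
```

Any level other than NTLM V2 in the tally is a good place to start investigating.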


 

Categories: Descriptions, Tips, Tools Tags:

Auditing system impact on performance

August 10th, 2009 No comments

UPDATE 2010-06-06 (EricF) – Fixed Vista+ architecture image; link was broken on migration to new blog platform

I get questions from time to time, such as my recent offline question from Steve, about what performance impact auditing has on the system as a whole.

To answer this you need to understand a few things:

  1. Auditable activity is implemented as instrumentation (e.g. a function call to the auditing system) inside the code that does something auditable.
  2. The auditing system in Windows has two sets of programmatic interfaces for introducing an event, one in kernel mode and one in user mode – so the component generating audit does not need to switch between kernel and user modes.
  3. Although audit policy is stored in user mode, we cache a copy of the relevant policy for kernel-mode components, in kernel mode.  This means that no mode switch is necessary to check audit policy to decide whether to generate an event.
  4. There are user-mode and kernel-mode queues for audit events.  The call to generate an audit event actually just queues the event, assuming the queue is not full.  So from the perspective of the component generating audit, audit has an “asynchronous” flavor under light-medium loads.  Under heavy loads when the queues fill, audit blocks the component raising the audit until the event can be queued, showing its true synchronous behavior.
  5. Dequeuing audit events always occurs on a different thread from enqueueing, so that raising audit events and writing them to the log don’t affect each other’s perf under light to moderate load.
  6. The pre-Vista auditing system in the kernel delivers events to LSA.  The Vista+ auditing system in the kernel delivers most events directly to ETW, the kernel mode event trace engine, which means that most of the kernel audit (including the potentially perf impacting object access events) doesn’t require a mode switch at all.
  7. The LSA formats events and then delivers them to the event log.  In WS03, events are batched in the RPC call to eventlog.  In Vista+, delivery is done by means of ETW in almost all cases.
  8. ETW queues events and spools them to the Windows eventlog service as fast as the service will accept them.
  9. The eventlog service writes the events to the log file as they arrive.
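
The queueing behavior in points 4 and 5 can be imitated with a bounded queue and a separate writer thread.  The Python sketch below is only an analogy, not the Windows implementation: the queue size, the write delay, and all names are invented.  Enqueueing returns immediately while there is room, and the caller blocks once the queue is full.

```python
import queue
import threading
import time


def run_demo(n_events=20, queue_size=4, write_delay=0.005):
    """Raise n_events 'audits' through a bounded queue to a slow log writer."""
    audit_queue = queue.Queue(maxsize=queue_size)
    written = []

    def log_writer():
        # Separate thread: dequeues events and "writes" them slowly.
        while True:
            event = audit_queue.get()
            if event is None:  # sentinel: no more events
                break
            time.sleep(write_delay)
            written.append(event)

    writer = threading.Thread(target=log_writer)
    writer.start()

    blocked = 0
    for event in range(n_events):
        try:
            audit_queue.put_nowait(event)  # room in the queue: returns at once
        except queue.Full:
            blocked += 1
            audit_queue.put(event)         # queue full: caller waits here
    audit_queue.put(None)
    writer.join()
    return written, blocked


written, blocked = run_demo()
print(f"{len(written)} events written, caller had to wait {blocked} times")
```

With a larger `queue_size` or a faster writer the caller would wait less often, which is the “asynchronous flavor” described above.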

I have uploaded graphics of the Windows XP/Windows Server 2003 auditing architecture, and the Windows Vista/WS08/Windows 7 architecture, to make this process more clear:

Pre-Vista Windows Auditing Architecture

 

Windows Vista+ Auditing Architecture

 

So now back to the original question- what is the impact of auditing on performance?

At low auditing loads, auditing generally has no discernible impact on perf.  If you were hardcore with a profiler and iterated an auditable activity a million times I am sure you’d be able to measure it, but for reasonable audit policies you won’t notice a significant difference.

At high auditing loads, auditing has a significant performance impact.  This is more true of pre-Vista multiprocessor systems than of systems with the new eventlog system.

For example, a multi-processor domain controller (say a 32-processor box) running Windows Server 2003, might run into problems under extreme load.  Why is that?  Because ultimately the limiting factor on event rate is how fast you can write the events to disk.  Pre-Vista eventlog has a single thread writing events to disk.  So even though you might have 32 threads servicing authentication requests (an auditable activity), each of them is queueing to a single audit queue which is ultimately despooling to eventlog via RPC on a single thread, and eventlog is only writing to the security log with a single thread.  What we observe in practice in this case is that a single processor on the system goes to 85-100% utilization, and the other processors drop to a very low utilization as the authentication threads are blocked waiting for the audit function call to return.  This call won’t return until the queue is not full, and the queue is waiting on RPC which is waiting on eventlog…  so eventlog governs the rate.

In Windows Server 2003, we added a particular optimization only for the security event log, which batches events in the RPC call to eventlog.  This means that you can get more event throughput in the security log than in other logs on the system.  It didn’t eliminate the bottleneck, but it pushed back the limit, so WS03 on typical hardware should be able to log several thousand events per second to the security event log.  Previous versions were only able to log about 1000 events per second.

Note that the change in performance characteristics occurs all at once.  The impact tends to be trivial until the queues fill, at which point the impact is severe.  It does not scale linearly; there’s a discrete behavior change.  What this means realistically is that if you ever encounter a performance problem with auditing, then you probably just need to turn it down a little and you won’t have a problem any more.
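
A little arithmetic shows why the change is so abrupt.  In the Python sketch below, the 1000 events per second drain rate is the rough pre-WS03 figure mentioned above; the queue capacity of 5000 events is a made-up number, and the function is just a back-of-the-envelope model.

```python
import math


def seconds_until_blocking(arrival_rate, drain_rate, queue_capacity):
    """Time until an empty queue fills, given steady rates in events/second."""
    if arrival_rate <= drain_rate:
        return math.inf  # the writer keeps up, so the queue never fills
    return queue_capacity / (arrival_rate - drain_rate)


for rate in (900, 1000, 1100, 2000):
    wait = seconds_until_blocking(rate, drain_rate=1000, queue_capacity=5000)
    print(f"{rate} events/s -> queue full after {wait} s")
# 900 and 1000 events/s never fill the queue (inf);
# 1100 events/s fills it after 50.0 s, 2000 events/s after 5.0 s.
```

Below the drain rate nothing ever blocks; just above it, the queue fills in under a minute and every caller starts waiting.  That is the cliff.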

In Vista and subsequent releases, audit queues events via ETW.  ETW was designed for high-performance kernel tracing, and in the auditing team we tested it to over 10,000 (10.000 for you folks in Germany 🙂) events per second before we decided that we had hit our scale targets.  We never tested exactly how high it would go, but we were satisfied that the eventlog service was no longer a bottleneck in realistic scenarios.

There are some edge cases where you might run into performance problems by trying to audit too much in a critical path.  For instance, it is a really really bad idea to put SACLs on your entire registry.  If you monitor registry activity with a tool like Process Monitor, you will notice that when a system is not idle, there are often hundreds or thousands of registry accesses per second.  If you impose an auditing tax on each of those activities you will notice a degradation in performance.  Not to mention that the resultant mountain of events is probably not very valuable.  Of course you can tune SACLs as I have mentioned before, but I doubt that it’s useful to take the time to tune SACLs for the entire registry.

One last point is that the eventlog is writing the events somewhere.  Wherever it is writing events, it is consuming disk I/Os and competing with anything else writing to the same volume.  If you have a disk performance problem on that disk, it can result in an auditing performance problem, as everything else will back up if the eventlog can’t write events to disk fast enough.  So one thing you can do is ensure that the disk where your log is placed has enough I/Os.

In summary, audit has minimal impact unless you do a whole lot of it, in which case it can have a severe impact on your system.  The change happens suddenly, not gradually, so you can do a lot of auditing with no problem.  If you run into a problem, turn it down just a little (or little by little) and at some point the behavior will change such that you won’t have any significant perf impact anymore.

Categories: Tips Tags:

Auditing system impact on performance

August 10th, 2009 Comments off

UPDATE 2010-06-06 (EricF) – Fixed Vista+ architecture image; link was broken on migration to new blog platform

I get questions from time to time, such as my recent offline question from Steve, about what performance impact auditing has on the system as a whole.

To answer this you need to understand a couple of things:

  1. Auditable activity is implemented as instrumentation (e.g. a function call to the auditing system) inside the code that does something auditable.
  2. The auditing system in Windows has two sets of programmatic interfaces for introducing an event, one in kernel mode and one in user mode – so the component generating audit does not need to switch between kernel and user modes.
  3. Although audit policy is stored in user mode, we cache a copy of the relevant policy for kernel-mode components, in kernel mode.  This means that no mode switch is necessary to check audit policy to decide whether to generate an event.
  4. There are user-mode and kernel-mode queues for audit events.  The call to generate an audit event actually just queues the event, assuming the queue is not full.  So from the perspective of the component generating audit, audit has an “asynchronous” flavor under light-medium loads.  Under heavy loads when the queues fill, audit blocks the component raising the audit until the event can be queued, showing its true synchronous behavior.
  5. Dequeuing audit events always occurs on a separate thread than enqueueing so that raising audit events and writing them to the log don’t affect each other’s perf under light to moderate load.
  6. The pre-Vista auditing system in the kernel delivers events to LSA.  The Vista+ auditing system in the kernel delivers most events directly to ETW, the kernel mode event trace engine, which means that most of the kernel audit (including the potentially perf impacting object access events) doesn’t require a mode switch at all.
  7. The LSA formats events and then delivers them to the event log.  In WS03, events are batched in the RPC call to eventlog.  In Vista+, delivery is done by means of ETW in almost all cases.
  8. ETW queues events and spools them to the Windows eventlog service as fast as the service will accept them.
  9. The eventlog service writes the events to the log file as they arrive.

I have uploaded graphics of the Windows XP/Windows Server 2003 auditing architecture, and the Windows Vista/WS08/Windows 7 architecture, to make this process more clear:

Pre-Vista Windows Auditing Architecture

 

Windows Vista+ Auditing Architecture

 

So now back to the original question- what is the impact of auditing on performance?

At low auditing loads, auditing generally has no discernable impact on perf.  If you were hardcode with a profiler and iterated an auditable activity a million times I am sure you’d be able to measure it, but for reasonable audit policies you won’t notice a significant difference.

At high auditing loads, auditing has a significant performance impact.  This is more true of pre-Vista multiprocessor systems than of systems with the new eventlog system.

For example, a multi-processor domain controller (say a 32-processor box) running Windows Server 2003, might run into problems under extreme load.  Why is that?  Because ultimately the limiting factor on event rate is how fast you can write the events to disk.  Pre-Vista eventlog has a single thread writing events to disk.  So even though you might have 32 threads servicing authentication requests (an auditable activity), each of them is queueing to a single audit queue which is ultimately despooling to eventlog via RPC on a single thread, and eventlog is only writing to the security log with a single thread.  What we observe in practice in this case is that a single processor on the system goes to 85-100% utilization, and the other processors drop to a very low utilization as the authentication threads are blocked waiting for the audit function call to return.  This call won’t return until the queue is not full, and the queue is waiting on RPC which is waiting on eventlog…  so eventlog governs the rate.

In Windows Server 2003, we added a particular optimization only for the security event log, which batches events in the RPC call to eventlog.  This means that you can get more event throughput in the security log than in other logs on the system.  It didn’t eliminate the bottleneck, but it pushed back the limit, so WS03 on typical hardware should be able to log several thousand events per second to the security event log.  Previous versions were only able to log about 1000 events per second.

Note that the change in performance characteristics occurs all at once.  So the impact tends to be trivial until the queues fill, at which point the impact is severe.  It does not scale linearly, there’s a discrete behavior change.  What this means realistically is that if you ever encounter a performance problem with auditing, then you probably just need to turn it down a little and you won’t have a problem any more. 

In Vista and subsequent releases, audit queues events via ETW.  ETW was designed for high-performance kernel tracing, and in the auditing team we tested it to over 10,000 (10.000 for you folks in Germany 🙂 events per second before we decided that we had hit our scale targets.  We never tested exactly how high it would go, but we were satisfied that the eventlog service was no longer a bottleneck in realistic scenarios.

There are some edge cases where you might run into performance problems by trying to audit too much in a critical path.  For instance, it is a really really bad idea to put SACLs on your entire registry.  If you monitor registry activity with a tool like Process Monitor, you will notice that when a system is not idle, there are often hundreds or thousands of registry accesses per second.  If you impose an auditing tax on each of those activities you will notice a degradation in performance.  Not to mention that the resultant mountain of events is probably not very valuable.  Of course you can tune SACLs as I have mentioned before, but I doubt that it’s useful to take the time to tune SACLs for the entire registry.

One last point is that the eventlog is writing the events somewhere.  Wherever it is writing events, it is consuming disk I/Os and competing with anything else writing to the same volume.  If you have a disk performance problem on that disk, it can result in an auditing performance problem, as everything else will back up if the eventlog can’t write events to disk fast enough.  So one thing you can do is ensure that the disk where your log is placed has enough I/O capacity to keep up.

In summary, audit has very minimal impact unless you do a whole lot of it, in which case it can have severe impact on your system.  The change happens suddenly, not gradually, so you can do a lot of auditing with no problem.  If you run into a problem, turn it down just a little (or little by little) and at some point the behavior will change such that you won’t have any significant perf impact anymore.

Categories: Tips Tags:

Mapping pre-Vista Security Event IDs to Security Event IDs in Vista+

June 11th, 2009 No comments

I’ve written twice (here and here) about the relationship between the “old” event IDs (5xx-6xx) in WS03 and earlier versions of Windows and the “new” security event IDs (4xxx-5xxx) in Vista and beyond.


In short, EventID(WS03) + 4096 = EventID(WS08) for almost all security events in WS03.


The exceptions are the logon events.  The logon success events (540, 528) were collapsed into a single event 4624 (=528 + 4096).  The logon failure events (529-537, 539) were collapsed into a single event 4625 (=529+4096).


Other than that, there are cases where old events were deprecated (IPsec IIRC), and there are cases where new events were added (DS Change).  These are all new instrumentation and there is no “mapping” possible- e.g. the new DS Change audit events are complementary to the old DS Access events; they record something different than the old events so you can’t say that the old event xxx = the new event yyy because they aren’t equivalent.  The old event means one thing and the new event means another thing; they represent different points of instrumentation in the OS, not just formatting changes in the event representation in the log.


Of course I explained earlier why we renumbered the events, and (in the same place) why the difference is “+4096” instead of something more human-friendly like “+1000”.  The bottom line is that the event schema is different, so by changing the event IDs (and not re-using any), we force existing automation to be updated rather than just misinterpreting events when the automation doesn’t know the version of Windows that produced the event.  We realized it would be painful but it is nowhere near as painful as if every event consumer had to be aware of, and have special casing for, pre-Vista events and post-Vista events with the same IDs but different schema.


So if you happen to know the pre-Vista security events, then you can quickly translate your existing knowledge to Vista by adding 4000, adding 100, and subtracting 4.  You can do this in your head.
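For quick lookups, the arithmetic above plus the two logon exceptions can be sketched as a small helper.  This is only a convenience for translating what you already know, not a parser: the function name is made up, and it knows nothing about deprecated events or about new events that have no pre-Vista equivalent.

```python
OFFSET = 4096  # 4000 + 100 - 4

LOGON_SUCCESS = {528, 540}                 # collapsed into 4624
LOGON_FAILURE = set(range(529, 538)) | {539}  # 529-537 and 539, collapsed into 4625


def vista_event_id(legacy_id):
    """Translate a pre-Vista security event ID to its Vista+ number.

    Hypothetical helper for human lookup only.  It does not check whether
    the legacy event still exists in Vista+, and the event schema differs
    even where the numbers line up.
    """
    if legacy_id in LOGON_SUCCESS:
        return 4624
    if legacy_id in LOGON_FAILURE:
        return 4625
    return legacy_id + OFFSET


print(vista_event_id(566))  # DS access event
print(vista_event_id(540))  # network logon success
print(vista_event_id(539))  # logon failure
```

Note that the translation only goes one way: 4624 and 4625 each stand in for several old events, so you can’t recover the old ID from the new one.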


However, if you’re trying to implement some automation, you should avoid trying to make a chart with “<Vista” and “>=Vista” columns of event ID numbers, because this will likely result in mis-parsing one set of events, and because you’ll find it frustrating that there is not a 1:1 mapping (and in some cases no mapping at all).


Eric

Categories: Descriptions, Tips, Tools Tags:

Minimizing Directory Service Audit Event Noise

September 5th, 2008 No comments

I’ve written before on noise reduction in the Windows security event log.  I’ve also written to describe how object access auditing works.  But, I still get questions on how to reduce noise from object access events.  The other day I got that question, specific to Directory Service objects, on an internal discussion list so I thought I’d clean up the answer a bit and share it with the world.  In general the same is true for any type of object, although there are a few more knobs to control for DS objects.


An object access audit event is generated when the system access control list (SACL) on the object matches the access that was performed, on ALL of the following conditions:



  1. Object – the object that was accessed must have either an explicit or inherited SACL.  The access performed is compared against the ACEs in that SACL.

  2. Success or failure of activity – every audit access control entry (ACE) in a SACL will be either of type AUDIT_SUCCESS or AUDIT_FAILURE.  The access performed must match the access type of the ACE for the rest of the ACE to be considered.

  3. User account – the accessing user’s token is compared against each ACE matching the access type.  If the user, or a group the user belongs to, matches the SID in the ACE, then an audit might be generated.

  4. Access – the access being performed must match the audited accesses in the access mask in an otherwise matching ACE.

The specific auditing algorithm is discussed here.
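The four conditions above can be sketched as a small matching function.  This is only an illustration, not the actual Windows implementation: the names `Access`, `AuditAce` and `should_audit` are made up for this example, the access bits are simplified stand-ins rather than real access-mask values, and I am assuming that condition 4 is satisfied when the ACE's mask overlaps the access performed.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Access(Flag):
    """Simplified stand-ins for access-mask bits (not the real Windows values)."""
    READ = auto()
    WRITE = auto()
    DELETE = auto()


@dataclass(frozen=True)
class AuditAce:
    on_success: bool  # True for a success-audit ACE, False for a failure-audit ACE
    sid: str          # user or group that the ACE applies to
    mask: Access      # accesses that this ACE audits


def should_audit(sacl, succeeded, token_sids, performed):
    """Return True if some ACE in the SACL matches on all four conditions."""
    if not sacl:                         # 1. the object has no SACL at all
        return False
    for ace in sacl:
        if ace.on_success != succeeded:  # 2. success/failure type must match
            continue
        if ace.sid not in token_sids:    # 3. user or one of the user's groups
            continue
        if ace.mask & performed:         # 4. audited accesses overlap (assumed rule)
            return True
    return False


# A SACL that audits successful write-type accesses by anyone.
sacl = [AuditAce(on_success=True, sid="Everyone", mask=Access.WRITE | Access.DELETE)]
token = {"alice", "Everyone"}

write_audited = should_audit(sacl, True, token, Access.WRITE)
read_audited = should_audit(sacl, True, token, Access.READ)
failed_write_audited = should_audit(sacl, False, token, Access.WRITE)
```

With this SACL only the successful write is audited; the read fails condition 4 and the failed write fails condition 2, which is the kind of selective failure the noise reduction depends on.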


So the way to reduce the number of audit events (566 on Windows Server 2003, 4662 on Windows Server 2008, or one of the new DS Change events on Windows Server 2008) is to cause one or more of those conditions to fail, except in the specific cases that you care about.


The SACL that will generate the most audit events is “Everyone:Success & Failure:All accesses” on the domain head with OI,CI (object inherit & container inherit flags) for all object types.  This SACL matches all of the above conditions in all cases.  (Incidentally, I think that this is pretty close to the default SACL, with the exception of failures, for Windows 2000 Active Directory installations, and SACLs are not updated when DCs are upgraded from version to version.  Windows Server 2003 has much more conservative SACLs for new installations of AD.)
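For readers who think in SDDL, here is a rough sketch of how that noisiest SACL might be written, next to a narrower one.  The helper `audit_ace` is hypothetical, I am assuming that “All accesses” corresponds to the generic-all right (`GA`), and the narrower ACE is just one possible choice of write-type rights, not a recommended default.

```python
def audit_ace(flags, rights, trustee):
    """Build one SDDL audit ACE: (AU;flags;rights;object_guid;inherit_guid;trustee)."""
    return f"(AU;{flags};{rights};;;{trustee})"


# OI/CI = object/container inherit, SA/FA = audit success/failure,
# GA = generic all (assumed to stand for "All accesses"), trustee WD = Everyone.
noisiest = "S:" + audit_ace("OICISAFA", "GA", "WD")

# A narrower alternative: successes only, write-type rights only.
# CC/DC = create/delete child, WP = write property, SD = delete,
# WD (as a right) = write DAC, WO = write owner.
quieter = "S:" + audit_ace("CISA", "CCDCWPSDWDWO", "WD")

print(noisiest)
print(quieter)
```

The first string matches every condition for every access; the second drops failures and read accesses while still covering every user.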


To reduce noise, I offer the following suggestions, addressing each of the above conditions:



  1. Audit only the objects that you care about.  User accounts and groups are already well-audited with “Account Management” auditing, so don’t audit them with DS access.  Perhaps audit OUs, or other DS objects.  Use the Object Type and attribute type restrictions that you have in DS Access auditing.  Also, in Windows Server 2008, you can exclude individual attributes from the new DS Change auditing by adjusting the searchFlags attribute on that attribute’s definition in the AD schema.  SACLs are more easily reversed than schema changes, so they are probably a more acceptable method of controlling audit for most organizations.

  2. Audit successful accesses only.  Failed accesses are common and are NOT indicative of any security problem; in fact many failures are not even explicit requests by the user but are just normal requests made by the OS, which will retry with less access if the operation fails.  In my experience failure auditing is primarily useful for troubleshooting, not for security.

  3. Audit the “Everyone” group.  This matches any user, but it also means that you will not accidentally miss accesses that you care about because you failed to audit a user account that has access to the objects in question.  The only time that you would NOT audit “Everyone” is if you had an application or service account which was very noisy; in that case you’d need to create a group with all accounts EXCEPT the noisy accounts, and audit that group.

  4. Audit only the accesses that you care about.  Specifically, read accesses occur much more often (in my experience, a conservative estimate is about a 100:1 ratio) than write accesses.  If you restrict your auditing to “write” type accesses (including change, delete, change permissions, create, etc.) then you will end up generating far fewer events.  Auditing for read access is very noisy.  If you must audit for reads, consider auditing fewer objects, perhaps only auditing reads on the container object instead of the objects in the container, or on one “interesting” object in any given container as a “canary”.
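To put a rough number on the last suggestion, here is a back-of-the-envelope estimate.  The function name and the workload figure are made up for illustration; the only input taken from the text is the 100:1 read-to-write ratio, and the estimate assumes every access would otherwise match the SACL.

```python
def audited_events(total_accesses, reads_per_write=100, audit_reads=False):
    """Estimate audit events for a workload with the given read:write ratio."""
    writes = total_accesses // (reads_per_write + 1)
    reads = total_accesses - writes
    return writes + (reads if audit_reads else 0)


total = 1_010_000  # hypothetical number of accesses in some period
everything = audited_events(total, audit_reads=True)
writes_only = audited_events(total, audit_reads=False)
print(everything, writes_only)
```

Under those assumptions, auditing only write-type accesses cuts the volume to roughly 1% of what auditing every access would produce.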

 

Categories: HowTo, Tips Tags:
