Archive

Archive for the ‘NLB’ Category

Missing BDA hook rules – impact and potential root cause

September 18th, 2014

Many of you may already know what NLB is and how it works, as described in the general Network Load Balancing Overview [http://technet.microsoft.com/en-us/library/cc725946.aspx].

An integral part of a TMG NLB solution is bi-directional affinity (BDA), which is well described at the following link:

Bi-Directional Affinity in ISA Server [http://blogs.technet.com/b/isablog/archive/2008/03/12/bi-directional-affinity-in-isa-server.aspx].

Bi-directional affinity creates multiple instances of Network Load Balancing (NLB) on the same host, which work in tandem to ensure that responses from published servers are routed through the appropriate ISA servers in a cluster. Bi-directional affinity is commonly used when NLB is configured with Internet Security and Acceleration (ISA) servers. If bi-directional affinity is not consistent across all NLB hosts or if NLB fails to initialize bi-directional affinity, the NLB cluster will remain in the converging state until a consistent teaming configuration is detected.

Bi-directional affinity is crucial if you enable NLB on multiple interfaces, as it ensures that a given client's traffic flows through the same node in both directions, giving a consistent data flow.

By default, when a client connects to an NLB interface, the NLB driver computes a hash based on the packet's source IP (the client) to decide which NLB node should handle the request. On the way back (the server responding to the client), the source IP is the server's IP (not the client's), so without BDA the response may be handled by another TMG NLB node, which would discard it, never having seen the client request. Hence, a mechanism is needed to guarantee that client and server packets are handled by the same host in the array.
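To make this concrete, here is a minimal Python sketch with a toy stand-in for the hash (the real NLB algorithm is different); it only illustrates that hashing on the source IP in both directions can select different nodes:

import ipaddress

NODE_COUNT = 2  # two-node TMG array

def pick_node(ip: str) -> int:
    # Toy stand-in for NLB's hash: map an IP address to a node ID.
    return int(ipaddress.ip_address(ip)) % NODE_COUNT

client_ip, server_ip = "172.20.1.10", "192.168.0.21"

# Forward direction: NLB hashes on the packet source IP (the client).
forward_node = pick_node(client_ip)

# Reverse direction: the source IP is now the server, so the default
# behavior hashes on a different address.
reverse_node = pick_node(server_ip)

print(f"request on node {forward_node}, response on node {reverse_node}")
if forward_node != reverse_node:
    print("the response lands on a node that never saw the SYN -> dropped")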

Bi-directional affinity ensures that server responses are handled by the same TMG NLB node as the original client request. The mechanism providing this functionality is implemented as so-called hook rules:

http://technet.microsoft.com/en-us/library/dd348817(v=ws.10).aspx

Filter hooks help to direct traffic in a Network Load Balancing (NLB) cluster by filtering network packets. If the filter hooks are not properly configured, the NLB cluster will continue to converge and operate normally; however, the server application that is running with NLB will not be able to properly register the hooks.

The essential logic of the hook rules is the following:

For each packet, NLB calls out to the registered drivers (in this case TMG's fweng driver) to ask whether they want to modify how the hash is calculated.

For example, when a client sends a SYN packet, NLB “asks” TMG’s fweng driver how it should calculate the hash. Depending on its hook rules, TMG tells NLB to use, for example, the client source IP for hashing (the default behavior). The calculated hash then determines, say, that the first node should handle the traffic and pass the SYN to the backend server.

When the server responds through the internal NLB interface of the TMG array, NLB calls out again to ask TMG how the hash should be calculated.

Based on the same hook rule set, and seeing the packet direction, TMG tells NLB to hash on the destination IP, which is again the client IP, so the packet is handled by the same node as the original request.

If the hook rule were not present, TMG would tell NLB to use the default behavior (hash on the source IP), which would mean calculating the hash on the source (server) IP, possibly yielding a different node to handle the traffic, which is not what we want.
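The following Python sketch models this decision logic (the rule table and data structures are assumptions for illustration, not TMG's actual implementation): forward rules hash on the source IP, reverse rules on the destination IP, so both directions of a connection hash on the same address, the client IP:

import ipaddress

HOOK_RULES = [
    # (source range, destination range, direction)
    ("172.20.1.0/24", "192.168.0.0/24", "forward"),  # External -> DMZ requests
    ("192.168.0.0/24", "172.20.1.0/24", "reverse"),  # DMZ -> External responses
]

def ip_to_hash(src: str, dst: str) -> str:
    # Return the IP address the NLB hash should be computed on for this packet.
    for src_range, dst_range, direction in HOOK_RULES:
        if (ipaddress.ip_address(src) in ipaddress.ip_network(src_range)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(dst_range)):
            return src if direction == "forward" else dst
    return src  # no matching hook rule: default behavior, hash on the source IP

# The client request and the matching server response hash on the same IP:
print(ip_to_hash("172.20.1.10", "192.168.0.21"))  # forward -> 172.20.1.10
print(ip_to_hash("192.168.0.21", "172.20.1.10"))  # reverse -> 172.20.1.10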

If you have a TMG array with several nodes and NLB enabled, the TMG Firewall service creates the hook rules at startup.

These rules can be checked by executing netsh tmg show nlb from an elevated command prompt, which yields output similar to what you can see below.

Notice that the rules have a source range, a destination range, and a direction, based on which TMG decides what to tell NLB when it is called: hash on the source IP (forward) or on the destination IP (reverse).

[Image: netsh tmg show nlb output showing the hook rules]

A problem occurs if hook rules are missing. In this post, we are going to explore one potential cause of missing hook rules.

The output below is from a test lab we built to reproduce an issue reported by a customer:

[Image: netsh tmg show nlb output from the test lab, showing only forward hook rules]

In the example above, we can see only rules that hash based on the source IP (forward direction), covering the outgoing scenario.

However, there are no reverse rules, indicating that some rules may be missing.

I took those rules from my TMG array with Internal (10.0.0.0/24), DMZ (192.168.0.0/24), and External (172.20.1.0/24) networks.

Let's imagine a scenario where we have created a publishing rule for a server in the DMZ with a listener on the External network, and the rule is configured as half NAT (requests appear to come from the client's IP address).

Because there is no specific hook rule for either the External -> DMZ or the DMZ -> External range, the default behavior is used in both directions: hash on the source IP.

As described above, because the hook rule is missing, this may or may not work, depending on the client IP/published server IP pair. If the NLB hash algorithm yields the same NLB node ID for both the client and the server IP, it will work. Otherwise, client and server packets will be serviced by different hosts, and the published server's responses will be dropped with the error 0xc0040017 FWX_E_TCP_NOT_SYN_PACKET_DROPPED.
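Using the same toy hash as in the earlier sketch (again, not the real NLB algorithm), you can estimate how many clients would happen to work by luck in a two-node array; with this lab's External range and a sample DMZ server, it comes out to roughly half:

import ipaddress

NODE_COUNT = 2
# Node that would handle packets hashed on the published server's IP:
server_node = int(ipaddress.ip_address("192.168.0.21")) % NODE_COUNT

clients = list(ipaddress.ip_network("172.20.1.0/24").hosts())
working = sum(1 for c in clients if int(c) % NODE_COUNT == server_node)
print(f"{working} of {len(clients)} client IPs would work by chance")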

The root cause of the issue is that some NLB hook rules are missing.

These rules are created at startup based on the network rules. In a real-world scenario with many subnets, it is quite easy to miss a network rule between two networks.

In this case, that is exactly what happened: there was no network relationship defined between the External and DMZ networks, so the appropriate hook rule was never created.

Once we add the network rule, the hook rules are created. I created a rule from the External to the DMZ network with a Route relationship. In the output below, you can see how the hook rules changed.

[Image: netsh tmg show nlb output after adding the network rule, now including reverse hook rules]

Now we have the appropriate rules for processing requests from the External to the DMZ network back and forth, ensuring that we hash on the same IP in both directions. Hence, we should no longer get the error 0xc0040017 FWX_E_TCP_NOT_SYN_PACKET_DROPPED.

If you see the above error, make sure to check whether the appropriate hook rules are present; one of the root causes for missing hook rules is a missing network relationship definition.

Author:

Vasily Kobylin, Senior Support Engineer, Microsoft EMEA Forefront Edge

Reviewer:

Balint Toth, Senior Support Escalation Engineer, Microsoft EMEA Forefront Edge

Franck Heilmann, Senior Escalation Engineer, Microsoft EMEA Forefront Edge

Categories: NLB, TMG

New in SP2: Kerberos Authentication in Load Balanced Scenarios

October 12th, 2011

In TMG 2010 Service Pack 2, we focused on bug fixing in order to improve the overall experience with TMG 2010. However, alongside the bug fixes, we also introduced some new features.

One of these new features makes it possible to use Kerberos authentication when connecting to TMG in a high availability (HA) scenario.

Consider a scenario where you have a TMG array of two or more nodes; let's call them Florence.contoso.com and Firenze.contoso.com. Both nodes are members of a domain, and you require proxy authentication for forward proxy access. You have enabled load balancing on the internal network, e.g. by enabling TMG integrated NLB, and in this example setup the NLB VIP in your internal network resolves to SP2Array1.contoso.com.

Why didn’t Kerberos authentication work in an HA scenario before?

In the given scenario, your users could already use Kerberos to authenticate their proxy requests. They could only do so, however, when using the FQDN of one of the nodes, e.g. when using a WPAD file with both proxy FQDNs included (see this article to find out how to configure FQDNs in the WPAD file), or when connecting to a single node's FQDN. It was not possible to use Kerberos when connecting to the NLB virtual IP address, or when placing a load balancer between the clients and the TMG array to balance the requests. As Tom Shinder summarizes in the article CARP and High Availability – Not So Much, the setup using WPAD with multiple FQDNs provides some load balancing mechanisms, such as client CARP, but it should not be considered an HA scenario.

The reason for this limitation is directly connected to Service Principal Names (SPNs), which uniquely identify an instance of a service. A web client (such as a browser) that uses Kerberos to authenticate against the proxy uses the proxy name as it knows it to construct the SPN for the client-to-proxy authentication. For domain computer accounts, an SPN with the computer's FQDN is created automatically when the computer joins the domain. This SPN is associated with the computer account, so processes running as the NETWORK SERVICE principal, such as TMG's Firewall service, can authenticate clients through Kerberos tickets referring to this SPN. In an HA scenario, all array members need to be able to authenticate such tickets, because the client may connect to any array member. However, an SPN may be registered on only one account. If you register the same SPN on multiple accounts in your Active Directory, you end up with duplicate SPNs, which can lead to quite unpredictable behavior in your AD. For further details, here is the MSDN description of SPNs:

A service principal name (SPN) is the name by which a client uniquely identifies an instance of a service. If you install multiple instances of a service on computers throughout a forest, each instance must have its own SPN. A given service instance can have multiple SPNs if there are multiple names that clients might use for authentication. For example, an SPN always includes the name of the host computer on which the service instance is running, so a service instance might register an SPN for each name or alias of its host. For more information about SPN format and composing a unique SPN, see Name Formats for Unique SPNs.

Before the Kerberos authentication service can use an SPN to authenticate a service, the SPN must be registered on the account object that the service instance uses to log on. A given SPN can be registered on only one account. For Win32 services, a service installer specifies the logon account when an instance of the service is installed. The installer then composes the SPNs and writes them as a property of the account object in Active Directory Domain Services. If the logon account of a service instance changes, the SPNs must be re-registered under the new account. For more information, see How a Service Registers its SPNs.

When a client wants to connect to a service, it locates an instance of the service, composes an SPN for that instance, connects to the service, and presents the SPN for the service to authenticate. For more information, see How Clients Compose a Service’s SPN.

http://msdn.microsoft.com/en-us/library/ms677949(v=VS.85).aspx

Before TMG SP2, the Firewall service (the process hosting the Web proxy and hence the one authenticating proxy clients) could only run under the NETWORK SERVICE account. Therefore it could authenticate only tickets referring to SPNs associated with the respective computer account. If you wanted to register the SPN for SP2Array1.contoso.com in this setup, you would need to register it on each of the array nodes' computer accounts, which leads to duplicate SPNs in your AD and is not a valid configuration.
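Here is a minimal Python sketch of the constraint described above (a toy model, not an Active Directory API; the account names are the example nodes from this post): the client composes the SPN from the proxy name it connects to, and each SPN may be mapped to only one account:

spn_registry = {}  # SPN -> the single account allowed to decrypt its tickets

def register_spn(spn, account):
    # Model the AD rule that a given SPN can live on only one account.
    if spn in spn_registry and spn_registry[spn] != account:
        raise ValueError(f"duplicate SPN: {spn} is already on {spn_registry[spn]}")
    spn_registry[spn] = account

# Pre-SP2: the Firewall service ran as NETWORK SERVICE, i.e. as the computer
# account, so the shared array name would have to go on every node's account:
register_spn("http/SP2Array1.contoso.com", "CONTOSO\\FLORENCE$")
try:
    register_spn("http/SP2Array1.contoso.com", "CONTOSO\\FIRENZE$")
except ValueError as err:
    print(err)  # duplicate SPN -> not a valid configuration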

What changed in SP2 to allow Kerberos authentication in HA scenarios?

In SP2, we introduced the option to run the TMG Firewall service in the context of a user account. This makes Kerberos possible in HA scenarios, because you can register the SPN for the HA FQDN on the user account configured for the Firewall service.

How to configure Kerberos Authentication in TMG HA scenarios?

1. You need to create a domain user account, which will be used as the TMG Firewall service account. In this example, the account name I use is TMGSP2KRB in the domain contoso.com.

Recommendations for creating the user account to help protect your domain:

· The domain account that you use for the TMG Firewall service should not be a member of any local or domain groups. However, since an account must have a primary group, define a new placeholder group and use that as the primary group. Make sure that the placeholder group does not have any permissions or user rights on any domain resource.

· The domain account should have no permissions or user rights on any domain resource.

· The domain account should be used only for the TMG Firewall service and not for any other purpose within the domain.

Forefront TMG grants the domain account the minimal permissions required on the TMG array nodes when you configure it for the Firewall service, and removes those permissions when you configure a different account. Hence, do not manually grant any permissions to that user on the TMG array nodes.

2. You need to register at least the SPN for SP2Array1.contoso.com on this user account. You can register the SPN using the setspn command-line tool.

Here’s how to register the SPN for SP2Array1.contoso.com on the account TMGSP2KRB:

setspn -A http/SP2Array1.contoso.com TMGSP2KRB

In order to be able to use Kerberos when connecting to the FQDN of each individual node as well, we recommend registering the additional SPNs for each node's FQDN on the service account, too.
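For example, with the node names used in this post, those additional registrations would look like this:

setspn -A http/Florence.contoso.com TMGSP2KRB

setspn -A http/Firenze.contoso.com TMGSP2KRB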

Finally, you can use setspn -L TMGSP2KRB to list the SPNs that have been added to the user account. In this example scenario, the output will look similar to this:

[Image: setspn -L TMGSP2KRB output listing the registered SPNs]

3. With the account set up, we can now open the TMG MMC of the array to complete the configuration for this scenario.

Right-click the array name and select Properties:

[Image: Array context menu with Properties selected]

Select the Credentials tab to configure the service account:

[Image: Credentials tab in the array properties dialog]

Apply the settings. Please be aware that the TMG Firewall service needs to be restarted for this change to take effect; a prompt to that effect will be displayed when you apply the configuration changes. It is also recommended to restart the TMG management console (MMC) in this case.

After applying the change, you can open Services.msc to verify that it took effect.

If applied successfully, you will notice that the Microsoft Forefront TMG Firewall service now logs on as contoso\TMGSP2KRB:

[Image: Services console showing the Microsoft Forefront TMG Firewall service logging on as contoso\TMGSP2KRB]

If you now configure proxy authentication and collect a network trace on one of your clients, the trace will look similar to this when you connect to your favorite TechNet blog:

[Image: Netmon trace showing Kerberos (GSS-API) authentication against the proxy]

Notice that the client is using GSS-API authorization, which is how Netmon 3.4 displays Kerberos authentication.

If your client still tries to use NTLM, and the trace looks like this:

[Image: Netmon trace showing NTLM authentication against the proxy]

make sure that you have enabled Integrated Windows Authentication in the advanced Internet Explorer settings:

[Image: Internet Explorer advanced settings with "Enable Integrated Windows Authentication" checked]

Without Integrated Windows Authentication, the client will try to use NTLM instead of Kerberos. (Please don't ask me why this setting is named the way it is, as both Kerberos and NTLM are Integrated Windows Authentication types, as far as I know.)

Author
Philipp Sand
Microsoft CSS Forefront Security Edge Team

Technical Reviewer
Oved Itzhak
Senior SDE TMG Product Group

Categories: authentication, Load Balancing, NLB, SP2, TMG