Analysing Forefront TMG logs

I’m lamenting the demise of Forefront TMG but I still think it’s a great product and continue to use it. Of course, I’m currently planning what I’m going to replace it with but I expect it’ll have to be a combination of products to get the features that I need.

Anyway, I’m currently doing some log analysis and wanted to capture some useful information about the process. TMG is using SQL Server Express for logging and I want to do some analysis in SQL Server proper.

First, export the logs from TMG. I won’t repeat the steps from that post but in summary:

Run the Import and Export Data application (All Programs / Microsoft SQL Server 2008 / Import and Export Data). Run this elevated/as administrator.
Choose the database (not today’s databases or you’ll cause contention).
Select Flat File Destination and include Column names in the first data row.
Select Tab {t} as the Column delimiter.
Export.

I took the files from this export and wanted to import them into SQL Server. Rather than import everything into varchar fields, I wanted to reproduce the schema that TMG uses. I couldn’t find the schema on the interwebs so I grabbed the databases from an offline TMG and scripted the tables.

Here’s the script to create the FirewallLog table:

CREATE TABLE [dbo].[FirewallLog](
    [servername] [nvarchar](128) NULL,
    [logTime] [datetime] NULL,
    [protocol] [varchar](32) NULL,
    [SourceIP] [uniqueidentifier] NULL,
    [SourcePort] [int] NULL,
    [DestinationIP] [uniqueidentifier] NULL,
    [DestinationPort] [int] NULL,
    [OriginalClientIP] [uniqueidentifier] NULL,
    [SourceNetwork] [nvarchar](128) NULL,
    [DestinationNetwork] [nvarchar](128) NULL,
    [Action] [smallint] NULL,
    [resultcode] [int] NULL,
    [rule] [nvarchar](128) NULL,
    [ApplicationProtocol] [nvarchar](128) NULL,
    [Bidirectional] [smallint] NULL,
    [bytessent] [bigint] NULL,
    [bytessentDelta] [bigint] NULL,
    [bytesrecvd] [bigint] NULL,
    [bytesrecvdDelta] [bigint] NULL,
    [connectiontime] [int] NULL,
    [connectiontimeDelta] [int] NULL,
    [DestinationName] [varchar](255) NULL,
    [ClientUserName] [varchar](514) NULL,
    [ClientAgent] [varchar](255) NULL,
    [sessionid] [int] NULL,
    [connectionid] [int] NULL,
    [Interface] [varchar](25) NULL,
    [IPHeader] [varchar](255) NULL,
    [Payload] [varchar](255) NULL,
    [GmtLogTime] [datetime] NULL,
    [ipsScanResult] [smallint] NULL,
    [ipsSignature] [nvarchar](128) NULL,
    [NATAddress] [uniqueidentifier] NULL,
    [FwcClientFqdn] [varchar](255) NULL,
    [FwcAppPath] [varchar](260) NULL,
    [FwcAppSHA1Hash] [varchar](41) NULL,
    [FwcAppTrusState] [smallint] NULL,
    [FwcAppInternalName] [varchar](64) NULL,
    [FwcAppProductName] [varchar](64) NULL,
    [FwcAppProductVersion] [varchar](20) NULL,
    [FwcAppFileVersion] [varchar](20) NULL,
    [FwcAppOrgFileName] [varchar](64) NULL,
    [InternalServiceInfo] [int] NULL,
    [ipsApplicationProtocol] [nvarchar](128) NULL,
    [FwcVersion] [varchar](32) NULL
) ON [PRIMARY]

and here's the script to create the WebProxyLog table:

CREATE TABLE [dbo].[WebProxyLog](
    [ClientIP] [uniqueidentifier] NULL,
    [ClientUserName] [nvarchar](514) NULL,
    [ClientAgent] [varchar](128) NULL,
    [ClientAuthenticate] [smallint] NULL,
    [logTime] [datetime] NULL,
    [service] [smallint] NULL,
    [servername] [nvarchar](32) NULL,
    [referredserver] [varchar](255) NULL,
    [DestHost] [varchar](255) NULL,
    [DestHostIP] [uniqueidentifier] NULL,
    [DestHostPort] [int] NULL,
    [processingtime] [int] NULL,
    [bytesrecvd] [bigint] NULL,
    [bytessent] [bigint] NULL,
    [protocol] [varchar](13) NULL,
    [transport] [varchar](8) NULL,
    [operation] [varchar](24) NULL,
    [uri] [varchar](2048) NULL,
    [mimetype] [varchar](32) NULL,
    [objectsource] [smallint] NULL,
    [resultcode] [int] NULL,
    [CacheInfo] [int] NULL,
    [rule] [nvarchar](128) NULL,
    [FilterInfo] [nvarchar](256) NULL,
    [SrcNetwork] [nvarchar](128) NULL,
    [DstNetwork] [nvarchar](128) NULL,
    [ErrorInfo] [int] NULL,
    [Action] [varchar](32) NULL,
    [GmtLogTime] [datetime] NULL,
    [AuthenticationServer] [varchar](255) NULL,
    [ipsScanResult] [smallint] NULL,
    [ipsSignature] [nvarchar](128) NULL,
    [ThreatName] [varchar](255) NULL,
    [MalwareInspectionAction] [smallint] NULL,
    [MalwareInspectionResult] [smallint] NULL,
    [UrlCategory] [int] NULL,
    [MalwareInspectionContentDeliveryMethod] [smallint] NULL,
    [UagArrayId] [varchar](20) NULL,
    [UagVersion] [int] NULL,
    [UagModuleId] [varchar](20) NULL,
    [UagId] [int] NULL,
    [UagSeverity] [varchar](20) NULL,
    [UagType] [varchar](20) NULL,
    [UagEventName] [varchar](60) NULL,
    [UagSessionId] [varchar](50) NULL,
    [UagTrunkName] [varchar](128) NULL,
    [UagServiceName] [varchar](20) NULL,
    [UagErrorCode] [int] NULL,
    [MalwareInspectionDuration] [int] NULL,
    [MalwareInspectionThreatLevel] [smallint] NULL,
    [InternalServiceInfo] [int] NULL,
    [ipsApplicationProtocol] [nvarchar](128) NULL,
    [NATAddress] [uniqueidentifier] NULL,
    [UrlCategorizationReason] [smallint] NULL,
    [SessionType] [smallint] NULL,
    [UrlDestHost] [varchar](255) NULL,
    [SrcPort] [int] NULL,
    [SoftBlockAction] [nvarchar](128) NULL
) ON [PRIMARY]

TMG stores IPv4 and IPv6 addresses in the same field using as a UNIQUEIDENTIFIER. This means we have to parse them if we want to display a dotted quad, or at least we find a function that will do it for us.

Here’s my version:

CREATE FUNCTION [dbo].[ufn_GetIPv4Address]
    (
      @uidIP UNIQUEIDENTIFIER
    )
RETURNS NVARCHAR(20)
AS 
    BEGIN
        DECLARE @binIP VARBINARY(4)  
        DECLARE @h1 VARBINARY(1) 
        DECLARE @h2 VARBINARY(1) 
        DECLARE @h3 VARBINARY(1)
        DECLARE @h4 VARBINARY(1)
        DECLARE @strIP NVARCHAR(20)

        SELECT  @binIP = CONVERT(VARBINARY(4), @uidIP) 
        SELECT  @h1 = SUBSTRING(@binIP, 1, 1) 
        SELECT  @h2 = SUBSTRING(@binIP, 2, 1) 
        SELECT  @h3 = SUBSTRING(@binIP, 3, 1) 
        SELECT  @h4 = SUBSTRING(@binIP, 4, 1) 
        SELECT  @strIP = CONVERT(NVARCHAR(3), CONVERT(INT, @h4)) + '.'
                + CONVERT(NVARCHAR(3), CONVERT(INT, @h3)) + '.'
                + CONVERT(NVARCHAR(3), CONVERT(INT, @h2)) + '.'
                + CONVERT(NVARCHAR(3), CONVERT(INT, @h1))

        RETURN @strIP 
    END

Action values are stored as an integer so I’ve created a table to decode them:

CREATE TABLE [dbo].[Action](
    [Id] [int] NOT NULL,
    [Value] [varchar](50) NOT NULL,
    [String] [varchar](50) NOT NULL,
    [Description] [varchar](255) NOT NULL,
 CONSTRAINT [PK_Action] PRIMARY KEY CLUSTERED ([Id] ASC)
)

INSERT INTO Action
        ( Id, Value, String, Description )
VALUES
(0, 'NotLogged', '-', 'No action was logged.'),
(1, 'Bind', 'Bind', 'The Firewall service associated a local address with a socket.'),
(2, 'Listen', 'Listen', 'The Firewall service placed a socket in a state in which it listens for an incoming connection.'),
(3, 'GHBN', 'Gethostbyname', 'Get host by name request. The Firewall service retrieved host information corresponding to a host name.'),
(4, 'GHBA', 'gethostbyaddr', 'Get host by address request. The Firewall service retrieved host information corresponding to a network address.'),
(5, 'Redirect_Bind', 'Redirect Bind', 'The Firewall service enabled a connection using a local address associated with a socket.'),
(6, 'Establish', 'Initiated Connection', 'The Firewall service established a session.'),
(7, 'Terminate', 'Closed Connection', 'The Firewall service terminated a session.'),
(8, 'Denied', 'Denied Connection', 'The action requested was denied.'),
(9, 'Allowed', 'Allowed Connection', 'The action requested was allowed.'),
(10, 'Failed', 'Failed Connection Attempt', 'The action requested failed.'),
(11, 'Intermediate', 'Connection Status', 'The action was intermediate.'),
(12, 'Successful_Connection', '- Initiated VPN Connection', 'The Firewall service was successful in establishing a connection to a socket.'),
(13, 'Unsuccessful_Connection', 'Failed VPN Connection Attempt', 'The Firewall service was unsuccessful in establishing a connection to a socket.'),
(14, 'Disconnection', 'Closed VPN Connection', 'The Firewall service closed a connection on a socket.'),
(15, 'User_Cleared_Quarantine', 'User Cleared Quarantine', 'The Firewall service cleared a quarantined virtual private network (VPN) client.'),
(16, 'Quarantine_Timeout', 'Quarantine Timeout', 'The Firewall service disqualified a quarantined VPN client after the time-out period elapsed.')

The next step is to import the exported text files into the relevant table in SQL Server. Note that the SQL Server Import and Export Wizard has a default length of 50 characters for all character fields – that won’t be sufficient for much of the data. I used the schema as a reference to choose the correct lengths.

I can now look at the log data in ways that the log filtering can’t and without stressing the TMG servers.

-- Outbound traffic ordered by connections descending
SELECT ApplicationProtocol, A.String AS Action, COUNT(*) 
FROM FirewallLog FL
INNER JOIN Action A ON A.Id = FL.Action
WHERE DestinationNetwork = 'External' AND FL.Action NOT IN (7)
GROUP BY ApplicationProtocol, A.String
ORDER BY COUNT(*) DESC


-- Successful outbound traffic ordered by connections descending
SELECT ApplicationProtocol, A.String AS Action, COUNT(*) 
FROM FirewallLog FL
INNER JOIN Action A ON A.Id = FL.Action
WHERE DestinationNetwork = 'External' AND A.String IN ('Initiated Connection')
GROUP BY ApplicationProtocol, A.String
ORDER BY COUNT(*) DESC


-- Successful outbound traffic showing source and destination IP addresses, ordered by connections descending
SELECT  ApplicationProtocol ,
        dbo.ufn_GetIPv4Address(SourceIP) AS SourceIP ,
        dbo.ufn_GetIPv4Address(DestinationIP) AS DestinationIP ,
        COUNT(*) AS Count
FROM    FirewallLog
WHERE   ApplicationProtocol IN ( 'HTTP', 'HTTPS', 'FTP' )
        AND DestinationNetwork = 'External'
        AND Action NOT IN ( 7, 8, 10, 11 )
GROUP BY ApplicationProtocol ,
        dbo.ufn_GetIPv4Address(SourceIP) ,
        dbo.ufn_GetIPv4Address(DestinationIP)
ORDER BY COUNT(*) DESC

Microsoft Threat Management Gateway web farm publishing issue – “The remote server has been paused or is in the process of being started”

We’ve recently uncovered an issue with the way that I had configured web farm publishing in Microsoft Threat Management Gateway (TMG). When I say “we”, I include Microsoft Support who really got to the bottom of the problem. Of course, they’re in a privileged position as they have access to the source code of the product.

Perhaps I would have resolved it eventually. I’m thankful for MS support though. I didn’t find anything on the web to help me with this problem so, on the off chance it can help someone else, I thought I’d write it up.

The Symptoms

We’ve been switching our web publishing from Windows NLB to TMG web farms for balancing load to our IIS servers and we began seeing an intermittent issue. One minute we were successfully serving pages but the next minute, clients would receive an HTTP 500 error “The remote server has been paused or is in the process of being started” and a Failed Connection Attempt with HTTP Status Code 70 would appear in the TMG logs.

The issue would last for 30 to 60 seconds and then publishing would resume successfully. This would normally indicate that TMG has detected, using connectivity verifiers for the farm, that no servers are available to respond to requests. However, the servers appeared to be fine from the perspective of our monitoring system (behind the firewall) and for clients connecting in a different way (either over a VPN or via a TMG single-server publishing rule).

The (Wrong) Setup

Let’s say we have a pair of web servers, Web1 and Web2, protected from the Internet by TMG.

Each web server has a number of web sites in IIS, each bound to port 80 and a different host header. All of the host headers for a single web server map to the same internal IP address like this:

Host name	IP address
prod.admin.web1	172.16.0.1
prod.cms.web1	172.16.0.1
prod.static.web1	172.16.0.1
prod.admin.web2	172.16.0.2
prod.cms.web1	172.16.0.2
prod.static.web1	172.16.0.2

In reality, you should fully qualify the host name (e.g. prod.admin.web1.examplecorp.local) but I haven’t for this example.

I’ll assume that you know how to publish a web farm using TMG. We have a server farm configured for each web site with each web server configured like this (N.B. this is wrong as we’ll see later):

The benefit of this approach is that because we’ve specified the host header (prod.admin.web1) rather than just the server name (web1), we don’t have to specify the host header in the connectivity verifier:

This setup appears to work but under load, and as additional web sites and farm objects are added, our symptoms start to appear.

The Problem

So what was happening? TMG maintains open connections to the web servers which are part of the reverse-proxied requests from clients on the Internet. Despite the fact that all of host headers in the farm objects resolve to the same IP address, TMG compares them based on the host name and therefore they appear to be different. This means that TMG is opening and closing connections more often than it should.

The Solution

The solution is to specify the servers in the server farm object using the server host name and not the host header name. You have to do this for all farm objects that are using the same servers.

You then have to specify the host header in the connectivity verifier:

You could also use the IP address of the server. This is the configuration that Jason Jones recommends but I prefer the clarity of host name over IP address. I’m trusting that DNS will work as it should and won’t add much overhead. If you need support with TMG, Jason is excellent by the way.

Conclusion

Specifying the servers by host header name seemed logical to me. It was explicit and didn’t require that element of configuration to be hidden away in the connectivity verifier.

I switched from host header to IP address as part of testing but it didn’t fix our problem. It didn’t fix the problem because I only used IP addresses for a single farm object and not all of them.

Although TMG could identify open server connections based on IP address, it doesn’t. It uses host name. This has to be taken into account when configuring farm objects. In summary, if you’re using multiple server farm objects for the same servers, make sure you specify the server name consistently. Use IP address or an identical host name.