Tracking Processors: Client Classification Processor

The client classification processor is designed to detect popular legitimate search engine bots. These bots often crawl websites aggressively, and that activity can trigger security-related incidents. Using this processor to define the conditions that identify such bots allows the system to ignore security incidents from those clients. This removes search-engine-related false positives and prevents errors in indexed and cached results. The popular search engines are included by default, but if additional search engines should be allowed, new rules can be created. Be careful not to define a rule that matches clients other than the targeted search engine bot: the less specific the conditions of a rule, the easier it is for an attacker to spoof the search engine and circumvent detection. It is critical that DNS be enabled on WebApp Secure to achieve effective classification of search engines. If DNS is not enabled while this processor is turned on, some attackers may not be identified.
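Effective hostname-based classification depends on WebApp Secure being able to resolve the client's IP address back to a hostname. The sketch below is not the product's implementation; it is a minimal Python illustration of why a hostname rule such as [.]google[.]com$ can never fire without a successful reverse DNS lookup. The helper name and the sample IP address are assumptions chosen for the example.

import re
import socket

def hostname_matches(client_ip: str, hostname_pattern: str) -> bool:
    # Illustrative only: a hostname rule needs a reverse DNS result to match against.
    try:
        # Reverse DNS: resolve the client IP back to a hostname (PTR lookup).
        hostname, _aliases, _addrs = socket.gethostbyaddr(client_ip)
    except OSError:
        # No PTR record or DNS unavailable: the hostname pattern cannot be satisfied.
        return False
    return re.search(hostname_pattern, hostname) is not None

# A genuine Googlebot address typically resolves to something like
# "crawl-66-249-66-1.googlebot.com", which matches the pattern below.
print(hostname_matches("66.249.66.1", r"[.]google(bot)?[.]com$"))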

If a client is classified as a search engine based on one of the defined rules, that client will not be able to generate incidents. In addition, the client will not be served the fake code that WebApp Secure normally injects into responses.

This is done to ensure that the results cached by the search engine bot do not include fake code that can change in the future and end up flagging clients who follow legitimate search engine links. Classification rules are made up of a series of patterns that are matched against various attributes of the client: IP address, hostname, user agent, country, region, city, and HTTP header names and values.

At least one pattern must be specified for at least one attribute; however, you can specify patterns for as many attributes as the bot allows. For example, if the bot changes its IP address constantly, you should not define a pattern for the IP. However, if the hostname always ends in google.com, a pattern of [.]google[.]com$ could be assigned to the “Hostname” attribute, and if the user agent always contains “googlebot”, then “googlebot” could be assigned as the user agent pattern. Here is an example of a complete rule for the Googlebot search engine spider:

Hostname Pattern: [.]google(bot)?[.]com$
User Agent Pattern: (adsbot.google|googlebot|Google[ ]Web[ ]Preview|Mediapartners-Google)
Country Pattern: US
Region Pattern: (California|Georgia)

Note: It would be extremely difficult for an attacker to spoof values for all of these attributes such that they match the patterns. For example, spoofing the reverse DNS lookup so that it ends in “.google.com” would require serious effort and would only be possible with an insecure DNS configuration on the part of the WebApp Secure administrator. Ideally, every rule should include either an “ip” or a “hostname” pattern.
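To make the matching semantics concrete, the following Python sketch shows how a rule like the Googlebot example above could be evaluated against a client's attributes. This is not WebApp Secure's implementation: the attribute names, the case-insensitive matching, and the assumption that every pattern defined in a rule must match are illustrative only.

import re

# Illustrative rule mirroring the Googlebot example above.
googlebot_rule = {
    "hostname":   r"[.]google(bot)?[.]com$",
    "user_agent": r"(adsbot.google|googlebot|Google[ ]Web[ ]Preview|Mediapartners-Google)",
    "country":    r"US",
    "region":     r"(California|Georgia)",
}

def matches_rule(client: dict, rule: dict) -> bool:
    # Every attribute that has a pattern must be present and must match;
    # attributes without a pattern are simply not checked.
    return all(
        client.get(attr) is not None and re.search(pattern, client[attr], re.IGNORECASE)
        for attr, pattern in rule.items()
    )

client = {
    "hostname":   "crawl-66-249-66-1.googlebot.com",
    "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "country":    "US",
    "region":     "California",
}
print(matches_rule(client, googlebot_rule))  # True: all four patterns match

A client that fails any one of the patterned attributes, for example a spoofed “googlebot” user agent paired with a non-Google hostname, would not be classified as the search engine.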

Table 29: Client Classification Configuration Parameters

Parameter              Type     Default Value  Description
-----------------------------------------------------------------------------------------------------
Basic
Processor Enabled      Boolean  False          Whether traffic should be passed through this processor.

Classification Rules
Client Type            String   (none)         The name of the type of client being identified.
IP Pattern             String   (none)         The IP address pattern to require (if any).
Hostname Pattern       String   (none)         The hostname pattern to require (if any); applies only when DNS is enabled.
User Agent Pattern     String   (none)         The user agent pattern to require (if any).
Country Pattern        String   (none)         The country pattern to require (if any).
City Pattern           String   (none)         The city pattern to require (if any).
Region Pattern         String   (none)         The region pattern to require (if any).
Header Name Pattern    String   (none)         A pattern used to identify a required header name (if any).
Header Value Pattern   String   (none)         A pattern used to verify the value of a header that matches the header name pattern (if any).
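If an additional search engine needs to be allowed, a new classification rule can be created using the parameters above. For example, a rule for Microsoft's Bingbot might look like the following; the hostname suffix and user agent token shown here are illustrative and should be verified against Microsoft's current crawler documentation before use:

Client Type: Bingbot
Hostname Pattern: [.]search[.]msn[.]com$
User Agent Pattern: bingbot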