Tracking Processors: Client Classification Processor
The client classification processor is designed to detect popular, legitimate search engine bots. These bots are notorious for aggressive spidering activity on websites, which can often trigger security-related incidents. Using this processor to define the conditions that identify such bots allows the system to ignore security incidents from those clients. This removes search-engine-related false positives and prevents errors in indexed and cached results. The popular search engines are included by default, but if additional search engines should be allowed, new rules can be created.

Be careful not to define a rule that will match clients other than the targeted search engine bot: the less specific a rule's conditions, the easier it is for an attacker to spoof the search engine and circumvent detection. It is also critical that DNS be enabled on WebApp Secure to achieve effective classification of search engines. Leaving this processor on without enabling DNS can result in some attackers not being identified.
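DNS matters here because reliable classification of the major search engines typically hinges on forward-confirmed reverse DNS rather than on the easily spoofed user agent. The following is a minimal sketch of that idea in Python, not WebApp Secure's internal implementation; the IP address and domain suffixes are illustrative:

```python
import socket

def is_verified_search_bot(ip, allowed_suffixes=(".googlebot.com", ".google.com")):
    """Forward-confirmed reverse DNS: the reverse lookup of the IP must
    resolve to an allowed domain, and that hostname must resolve back to
    the same IP. A spoofed User-Agent alone cannot pass this check."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse (PTR) lookup
    except socket.herror:
        return False                                         # no reverse DNS entry
    if not hostname.endswith(allowed_suffixes):
        return False                                         # not a known bot domain
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward (A) lookup
    except socket.gaierror:
        return False
    return ip in forward_ips                                 # must round-trip to the IP

# Illustrative only: a genuine crawler address passes, a spoofer does not.
print(is_verified_search_bot("66.249.66.1"))
```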
If a client is classified as a search engine based on one of the defined rules, that client will not be able to generate incidents, and additionally:
- Query String Processor will be turned off for that user (no query param injections)
- Hidden Link Processor will be turned off for that user (no hidden link injections)
This is done to ensure that the results cached by the search engine bot do not include injected fake code. Because that code can change in the future, cached copies of it would otherwise end up flagging clients who are simply following legitimate search engine links. Classification rules are made up of a series of patterns run against various attributes of the client:
- IP Address
- Hostname
- User Agent
- Country Code
- City
- Region
- Header Name and Value
At least one pattern must be specified on at least one attribute; however, you can specify patterns for as many attributes as the bot's behavior allows. For example, if the bot changes its IP address constantly, you should not define a pattern for the IP address. If the hostname always ends in google.com, however, a pattern of `[.]google[.]com$` could be assigned to the "Hostname" attribute. If the user agent always contains "googlebot", then "googlebot" could be assigned as the user agent pattern. Here is an example of a complete rule for the Googlebot search engine spider.
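The sketch below, in Python, shows how the patterns of such a rule combine: a client is classified only when every defined pattern matches. The dictionary representation, attribute names, and client values are illustrative assumptions, not the product's actual configuration format:

```python
import re

# Hypothetical rule for the Googlebot spider, using the patterns from the
# example above. The actual rule format in WebApp Secure may differ.
GOOGLEBOT_RULE = {
    "hostname":   r"[.]google[.]com$",   # reverse-DNS hostname must end in .google.com
    "user_agent": r"googlebot",          # user agent must contain "googlebot"
}

def matches_rule(client, rule):
    """A client matches a rule only if every defined pattern matches the
    corresponding client attribute (case-insensitively)."""
    return all(
        re.search(pattern, client.get(attr, ""), re.IGNORECASE)
        for attr, pattern in rule.items()
    )

# Illustrative client attributes as the processor might observe them.
client = {
    "hostname":   "crawl-66-249-66-1.google.com",
    "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}
print(matches_rule(client, GOOGLEBOT_RULE))  # True
```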
Note: It would be extremely difficult for an attacker to spoof values for all of these attributes in a way that matches the patterns. For example, spoofing the reverse DNS lookup to end in ".google.com" would require serious effort, as well as an insecure DNS configuration on the part of the WebApp Secure administrator. Ideally, every rule should include either an "ip" or "hostname" pattern.
Table 29: Client Classification Configuration Parameters
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| Basic | | | |
| Processor Enabled | Boolean | False | Whether traffic should be passed through this processor. |
| Classification Rules | | | |
| Client Type | String | (none) | The name of the type of client being identified. |
| IP Pattern | String | (none) | The IP address pattern to require (if any). |
| Hostname Pattern | String | (none) | The hostname pattern to require (if any); applies only when DNS is enabled. |
| User Agent Pattern | String | (none) | The user agent pattern to require (if any). |
| Country Pattern | String | (none) | The country pattern to require (if any). |
| City Pattern | String | (none) | The city pattern to require (if any). |
| Region Pattern | String | (none) | The region pattern to require (if any). |
| Header Name Pattern | String | (none) | A pattern used to identify a required header name (if any). |
| Header Value Pattern | String | (none) | A pattern used to verify the value of a header whose name matches the header name pattern (if any). |
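The two header parameters act as a pair: per the descriptions above, the value pattern is only tested against headers whose names match the name pattern. A minimal sketch of that pairing, assuming case-insensitive matching (the header name and values shown are illustrative):

```python
import re

def header_rule_matches(headers, name_pattern, value_pattern):
    """True if at least one header whose name matches name_pattern also
    has a value matching value_pattern."""
    return any(
        re.search(name_pattern, name, re.IGNORECASE)
        and re.search(value_pattern, value, re.IGNORECASE)
        for name, value in headers.items()
    )

headers = {
    "From":   "googlebot(at)googlebot.com",  # illustrative crawler header
    "Accept": "text/html",
}
print(header_rule_matches(headers, r"^from$", r"googlebot"))  # True
```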