Specification
Wappalyzer uses a long list of regular expressions to evaluate web pages and detect web applications. The list is located at src/apps.json.

```json
"Application Name": {
  "website": "example.com",
  "cats": [ 1 ],
  "headers": { "X-Powered-By": "Application Name" },
  "url": ".+\\.application-name\\.com",
  "html": "<link[^>]application-name\\.css",
  "meta": { "generator": [ "Application Name", "Alternative Application Name" ] },
  "script": "application-name-([0-9.]+)\\.js\\;confidence:50\\;version:\\1",
  "env": "ApplicationName",
  "implies": "PHP\\;confidence:50",
  "excludes": "Other Application Name"
}
```
Refer to the JSON schema.
field | type | description | Example |
---|---|---|---|
website | string | URL of the application's website, with the protocol left off. | "example.com" |
cats | array | List of category IDs. See apps.json for the complete list. | [ 1, 6 ] |
env | array / string | Global JavaScript variables, e.g. jQuery. Note that this will only detect top-level variables; e.g. the following will not work: ^jQuery\\.fooBar$. | "^jQuery$" |
headers | object | HTTP response headers, e.g. X-Powered-By. | { "X-Powered-By": "^Hello(?:World|Universe)" } |
html | array / string | Full HTML response body. | "<a [^>]*href=\"[^\"]+/foo/bar" |
implies | array / string | The presence of one application can imply the presence of another, e.g. Drupal means PHP is also in use. | [ "PHP", "jQuery" ] |
excludes | array / string | Opposite of implies. The presence of one application can exclude the presence of another. | "Apache" |
url | array / string | URL of the page, e.g. http://wordpress.com/index.php. | "/cart/checkout\\?(?:.*&)?shopname_sess=" |
meta | object | HTML meta tags, e.g. generator. | { "generator": "^Hello(?:World|Universe)" } |
script | array / string | src attribute of HTML script tags, e.g. jquery.js. | "my_js_lib\\.js" |
Except for cats and website, all fields are optional and accept one or more patterns (either a string or an array of regular expressions).
Patterns are case-insensitive regular expressions. No surrounding delimiters or flags are used. Note that backslashes (the escape character) must themselves be escaped for the JSON to be valid. Slashes (/) do not need to be escaped.
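
As a rough illustration of the escaping rule above, the sketch below decodes a small JSON fragment and compiles the resulting patterns. The entry name "Example App" and the sample strings are made up for this example; the two patterns are taken from the table above. Python's `re` module stands in for whatever regex engine is actually used, so treat it as an approximation.

```python
import json
import re

# A fragment in the style of apps.json; "Example App" is a hypothetical entry.
raw = r'''
{
  "Example App": {
    "script": "my_js_lib\\.js",
    "url": "/cart/checkout\\?(?:.*&)?shopname_sess="
  }
}
'''

app = json.loads(raw)["Example App"]

# JSON decoding turns each doubled backslash into a single one.
script_pattern = app["script"]
print(script_pattern)  # my_js_lib\.js

# Patterns carry no delimiters or flags, so case insensitivity is applied here.
script_re = re.compile(script_pattern, re.IGNORECASE)
url_re = re.compile(app["url"], re.IGNORECASE)

print(bool(script_re.search("/static/MY_JS_LIB.js")))   # True
print(bool(script_re.search("/static/my_js_libXjs")))   # False, the dot is literal
print(bool(url_re.search("http://shop.example/cart/checkout?a=1&shopname_sess=abc")))  # True
```

The slashes in the URL pattern are left unescaped, which is valid both in JSON and in the regular expression.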
Optional fields may be appended to a pattern, separated by \\;:
field | description |
---|---|
confidence | Indicates less reliable patterns that may cause false positives. The aim is to achieve a combined confidence of 100%. Defaults to 100% for unspecified fields. |
version | Gets the version number from a pattern match using a special syntax. |
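
To make the appended-field syntax concrete, here is a minimal sketch of how a decoded pattern string could be split into its regular expression and its optional fields. `ParsedPattern` and `parse_pattern` are hypothetical names introduced for this example, not part of Wappalyzer; the input is the script pattern from the example entry as it looks after JSON decoding (single backslashes).

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedPattern:
    """Hypothetical container for one pattern and its optional fields."""
    regex: re.Pattern
    confidence: int = 100  # unspecified confidence defaults to 100%
    version: str = ""      # version template, e.g. \1


def parse_pattern(source: str) -> ParsedPattern:
    # The separator is the two-character sequence backslash + semicolon.
    head, *fields = source.split("\\;")
    parsed = ParsedPattern(regex=re.compile(head, re.IGNORECASE))
    for field in fields:
        key, _, value = field.partition(":")
        if key == "confidence":
            parsed.confidence = int(value)
        elif key == "version":
            parsed.version = value
    return parsed


p = parse_pattern(r"application-name-([0-9.]+)\.js\;confidence:50\;version:\1")
print(p.confidence)  # 50
print(p.version)     # \1
print(p.regex.search("/js/Application-Name-1.2.3.js").group(1))  # 1.2.3
```

Unknown field names are silently ignored in this sketch; a stricter parser might reject them instead.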
The confidence field may also be applied to the implies field. The implied confidence is multiplied by the confidence of the pattern that identified the original application.
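
A small worked example of that multiplication, assuming percentages are treated as fractions of 100 (the text does not spell out the arithmetic, so this is one plausible reading). The function name `implied_confidence` is hypothetical.

```python
def implied_confidence(pattern_confidence: int, implies_confidence: int = 100) -> float:
    """Confidence, in percent, passed on to an implied application."""
    return pattern_confidence * implies_confidence / 100


# An application detected by a 50% pattern that implies "PHP\\;confidence:50":
print(implied_confidence(50, 50))  # 25.0

# The same pattern with a plain "PHP" implication (confidence defaults to 100%):
print(implied_confidence(50))      # 50.0
```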
example | description |
---|---|
\\1 | Returns the first match |
\\1?a: | Returns a if the first match contains a value, nothing otherwise |
\\1?a:b | Returns a if the first match contains a value, b otherwise |
\\1?:b | Returns nothing if the first match contains a value, b otherwise |
foo\\1 | Returns foo with the first match appended |
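
The version syntax in the table can be sketched as a small template resolver. This is an illustrative reading of the table, not Wappalyzer's own code: `resolve_version` is a hypothetical helper, and the templates are written as they appear after JSON decoding (a single backslash).

```python
import re


def resolve_version(template: str, match: re.Match) -> str:
    """Fill a version template such as \\1 or \\1?a:b from a regex match."""
    version = template
    for index in range(1, match.re.groups + 1):
        value = match.group(index) or ""
        # Ternary form: \N?a:b picks a if group N matched something, b otherwise.
        ternary = re.search(r"\\%d\?([^:]*):(.*)$" % index, version)
        if ternary:
            chosen = ternary.group(1) if value else ternary.group(2)
            version = version[:ternary.start()] + chosen + version[ternary.end():]
        # Plain back reference: \N is replaced by the captured text.
        version = version.replace("\\%d" % index, value)
    return version


pattern = re.compile(r"foo-([0-9.]+)?", re.IGNORECASE)
with_value = pattern.search("foo-2.1")
without_value = pattern.search("foo-")

print(resolve_version(r"\1", with_value))         # 2.1
print(resolve_version(r"\1?a:b", with_value))     # a
print(resolve_version(r"\1?a:b", without_value))  # b
print(resolve_version(r"\1?:b", without_value))   # b
print(resolve_version(r"foo\1", with_value))      # foo2.1
```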
When adding patterns, make sure you think of the following:
#### Unnecessary use of capturing groups
Regular expressions can have capturing groups, e.g. (bar).
##### Problem
The text matched by a capturing group is stored for later use, which makes such groups slower than non-capturing groups.
##### Solution
Use non-capturing groups (e.g. (?:bar)) or no group at all (e.g. bar) where appropriate.
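
The difference can be seen in a short sketch using Python's `re` module, with the header pattern from the field table as a starting point. Both variants match the same text; only the capturing variant stores the group. The sample header value is made up.

```python
import re

capturing = re.compile(r"hello(world|universe)", re.IGNORECASE)
non_capturing = re.compile(r"hello(?:world|universe)", re.IGNORECASE)

text = "HelloUniverse"

# Both patterns match exactly the same text...
print(capturing.search(text).group(0))      # HelloUniverse
print(non_capturing.search(text).group(0))  # HelloUniverse

# ...but only the capturing group keeps a copy of what it matched.
print(capturing.search(text).groups())      # ('Universe',)
print(non_capturing.search(text).groups())  # ()
```

A capturing group is still the right choice when the matched text is needed afterwards, for example to extract a version number.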
Patterns should only match web pages that are actually using the app.
##### Problem
- Any env variable name of three characters or fewer may occur in JavaScript generated by the Google Closure Compiler.
- Overly generic HTML patterns may match plain text.
- Etc.
##### Solution
- Make sure the things you are trying to match are sufficiently long and not too generic.
- Make sure your patterns won't match plain HTML.
- If your pattern is somewhat generic but there is a good reason to include it, you can give it a lower confidence percentage, e.g. \\;confidence:20. Confidence for multiple patterns is added together, so several lower-confidence patterns can together give a high confidence.
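
As a rough sketch of how lower-confidence patterns add up, the helper below sums the confidence of every pattern that matched. Capping the total at 100% is an assumption based on the stated aim of a combined confidence of 100%; `combined_confidence` is a hypothetical name.

```python
def combined_confidence(matched_confidences: list[int]) -> int:
    """Sum the confidence of all matched patterns, capped at 100 (assumed cap)."""
    return min(sum(matched_confidences), 100)


# One generic pattern on its own gives only weak evidence.
print(combined_confidence([20]))          # 20

# Several weak patterns matching together give a stronger result.
print(combined_confidence([20, 30, 50]))  # 100

# A single fully reliable pattern is already enough.
print(combined_confidence([100, 20]))     # 100
```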