ShadowTrackr


Website scanning in-depth

19 April 2020
Scanning a website seems easy, and it is if you just do a one-off scan of a single URL.

Things get more interesting when you host your website on multiple servers (for better performance or reliability). You probably also have both IPv4 and IPv6 addresses available. Your website runs on HTTPS, and you want your visitors to be able to find you without typing in the protocol, so you also have HTTP configured. That’s two protocols, over two IP versions, and possibly on multiple hosts.
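To make the combinations concrete, here is a minimal sketch of how the list of scan targets for one hostname could be enumerated. It is not ShadowTrackr's actual code: the function names, the dictionary layout, and the choice of the default ports 80 and 443 are assumptions made for illustration.

```python
import ipaddress
import socket
from itertools import product

# Assumption: only the default ports for each protocol are scanned.
SCHEMES = {"http": 80, "https": 443}


def resolve_addresses(hostname):
    """Return the de-duplicated IPv4 and IPv6 addresses DNS gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def build_targets(hostname, addresses):
    """Cross every address with every protocol to get one target per combination."""
    targets = []
    for address, (scheme, port) in product(addresses, SCHEMES.items()):
        targets.append({
            "host": hostname,
            "scheme": scheme,
            "port": port,
            "ip": address,
            "ip_version": ipaddress.ip_address(address).version,
        })
    return targets


if __name__ == "__main__":
    # Documentation-range addresses, used here instead of a live DNS lookup.
    for target in build_targets("example.com", ["192.0.2.10", "2001:db8::10"]):
        print(target)
```

A single dual-stack host already yields four targets, and every extra server behind the same name adds more. Calling `resolve_addresses` from a single location only shows the addresses visible from that location, which is one reason a one-off scan misses instances.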

Some websites run in the cloud. You can pin your site to a specific cloud in a specific country (which most governments do with their websites), but you can also let the cloud provider figure out what the best spot is. If you do this with Azure, your website will be served from the nearest Azure cloud. ShadowTrackr has nodes all over the world, which means we’ll detect your website in multiple clouds. That’s on purpose of course, but it does complicate things.

Then there are CDNs like Cloudflare and Akamai. You host your website on a server where the CDN can reach you, and the CDN handles all your visitor requests. You’ll need a trick to point your visitors at the CDN of course, and this is where it gets ugly for scanners. There are multiple ways of doing this, and they can be mixed and matched. On top of this, some CDNs hire subcontractors that are really hard to attribute, and you might end up detecting Vodafone instead of Akamai.
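One common way of pointing visitors at a CDN, not spelled out in this post, is a DNS CNAME from your hostname to a name the CDN owns. The sketch below guesses the CDN from such a CNAME chain. The suffix table is an illustrative and incomplete assumption, the function name is made up, and this is not how ShadowTrackr attributes hosts. It also says nothing about setups that do not use a CNAME, or about the subcontractor case above.

```python
# Assumption: a small, incomplete table of hostname suffixes commonly seen
# in CNAME chains for these CDNs. A real scanner would need far more data.
CDN_SUFFIXES = {
    "Akamai": ("edgekey.net", "edgesuite.net", "akamaiedge.net"),
    "Cloudflare": ("cdn.cloudflare.net",),
}


def guess_cdn(cname_chain):
    """Return the first CDN whose suffix matches a name in the chain, else None."""
    for name in cname_chain:
        # DNS names are case-insensitive and may carry a trailing dot.
        normalized = name.rstrip(".").lower()
        for cdn, suffixes in CDN_SUFFIXES.items():
            for suffix in suffixes:
                if normalized == suffix or normalized.endswith("." + suffix):
                    return cdn
    return None


if __name__ == "__main__":
    chain = ["www.example.com.edgekey.net.", "e1234.a.akamaiedge.net."]
    print(guess_cdn(chain))
    print(guess_cdn(["origin.example.com."]))
```

A `None` result only means the chain gave no hint. The site can still sit behind a CDN that is reached some other way, which is exactly why a scanner cannot rely on one detection method.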

It seemed so easy to scan a website, but in practice it can get really complex. The goal has always been for ShadowTrackr to detect all your website instances on all internet-reachable hosts, including clouds and CDNs. I underestimated how complex this is and did not achieve that goal from the start. After getting it wrong a couple of times, this week’s update features a much-improved algorithm. This might result in a storm of new websites showing up on your timeline. I’m on it, and I will keep cleaning things up until they are all properly ingested and monitored.

If you do find irregularities, or have any other questions, drop me a line.