How to monitor keywords
27 September 2017
Keyword monitoring needed improvement. Recently I was confronted with a possible data leak and wanted to monitor a few specific names and phrases in the news.
The way it worked before was that the bots checked for the word with a space before or after it in the text on news sites and copy-paste sites:
if " "+keyword in text or keyword+" " in text:
While it did a reasonable job of not matching things like "blakeywordbla", it produced far too many false positives. I needed more fine-tuning and ended up implementing literal matches, multiple keyword matches and negative keywords.
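To make the problem concrete, here is a minimal sketch of the old check next to a stricter whole-word check. The function names and the sample sentences are made up for illustration, and the regex-based variant is only one possible fix, not necessarily what ShadowTrackr uses.

```python
import re

def old_match(keyword, text):
    # The original check from this post: the keyword with a space
    # directly before it or directly after it.
    return " " + keyword in text or keyword + " " in text

def whole_word_match(keyword, text):
    # Hypothetical stricter check: no letter, digit or underscore
    # may touch the keyword on either side.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text) is not None

print(old_match("leak", "blaleakbla"))                   # False
print(old_match("leak", "fixing a leaking tap"))         # True
print(whole_word_match("leak", "fixing a leaking tap"))  # False
print(whole_word_match("leak", "a possible leak."))      # True
```

The second call shows one kind of false positive: " leak" is found inside " leaking", because only one side of the keyword is checked.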
Literal matches: "the keyword"
It works just like people are used to in Google searches. The exact words need to be present in the order specified. It triggers on
"This is the keyword you are looking for", but not on
"this keyword is not the one you are looking for".
Multiple matches: the keyword
Both "the" and "keyword" need to be present in the text in order to trigger, but the order is not important. They can easily be sentences apart, like in:
"This keyword is what you want. The other words are ignored."
Negative matches: keyword -the
Again like in Google searches, any text with the word "keyword" in which "the" does not appear at all is a match. For example:
"The keyword is not enough" doesn't match, and
"Keyword matches are very useful for finding leaked data" does.
Of course you can mix all of the above. Do note that all keyword matches are case insensitive, and that news articles from the newsfeeds we monitor trigger on keyword matches in both the headline and the text. Data dump sites often don't have a title, so matches are on keywords in the dump itself. Here are some keyword combinations to get your creativity started:
shadowtrackr leak -water
@shadowtrackr.com
shadowtrackr password
"Tracking your online footprint"
You can add them under Assets in the side menu. Happy keyword hunting! And don't forget to set push notifications to get a heads-up on those really bad days.
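For the curious, the three rules from this post (literal, multiple and negative matches) can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual ShadowTrackr code: the names `parse_query`, `term_in_text` and `matches` are invented here, and the whole-word rule is a guess at sensible behaviour.

```python
import re

def parse_query(query):
    """Split a query into (required, negative) lists of terms."""
    required, negative = [], []
    # A token is either a quoted phrase (optionally prefixed by "-")
    # or a run of non-space characters.
    for token in re.findall(r'-?"[^"]+"|\S+', query):
        is_negative = token.startswith("-")
        term = token.lstrip("-").strip('"')
        if term:
            (negative if is_negative else required).append(term)
    return required, negative

def term_in_text(term, text):
    """Case-insensitive match; assumed whole-word rule at word-like edges."""
    pattern = re.escape(term)
    if re.match(r"\w", term[0]):
        pattern = r"(?<!\w)" + pattern
    if re.match(r"\w", term[-1]):
        pattern = pattern + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None

def matches(query, text):
    """True if every required term is present and no negative term is."""
    required, negative = parse_query(query)
    return (all(term_in_text(t, text) for t in required)
            and not any(term_in_text(t, text) for t in negative))

print(matches('"the keyword"', "This is the keyword you are looking for"))  # True
print(matches("keyword -the", "The keyword is not enough"))                 # False
```

For a news article you would run `matches` on the headline and the text together, for a dump on the dump itself.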
Open for business
08 September 2017
I was happily developing new functionality when things started to get out of hand. More registrations and data came in than I or the system could handle. If this were to continue I would run out of resources pretty fast, so I had to stop accepting free beta users.
To accept new users and add new functionality I need more resources, and for that I need funding. I decided to bootstrap it and open paid subscriptions. This should ensure that the funding grows in line with the need for resources, on the one condition that I set the right prices. And there's the rub. I know what I need right now with the current users, but I can't possibly know how many users will keep coming, whether costs per user will stay the same, or what I'll need for all the extra functionality I have in mind.
So I took a guess and created three subscription plans. I'll try to keep the prices as low as I can and of course I'll listen to the wishes of paying customers first. The current beta users can all keep their free beta account, as a thank you for being a beta user.
Signing up for a paid subscription can be done here.
Why deleting stuff is hard
21 August 2017
To demonstrate some functionality I added a random news site to an account with about 50 urls. Within minutes, ShadowTrackr had found hosts, related urls, hosts for the related urls, certificates, and more. It was all fine until I noticed the lack of a delete button. I naively implemented a delete button for the urls and hosts under settings and clicked it. The random news site was gone. And within a few minutes it appeared again. Since the related host and some subdomains were still in the system, the pay-level domain was easily found again and automatically added.
Against my better judgement, I manually deleted the hosts and subdomains and quickly deleted the url. Again, within minutes everything reappeared. It was even worse than the situation I started with: you can only delete the urls and hosts you add manually, and the original url now appeared as a related asset found by the system (without a delete button). Adding delete buttons for related assets is useless, since they are related and will always be rediscovered. It turned out that deleting an asset was much harder than I thought, so I put the issue on my todo list and started working on other stuff. I just couldn't figure out what the proper delete implementation should be.
Of course, users will notice a problem like this and start complaining (as they should). I had to implement a way to delete assets, but I couldn't decide how it should work. Should I blindly delete all related assets? Including ones that might be shared with other urls or hosts? Should related messages be deleted from the timeline too? That would mean that you might miss historical data on an attack targeted at you just because your server changed its IP address.
Since I can't come up with an implementation that works for all users in all circumstances, there is now a delete button with two checkboxes. One is for deleting related messages from the timeline, and the other is for aggressively deleting related assets. It might be a bit too aggressive and delete the shared server that also hosts your other websites, but I figured (and tested) that these will be found again from related assets. I expect the solution with the optional checkboxes will work for everyone, but please let me know if you have problems.
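As a rough illustration of the two checkboxes, here is a small sketch. The `Account` structure, the `delete_asset` function and the sample data are all hypothetical, and I am assuming that "related messages" means the timeline messages of every asset that gets removed; the real data model is more involved.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    assets: set = field(default_factory=set)      # urls and hosts
    related: dict = field(default_factory=dict)   # asset -> set of related assets
    timeline: list = field(default_factory=list)  # (asset, message) tuples

def delete_asset(account, asset, delete_messages=False, delete_related=False):
    """Delete an asset; the two flags mirror the two checkboxes."""
    removed = {asset}
    if delete_related:
        # Aggressive mode: also drop everything discovered via this asset,
        # even if it is shared with other urls or hosts.
        removed |= account.related.get(asset, set())
    account.assets -= removed
    for name in removed:
        account.related.pop(name, None)
    if delete_messages:
        account.timeline = [(a, msg) for a, msg in account.timeline
                            if a not in removed]
    return removed

account = Account(
    assets={"example.com", "www.example.com", "203.0.113.10"},
    related={"example.com": {"www.example.com", "203.0.113.10"}},
    timeline=[("example.com", "certificate changed"),
              ("203.0.113.10", "port opened")],
)
delete_asset(account, "example.com", delete_related=True)
print(account.assets)         # set()
print(len(account.timeline))  # 2, the history is kept
```

Leaving both flags off removes only the asset itself, which is the case where related assets lead to it being rediscovered.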