So here’s something that a.mused me the other day and should serve as a cautionary tale about the rise and limitations of AI. Out of the blue, my partner told me that my blog had become inaccessible from her workplace.
She works in an educational establishment and, not surprisingly, they use a number of products to protect their users from the worst the Internet has to offer. On further investigation, it appeared the site had been flagged as “suspicious” by a product called Lightspeed Web Filter. Very odd. I hadn’t thought my posts about puppies and running would warrant a red flag. It was clearly a misunderstanding, maybe something lost in translation.
The AI had pulled out “relevant” tokens and created a Dave Gorman-esque found poem that, quite frankly, leaves absolutely nothing to the imagination. I have to say my heart skipped a beat on this one, and for a split second I thought the site had been hacked and used for nefarious purposes. It skipped another beat when I saw the site had been classified as “kids and teens” and, bizarrely, “business real estate”. All I could see was KIDS next to those keywords against my website, and I had visions of the police knocking down the door any minute. For a middle-aged man with high blood pressure, this is truly the stuff of nightmares. A few deep breaths later, I started to piece together what had happened.
In defence of the poor AI that had the unenviable and tedious task of trawling my site, I have used every single one of these words at some point, along with a good selection of other words from the Oxford English Dictionary. Context, though, is king, and that unfortunately appears to be a lesson this particular AI has yet to fully learn; I suspect it’s not alone. Just to be clear, this isn’t a pop at Lightspeed: they have a good product that, for the most part, protects their customers as intended.
There has been a war of words going on for decades now, and it started before the World Wide Web, in bulletin boards, mail and Usenet groups. As the barriers to communication were removed, spam filters using crudely crafted rules faced the challenge of determining whether a post or mail was the real thing or a cunningly manufactured fake, usually offering the promise of wealth beyond imagination, or Viagra, or both if you were really lucky.
Over the years the rules became ever more complex and the ways of defeating them ever more ingenious. Businesses offering website filtering services now have to review all the content on a site, which in many cases will be varied and contain domain-specific language. It’s an incredibly difficult problem to solve, especially when those who would circumvent it are always one step ahead.
I forgive the AI its limitations; after all, it was created by imperfect humans. My only suggestion to the developers of these services is to please ensure your AI isn’t adding to the problem by creating its own risqué content, especially on a website that, ironically, isn’t going to get filtered (manually categorised ;-)).
Needless to say, I used the “Submit for Review” button on Lightspeed’s website, and the team promptly reviewed the content and classified my site as “forums.personal”, which is great, as the business real estate inquiries were becoming a nuisance (joke). They have, however, still left the AI’s keyword poetry, which I’d suggest is far more scandalous than any of the content on my site. I’m guessing that dictionary.com is whitelisted for fear of creating a work that would make even the Marquis de Sade blush.
I must admit I did test it on a few other blogging sites that I subscribe to, and probably a third of them had similarly amusing results. If you are an amateur blogger and you want a laugh (and aren’t easily offended), you can check your entry below. Simply replace mused.blog with your domain.
Feel free to post the best ones as comments. Maybe we’ll hand out an award, or publish a book of computer-generated pornographic poetry, though God only knows who would own the copyright.
With so much science fiction providing cautionary tales of how AI will take over the world, I’m both reassured and concerned. Reassured, because it’s clear a T-800 won’t be knocking down my door any time soon unless it’s to regale me with prose. Concerned, because an AI designed to remove smut has circumvented its designers’ narrow goals and, instead of filtering unwanted content, has started generating its own.
A taste of things to come, maybe? Are our creations destined to undermine and outsmart us on every task we set them? Asimov certainly didn’t envisage this eventuality when he wrote his Three Laws of Robotics, and it’s probably a bit too early to start debating the free speech rights of AI.
February 2019 Update
Since I originally wrote this article (back in September 2018), I’ve occasionally checked back to see what impact new posts might have on the AI’s creative output. I wasn’t disappointed: the February 2019 “relevant tokens” output is worthy of Fifty Shades:
I’ll admit it, I had to look up whether “join key” was some kind of sexual innuendo. It does, however, seem innocent enough, so I can only assume the AI is trying to appeal to database developers and admins?
July 2019 Update
It looks like the team over at Lightspeed have spotted the issue and updated the frontend so it no longer shows the “relevant tokens” on the public page. Good job; it’s for the greater good, but I will secretly miss the output of my foul-mouthed friend.