Now with 68% more manliness! Or something.
It is honestly one of the wonders of the information age, the way that you can follow a link here and a link there and end up in all sorts of strange Interplaces. Today I was reading a thread at Pharyngula about creationist site Answers in Genesis using history-sniffing scripts to see what other sites you’ve visited. (The trick was originally developed by porn sites trying to see if you were also visiting their competitors. You’re safe if you’re on Opera, Chrome, or Safari, and the Firefox loophole was closed a few versions back, so if you’re all upgraded you should be fine.) From there I followed a link someone left in the comments about other uses of the script, which brought me to this old post at Mike On Ads:
One of the things that I always wanted to do but never got around to was to analyze a user’s browsing history to estimate age and gender. Of course the idea is definitely not new, in fact Xerox (of all companies??) has a patent on the whole process and I’m certain plenty of networks already do something of the sort… but what the heck, let’s have some fun!
So what I did is I modified the SocialHistory JS so that it polled the browser to find out which of the Quantcast top 10k sites were visited. I then apply the ratio of male to female users for each site and with some basic math determine a guestimate of your gender.
He built a widget that looks at your browsing history and tries to guess what sex you are based on the userbase of those sites – as in, someone who visits a lot of sites whose readership is heavily male is statistically more likely to be male, and so on. (The page itself goes into more detail on the maths.)
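For anyone who wants to poke at the idea, here is a minimal sketch of how that sort of guess could be put together. It is not the widget's actual code: I am assuming it starts from even odds and simply multiplies in each site's male-to-female ratio as though the sites were independent of one another. The function name and the sample history are made up for illustration; the ratios are ones quoted elsewhere in this post.

```python
from math import prod


def male_likelihood(ratios):
    """Estimate the probability of 'male' from per-site male-to-female ratios.

    Assumption (not the widget's documented method): begin at even odds
    and multiply in every site's ratio as if the sites were independent.
    """
    odds_male = prod(ratios)  # an empty history leaves the odds at 1, i.e. 50/50
    return odds_male / (1 + odds_male)


# Hypothetical history, using ratios quoted in this post.
history = {
    "google.co.uk": 1.35,
    "behindthename.com": 0.77,
    "tumblr.com": 1.04,
    "livejournal.com": 0.72,
}

p_male = male_likelihood(history.values())
print(f"Likelihood of you being MALE is {p_male:.0%}")
print(f"Likelihood of you being FEMALE is {1 - p_male:.0%}")
```

Under this assumption a site with a ratio of exactly 1.0 changes nothing, and a 2.0 site followed by a 0.5 site cancel each other out, which is roughly the behaviour you would want from a tally like this.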
I am less than thrilled with the persistent use of the word ‘gender’ where he means ‘sex’; it is also a shame that all the Quantcast data is presented strictly in binary terms, male/female. (I doubt that the inclusion of genderqueer/non-binary people would alter the data very much – currently, at least, they’re in a tiny minority; but that’s not really the point, you know? The point is to not randomly deny people’s existence.) Working with what there is, though, still turns up some interesting tidbits.
Here are the relevant sites from my browser history, copied-and-pasted from the widget:
(For comparison purposes, Quantcast starts classing site readerships as “heavily male” around the 1.85 mark, which is 65/35, and as “heavily female” at around 0.53, which is 35/65. In the first few hundred sites listed, the heaviest skew is pogo.com, which is 71% female for a score of 0.40.)
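If you want to flip between the two ways of writing these figures, the conversion is just a division one way and a little rearranging the other. A small sketch, with helper names of my own invention:

```python
def split_to_ratio(male_pct, female_pct):
    """Turn a male/female percentage split into a male-to-female ratio."""
    return male_pct / female_pct


def ratio_to_split(ratio):
    """Turn a male-to-female ratio back into (male %, female %)."""
    male_pct = 100 * ratio / (1 + ratio)
    return male_pct, 100 - male_pct


print(round(split_to_ratio(65, 35), 2))  # the "heavily male" threshold split
print(ratio_to_split(0.53))              # the "heavily female" threshold ratio
```

Note that the quoted scores look rounded or truncated, so a round trip will not always land on exactly the figure in the listing.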
And here is what it decided, based on those sites:
Likelihood of you being FEMALE is 32%
Likelihood of you being MALE is 68%
So there you have it: more than twice as likely to be male. (Who knew?)
The thing is, I think a human searching my browsing history would be able to conclude in a matter of minutes that I’m a woman. My full history is fairly mixed: on the one hand there’s a ton of ladybusiness in there – Feministe, Tiger Beatdown, Girls With Slingshots, Pandagon. On the other there’s the Guardian, Doonesbury, xkcd, and Pharyngula, which likely skew heavily male. I suspect that a person presented with that mishmash – that is, a person whose guesses are going to be informed by cultural attitudes, as contrasted with a computer which happily/sadly does not know sexism exists – would interpret that kind of mix as “woman interested in masculine-coded things” rather than “man interested in feminine-coded things”, because the former is a lot more socially acceptable than the latter.
Obviously a lot of my favourite blog destinations are, if not small in absolute terms, definitely low-traffic next to Google.com and its 162 million US users a month, and so don’t show up on the Quantcast 10K list from which the script draws. That said, I’m still interested in the data it did get; I have, after all, been to all these places in my recent perambulations of the web, and I’m intrigued by some of their stats.
Firstly, Google. Being a Brit, it’s Google.co.uk that’s my default destination; the only reason Google.com is showing on my history at all is because I wanted to see if they’d put the St Andrew’s Day doodle up on any versions other than the UK one (they hadn’t). But look at those stats. Google US has a male-to-female ratio of 0.98 – not much to talk about; but Google UK’s ratio is 1.35, a distinct male skew. I am entirely baffled by this. Do British women just not use the Google? I AM CONFUSED.
Interesting that Facebook and WordPress both skew very slightly female. File it under the continuing categorisation of both socialisation and personal journaling habits as ladyish things to do. Would be interesting to see how others compare. (A start: according to Quantcast, Tumblr is 51/49 men to women – 1.04; Livejournal is estimated at 42/58 – 0.72.)
Can’t say I’m surprised that the Guardian skews male in its readership given how much of a Dude Paper they can occasionally be. Better about it than pretty much all the others in the country, I hasten to add, but still; the old boys’ network in newspaper publishing is going very strong.
Not especially surprised by Ebay, either.
Behindthename.com is a baby names site; I use it most often for character names, so with NaNoWriMo just gone past I’ve been consulting it fairly frequently. Given that All Things Baby are still constructed as the province of women, a 0.77 man/woman ratio seems about right.
And now the big hitters: NASA and Kongregate, look at you with your 1.38 and your 1.41. I was on NASA looking up this business about arsenic-using bacteria (not as exciting as I’d hoped, but still kind of cool) and Kongregate is my go-to site when I need a mindless Flash game to help cool down my brain.
For comparison, J was told he was 9% likely to be female and 91% likely to be male. A strange thing from his list was the differing profiles of two very similar Flash game sites: Kongregate, as previously noted, scores 1.38, but Armor Games (which hosts many of the same titles) only 0.89. The standout figure on his list was the Escapist website – an online gaming magazine – which has a score of 2.08, or approximately 67/33; twice as many male readers as female. He informs me, having done some maths, that omitting the Escapist from the list more than doubles the likelihood of his being female, according to the algorithm – a jump from 9% to 20%. (Clearly it’s just uber-manly.)
I would be very interested to know how other people score, and how the percentages the algorithm throws up correspond – or fail to correspond – to your actual sex. (I would expect a fair amount of hilarious inaccuracy, especially as it doesn’t weight links according to how many times you visit them.) Also any interesting/surprising figures it produces with regard to the sex ratio on particular sites. Anyone? Here’s the link again if you do want to join in.