Results tagged “statistics”

Your Twitter Ranking Article Is Wrong

October 22, 2010

Here are some articles that have recently gotten attention amongst media obsessives. They are all fundamentally flawed:

The problem with all of these pieces? The data that underlie the assertions are fundamentally flawed.

Each story uses the advanced research technique of looking at a publication's Twitter account, then reading the sidebar of its Twitter profile and copying the number of followers listed there. This methodology is useless for determining how many people have chosen to follow a publication, and instead is indicative primarily of whether or not that publication is one of the suggested Twitter accounts that users encounter when signing up for the service. It's also correlated with how long that publication has been on Twitter accruing those incidental followers.

Big Follower Counts Are Horseshit

I covered much of this topic at the beginning of this year in a post called Nobody Has A Million Twitter Followers. While the literal point of that headline may no longer be true (I'm sure Justin Bieber or Nicki Minaj has actually earned a million organic Twitter followers), the point still stands: Being suggested as an account to follow when users sign up for Twitter so distorts the meaning of follower counts that citing such follower counts without disclaimers is either ignorant or misleading.

In the case of screaming headlines that say "The New York Times has more Twitter followers than subscribers!" we actually veer from misleading to so distorted it's absurd. Subscribers are people who have, in one way or another, indicated intent. They filled out a form, sent in some money, and established a relationship with The New York Times. The majority of followers of the New York Times on Twitter, however, only established a relationship with Twitter itself, and the Times came along for the ride. MediaWeek actually uses the headline "Elle has a hit with Twitter feed" and this cannot be proven — being on the list myself, I gain users at almost the same rate as Elle UK, and I'm no hit among fashionistas. All we're getting a measure of is Twitter's popularity.

If any of these articles included explanation of the fact that the publications with the biggest number of followers were merely those chosen by Twitter to be so, then we could start to have an honest discussion about impact or influence or popularity or whatever the hell it is these writers want to weigh in on. By analogy, if a publisher went and threw its paper on the doorsteps of millions of people without any conscious action on their part, and then crowed about how it had a bigger subscription base than someone else, we'd consider them ridiculous.

So statements like "Maybe The New York Times has such a huge Twitter following because it was the first of the Top 25 to join Twitter, way back in March 2, 2007." (from the first article linked above, on Journalistics) show a fundamental misunderstanding of the very numbers they're trying to report on. If we're going to make a splash with articles based on numbers, let's at least pretend to know what the numbers represent.

Post-Crime NYC

January 17, 2008

The other day, I was reminded of some of the most striking statistics I'd seen last year, which were from the NYPD crime stats for the 9th Precinct, where I live. (That link is to a PDF with stats for last week.) Each precinct in the city files reports every week, and those reports also include comparisons with statistics for prior years.

But what's amazing are the trends in violent crime shown over the past 20 years. CompStat reports show the numbers from 1990 until 2006, and over that time, rapes are down 70%, from 41 to 12. Burglaries are down 85%, from 1420 to 209. And murders? There weren't any. In my neighborhood, people don't kill each other. In 1990, they did, 23 times. Robberies over the same timeframe are down 81%, and felony assaults are down 69%. And all of this in a neighborhood where, just a year before they started tracking these stats, we had a police-incited riot that divided the entire neighborhood. Today, there's a dog run and a kids' playground just steps from where the riot began.
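For anyone who wants to check the arithmetic, here's a small Python sketch of how those percentages fall out of the raw counts. The function name `percent_decline` is just an illustrative choice, and only the two categories with both counts quoted above are included.

```python
def percent_decline(before: int, after: int) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100

# Counts quoted above for the 9th Precinct, 1990 vs. 2006.
for label, before, after in [("rapes", 41, 12), ("burglaries", 1420, 209)]:
    print(f"{label}: down {percent_decline(before, after):.1f}%")
```

The results come out a little above the round figures quoted in the post (roughly 70.7% and 85.3%), which suggests the quoted numbers were rounded down.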

Now, of course, that's no consolation to the people who've still suffered from the crimes that do go on, and of course it doesn't account for other precincts where crime is worse. But the fundamental character of what it means to live here is so incredibly different from the perception that so many outsiders have of what it means to live in New York City. You will always have some violent crime -- an overwhelming majority of the personal violence that does happen could fall under the description of crimes of passion, people beating up their romantic rivals or things like that. But the day-to-day threat of random street violence is measurably, fundamentally reduced. Along with the massive improvements made to so many parks across all five boroughs, we are truly in a golden age for public space in New York. These numbers represent just one part of that, but it's an important part.

More from the New York Daily News, and detailed city-wide crime reports going back to 1960 are available here. Choire is also blogging about many of the same topics in his guest posts on kottke.org today.

Massaging the Data

July 19, 2006

Speaking of memes from a year ago, last year I created a site called ishavingamassage.com. (That's "Is Having A Massage", not "I Shaving...") The domain is a (gentle) poke at Flickr, which uses the message "Flickr is having a massage." as its error/downtime message when the service is taken offline for repairs or maintenance.

The founders and several of the members of the Flickr team are friends of mine, so it wasn't intended by any means to be a dig at the site. (Except maybe for being so lighthearted and cheerful while so many Flickr addicts are panickedly hitting "refresh".)

At any rate, the massage site had a nice little run for a few months. It acted as a goofy inline link for people to use when making a point in a blog post, or as a little toy for people who like to kill downtime at work by typing in different URLs and seeing what happens. You know, something like yo.momma.ishavingamassage.com.

How it works

For those who've never tried it, the behavior is simple. You type your.site.ishavingamassage.com into your browser, and it displays a custom Massage Message, coloring the text of the your.site part of the address Flickr-style, converting any dots in the URL to spaces, and removing a penultimate "e" character if the last letter of the site name is an "r". Not rocket science, but it amused me for the hour or so it took to build.
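Here's a rough Python sketch of that text transformation. It is not the script that runs the site: the names `SUFFIX`, `massage_name`, and `massage_message` are made up for illustration, lowercasing the hostname is an assumption, and the Flickr-style coloring is left out entirely.

```python
SUFFIX = ".ishavingamassage.com"  # assumed shared domain suffix

def massage_name(host: str) -> str:
    """Turn 'your.site.ishavingamassage.com' into a display name."""
    host = host.lower()  # assumption: hostnames are treated case-insensitively
    # Keep only the part in front of the shared domain.
    name = host[: -len(SUFFIX)] if host.endswith(SUFFIX) else host
    # Dots in the address become spaces.
    name = name.replace(".", " ")
    # Flickr-style spelling: drop a penultimate "e" when the name ends in "r".
    if name.endswith("er"):
        name = name[:-2] + "r"
    return name

def massage_message(host: str) -> str:
    """Build the full message shown on the page."""
    return f"{massage_name(host)} is having a massage."

print(massage_message("yo.momma.ishavingamassage.com"))
print(massage_message("twitter.ishavingamassage.com"))
```

With these assumptions, the first call prints "yo momma is having a massage." and the second prints "twittr is having a massage."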

On a lark, I decided to log the massages after the first hour or so that the site was running. I didn't keep track of timestamps or the IP addresses of people who accessed the site or anything else that might start people fussing about privacy, etc. If I'd planned ahead, I probably would have thought more about that.

Anyway, the amount of analysis and actual understanding of user behavior that I can do is limited. What becomes clear is that some popular sites really encourage people to click on links, and others that seem equally popular are mostly frequented by people who are less active clickers. Note, I also special-cased one or two websites where the site owners took down their websites entirely and redirected all of their traffic to sitename.ishavingamassage.com. Those sites were bounced to google.com and the requests weren't logged, due to volume. Those are removed from the data set.

The Data

Thanks to Ben, I was able to crunch the numbers a bit about what things people were massaging. The Top 10:

  1. kottke: 6843
  2. flickr: 6412
  3. jasmeet: 5187
  4. yanni: 3422
  5. upcoming: 3012
  6. my wallet: 2065
  7. aelki: 1831
  8. mathowie: 1495
  9. brice: 888
  10. arvind: 854
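A tally like this can be rebuilt from a raw log in a few lines of Python. This is only a sketch: it assumes the log holds one massaged name per line, which may not match the real layout of massages.txt, and the `sample_log` entries are toy input, not real data. The function name `top_massages` is hypothetical too.

```python
from collections import Counter

def top_massages(lines, n=10):
    """Count each logged name and return the n most common (name, count) pairs."""
    # Skip blank lines and ignore surrounding whitespace (assumed log format).
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)

# Toy input standing in for the real log file.
sample_log = ["kottke", "flickr", "kottke", "my wallet", "flickr", "kottke"]
for rank, (name, count) in enumerate(top_massages(sample_log, n=3), start=1):
    print(f"{rank}. {name}: {count}")
```

To run it against an actual file, pass an open file handle instead of the list, since iterating over a file yields its lines.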

If you're looking for raw data, I've got the log file here: massages.txt (750k plain text file). We've also got the raw data of counts as an Excel file: massages.xls (839k Excel spreadsheet). All of this data is from approximately one month ago; there's a live data feed, but I'll link to that later.

Massages Data

The bottom line

So, what conclusions can we draw? Jason Kottke is a very popular blogger. And the audience that responds to this kind of web wankery has a fairly high percentage of people who like to try at least some primitive-level hacking. There were a surprising (to me, at least) number of people trying to escape characters or add commands to the script that runs the page, along with a healthy number of people who just wanted to mess up the HTML on the page. There are also a surprising number of people who want to redirect all their traffic to another site for at least a temporary period of time.

Not surprising? Lots of people like to talk about body parts belonging either to themselves, their friends, or various people's mothers.

Some items that might be of interest:

  • Flickr has far fewer massages these days; to understand why, please see Cal Henderson's Building Scalable Web Sites. Cal's on the Flickr team, and reveals a lot of his secrets here.
  • Rafe Colburn says "Log it, don't count it". He's completely correct.
  • The massage site didn't originally have ads on it. I put them on now since I believe you should start paying rent after you graduate.
  • I got through this whole post without mentioning power laws or the Long Tail! Aw, crap.