The making of the NYT’s Netflix graphic
January 20th, 2010
One of The Times’ recent graphics, “A Peek Into Netflix Queues,” ended up being one of our more popular graphics of the past few months. (A good roundup of what people wrote is here.) Since then, there have been a few questions about how the graphic was made, and Tyson Evans, a friend and colleague, thought it might interest SND members. (I bother Tyson with questions about CSS and Ruby pretty regularly, so I owe him a few favors.)
Most readers are probably interested in the interactive graphic, although I will say that we also ran a lovely full-page graphic in print in the Metropolitan section, which goes out to readers in the New York region. That graphic had a lot of interesting statistical analysis – in fact, it would have been nice to get some analysis in the web version, more on that later – but for this I will focus mostly on the web version. If there are questions about the print graphic, I will make sure I get Amanda Cox to try to explain cluster analysis to me again.
First is the data itself. Jo Craven McGinty, a CAR reporter, was in contact with Netflix to obtain a database of the top 50 movies in each ZIP code for every ZIP in the country. That’s about 1.9 million records. The database did not include the number of people renting the movie – just the rank. (We would have loved to have it, but Netflix said no. Understandably, it would have given competitors a perfect map of their market penetration.) The raw data looked like this:
We decided to focus on cities, rather than the nation as a whole, for a few reasons.
First: Most of the interesting trends occurred on a local scale – stark differences between the South Bronx and Lower Manhattan, for example, or the east and west sides of D.C. – and weren’t particularly telling at a national scale. (We actually generated U.S. maps in PDF form that showed all 35,000 or so ZIPs, but when we flipped through them, with a few exceptions, we found the nationwide patterns weren’t nearly as interesting as the close-in views.)
Second: Matthew Bloch’s mapping framework is highly optimized, but it’s not necessarily equipped to handle changing 35,000+ polygons between 100 different movies as fast as would be necessary – no one likes to use a scrubber that’s slow to react.
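One solution to the too-many-polygons problem is scaling the data up to wider geographies, such as areas based on the first three digits of ZIP codes. But in this case, we couldn’t do that because we didn’t have the total number of renters in each ZIP – we only had the rank, and ranks from separate ZIPs can’t be combined into a single ranking for a larger area without knowing how many rentals sit behind each one.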
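To make the rank-only limitation concrete, here is a minimal Ruby sketch. Everything in it is made up for illustration: the ZIP codes, the titles, the rental counts and the helper names `aggregate_by_prefix` and `rank_movies`. It shows the roll-up to three-digit prefixes that counts would have allowed. Without counts, the two ZIPs below simply disagree (each ranks a different movie first) and there is no principled way to break the tie.

```ruby
# Hypothetical rental counts for two ZIPs that share the prefix "100".
# The data Netflix provided had no counts, only ranks.
COUNTS = {
  "10001" => { "Milk" => 900, "Twilight" => 100 },
  "10002" => { "Twilight" => 60, "Milk" => 40 }
}.freeze

# Group ZIP codes by their first three digits and sum the counts per movie.
def aggregate_by_prefix(counts_by_zip)
  totals = Hash.new { |hash, prefix| hash[prefix] = Hash.new(0) }
  counts_by_zip.each do |zip, movies|
    movies.each { |title, count| totals[zip[0, 3]][title] += count }
  end
  totals
end

# Rank the movies within each area, highest total first.
def rank_movies(totals)
  totals.transform_values do |movies|
    movies.sort_by { |_title, count| -count }.map(&:first)
  end
end

ranked = rank_movies(aggregate_by_prefix(COUNTS))
puts ranked["100"].inspect
```

With the made-up counts, the larger ZIP dominates the combined ranking. With ranks alone, both ZIPs would carry equal weight, which is the step that could not be justified.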
So, we decided on a dozen cities, determined mostly by population but also geographic distribution, which is why Minneapolis, Seattle, Denver and San Francisco are on the map, but not Houston or Philadelphia. (This apparent injustice was not lost on commenters from those cities.) We made individual GIS shapefiles of each city, then merged them into a single shapefile using ArcView’s ‘merge’ tool.
This reduced the number of shapes to about 5,000, well within Matthew Bloch’s “still super fast” threshold.
Still, the hardest part about this graphic was designing the interface. We wanted readers to be able to find a given movie quickly, but a search box didn’t really work visually. We also wanted to give readers an idea of which movies were most popular and which were most critically acclaimed.
I mocked up at least ten versions. None were any good. The challenge was navigation. As a user, I wanted to be able to see one movie in a bunch of different cities, fast, or I wanted to see a bunch of movies in one city just as fast. So there are two major navigation elements – cities and movies – but the map itself still needed to be the visual focal point of the graphic.
In the end, graphics director Steve Duenes and deputy Matt Ericson came up with a sketch based on elements of my previous mockups:
which I turned into a more refined Illustrator mockup:
We tweaked this until it resembled what’s now online. It’s a complicated interface, but I don’t know if it could be any simpler.
Once we settled on a design, there was still a lot of work to do. We needed to get all the movie thumbnail images, the Metacritic ‘Metascores’, the links to The Times’ reviews and the first few sentences of the reviews themselves. We did this mostly by writing scripts. Both Metacritic and Netflix have great search-engine optimization, so just Googling a film title with the word ‘Netflix’ or ‘Metacritic’ generally gives you what you want in the first search entry:
We wrote a Ruby script that parsed the Google search results page for each movie, which typically contained the Metacritic score and Netflix ID. We used hpricot, an HTML-parsing Ruby library, to pull out the Netflix ID and Metascore of any film. With the Netflix ID, we know the link to the thumbnail image is
"cdn-0.nflximg.com/us/boxshots/large/" + netflix_movie_id + ".jpg"
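As a rough sketch of that last step, the helpers below build the thumbnail address from an ID and pull an ID out of a movie link. The base path is the one quoted above. The link format, the sample ID and the helper names `thumbnail_url` and `netflix_id_from_link` are assumptions for illustration, and a plain regular expression stands in for the hpricot parsing we actually used.

```ruby
# Base path as quoted in the article (no scheme; a real request would need one).
BOXSHOT_BASE = "cdn-0.nflximg.com/us/boxshots/large/"

# Build the thumbnail address for a Netflix movie ID.
def thumbnail_url(netflix_movie_id)
  "#{BOXSHOT_BASE}#{netflix_movie_id}.jpg"
end

# Pull a numeric ID from the last path segment of a movie link.
# The link layout is hypothetical. Returns nil when no ID is present.
def netflix_id_from_link(link)
  match = link.match(%r{/(\d+)(?:[/?#].*)?\z})
  match && match[1]
end

# Made-up link and ID, for illustration only.
id = netflix_id_from_link("http://www.netflix.com/Movie/Example/12345678")
puts thumbnail_url(id)
```

Returning nil for links without an ID makes it easy to collect the movies that still need to be looked up by hand.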
We used a similar technique to fetch links and content of The Times’ review, and then filled in any missing movies by looking the information up by hand.
As for the making of the map itself, the concept is very simple. For any movie, each ZIP code is assigned a color based on its rank. If it’s not in the top 50, it’s not shaded. That’s about it. To optimize the map, Matthew Bloch did a bit of database work, giving each movie title a numerical ID instead of using its full title, since it’s faster to parse through numbers than text.
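The shading rule and the title-to-ID lookup can be sketched in a few lines of Ruby. This is not the code behind the published map: the five-color palette, the equal-width rank buckets and the helper names `color_for_rank` and `build_title_ids` are all assumptions made for the example.

```ruby
# Hypothetical five-step color ramp, darkest for the highest ranks.
PALETTE = %w[#7f0000 #b30000 #d7301f #fc8d59 #fdcc8a].freeze
TOP_N = 50

# Return a fill color for a rank, or nil (unshaded) outside the top 50.
def color_for_rank(rank)
  return nil if rank.nil? || rank < 1 || rank > TOP_N

  bucket = (rank - 1) * PALETTE.size / TOP_N # integer division: 0..4
  PALETTE[bucket]
end

# Give each distinct title a small integer ID, in order of first appearance.
def build_title_ids(titles)
  titles.uniq.each_with_index.to_h
end

ids = build_title_ids(["Milk", "Twilight", "Milk"])

# Ranks keyed by [ZIP, movie ID]; a missing key means "not in the top 50".
ranks = { ["10001", ids["Milk"]] => 3 }

puts color_for_rank(ranks[["10001", ids["Milk"]]]).inspect
puts color_for_rank(ranks[["10001", ids["Twilight"]]]).inspect
```

Keying the ranks by a small integer rather than the full title keeps the lookup table compact, which is the spirit of the optimization described above.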
The result was the graphic that’s online now. We were able to get a lot in, but we still had to leave a lot out, such as ways to shade the maps other than by movie (for example, showing where people rented movies that were nominated for Best Picture, or shading each ZIP code based on a calculation of the Metacritic ratings for its top 50 movies, which we did in print for the New York region). Adding those would have made the interface even more complicated.
Don’t get me wrong – leaving things out is critical in interactive graphics, where the default temptation is to dump all the data you have behind an interface. It’s hard to say no to that, because readers are going to find a lot of things with raw data that you might have missed (such as the interesting island of Andrews Air Force Base).
It’s something we know we can do better; I don’t think anyone would disagree that tidbits of analysis are usually more meaningful than massive streams of raw data. It’s nice to get both in if you can. We’re working on it.
Kevin Quealy has been a graphics editor at the New York Times for almost two years. He has a Master’s degree from the Missouri School of Journalism and a Bachelor’s degree in Physics from Gustavus Adolphus College. He has previously worked at the Philadelphia Inquirer and the St. Louis Post-Dispatch.

Jennifer said:
Thanks for sharing the process. Can you provide the timeline for this project?
jim said:
Very cool! I, too, would love to know how long it took to make this…
Thanks for sharing.
Joy Mayer said:
Kevin, you’ll be interested to know that this will be required reading for my new Multimedia Planning and Design class. You’re the expert grownup now!
Bob Britten said:
This is some great behind-the-scenes info on the process of bringing a complex graphic to life. I’m with Joy – this is going into next fall’s infographics course.
Which movies does your neighbor watch on Netflix? « Cry in a Pail said:
[...] for News Design posted an article of The New York Times’s own Kevin Quealy talking about the process by which the Netflix interactive graphic was created. Like I suspected, they did not use the Netflix API but spoke with Netflix directly. They also used [...]
Kathleen said:
Really enjoyed reading this. Thanks!
The making of the NYT’s Netflix graphic | BlogHalt.com said:
[...] The making of the NYT’s Netflix graphic. A database dump from Netflix, some clever hackery in ArcView GIS, hpricot to scrape Metacritic and a lot of careful thought about the UI for navigating the data. [...]
tecosystems » links for 2010-01-26 said:
[...] The making of the NYT’s Netflix graphic – The Society for News Design just what it says. very interesting. (tags: newyorktimes netflix howto data visualization infographics makingof) [...]
Weekend Fodder | FlowingData said:
[...] The making of the NYT’s Netflix graphic – The interactive showing the Geography of Netflix rentals was a big hit around the Web. Detailed, engaging, interesting, and a great ad for Netflix. [...]
dc321 said:
Does anyone know what programs were used other than ArcGIS, Illustrator and Ruby for scraping?
In particular, I’m curious whether the web portal was done with Processing or Flash. How was the database managed?
Also – I should say, I’m not necessarily interested in how NYTimes did it, I’m more interested in getting a sense of how one could do something similar.
Thanks in advance,
Dana
The Making of the NYT’s Netflix Graphic « Revveal said:
[...] cool interactive visualization – check it out. SND (Society for News Design) posted this article explaining how it was done. Great behind the scenes explanation relating to data, design, and [...]
Pete said:
Cool. How about releasing the data for the other zips so I can use ArcMap to make a thematic map of my region, Detroit?
Bookmarks for February 9th from 11:30 to 13:52 — arghh.net said:
[...] The making of the NYT’s Netflix graphic – The Society for News Design – For the geeks – the making of the NYT interactive visualization of Netflix rentals [...]
Noel said:
I really liked this piece when I saw it on the Times site, great work!
One of the anomalies I noticed was that you outed the guy who gets his movies delivered to work at LaGuardia (11371) as a fan of “Romancing the Stone”, “Crocodile Dundee 2”, and “Godzilla’s Revenge”.
Ennuyer.net » Blog Archive » Rails Reading February 14 2010 said:
[...] The making of the NYT’s Netflix graphic – The Society for News Design [...]
Hyperakt » Play » Maps and Charts for Info Junkies said:
[...] Beautifully done. Easy to use. Information is focused enough to not overwhelm. Check out this “making of” piece. Pretty interesting [...]
Pamela said:
Fantastic detail. I would love to know the timeline for the graphic … from concept to published graphic – and how many graphic artists involved … Great work.
Kevin Quealy, NYTimes Graphics Editor « Information Design at Penn said:
[...] by Comberg I’m finalizing details for a visit by Kevin Quealy, designer of this great Netflix map. Information visualization in journalism is big business as news makes the turn from print to [...]
» The making of the NYT’s Netflix graphic said:
[...] … A very good making-of from The New York Times on how the newspaper’s team designs its infographics for the web. (via SND). Overview by Kevin [...]
Como o NYTimes produz infográficos « producaograficadesign said:
[...] Queues, about the 10 most rented movies of 2009 in that country, this article on the SND site shows how this infographic was produced. Interesting to see the step-by-step of this production, [...]