Random Recipe

The Cook Book

Recipe for the week of February 26 - March 3

Ultimate Bacon: The Giant Component of a Complex Network

Let's take a break from our usual fare to discuss an amusing contest that I entered last weekend. Apparently, sometime within the past few years, MTV talk show host Jon Stewart devised a parlor game in which contestants are to link any movie star to Kevin Bacon by a chain of films that share performers. Thus we imagine actors as vertices in a large network, with edges between any two who have been in a movie together. The goal is to find the path of minimal length connecting x to Kevin Bacon, that length then being the Bacon number, which we denote here as B(x). Quite a cult has grown up around this pastime, as witnessed by more than a dozen web pages now devoted to the game.

Now Brett Tjaden and Glenn Wasson at the University of Virginia have automated the calculation of B(x) at their Web site, The Oracle of Bacon. Their program, which makes use of the marvelous Internet Movie Database (IMDB), will compute the Bacon number of any performer you care to specify within a few seconds. Hard-core cultists seem threatened by the power of the Oracle, but the rest of us welcome this automation of what can be an arduous evaluation. For instance, B(Bara, Theda) = 3, and in fact the Oracle confirmed an outstanding conjecture that for any x from the United States, either B(x) is at most 4 or B(x) is infinite.

Mathematicians have had their own version of this story for many years, centered around the Hungarian number theorist and combinatorist Paul Erdos, where the links are formed by joint authorship of research papers. To have Erdos as center is especially fitting since he and his compatriot Alfred Renyi developed a beautiful theory of random graphs that captures the essential qualitative features of complex networks such as these. Thus Renyi has Erdos number 1. For a list of all mathematicians y with Erdos number E(y) at most 2, see The Erdos Number Project. Alas, E(Griffeath, David) = 3. A key feature of Erdos-Renyi theory, shared by the Movie and Mathematician databases, is that the sizable majority of members are connected to one another, all belonging to one Giant Component, while those members not in the Giant Component belong to very small, isolated clusters. It would be interesting to compute the average number of films per performer and the average number of performers per film for the IMDB, in order to compare the network to a best-fit Erdos-Renyi model. This would be a kind of mean-field approximation, a method commonly used in statistical physics to analyze complex systems. One could handle the database of mathematicians similarly, but somehow most folks are more interested in movie stars...

So last Saturday afternoon I discovered the Oracle of Bacon, and a contest announced there. The computer program had determined that of the 174,830 performers in the IMDB, 151,141 (88%) are in the Giant Component. For those, the maximal Bacon number is 8, and only 8 individuals have this value. A Kevin Bacon video of one's choice was offered to the first person who could correctly identify a member of {x: B(x) = 8}. With my life-long passion for movies, I couldn't resist spending many hours probing the dark recesses of film history until, at about 10 AM on Sunday, I found an incredibly obscure 1928 Soviet pirate film, Plenniki Morya, starring P. Savin with a Bacon number of 7, and whose supporting cast of 8 appeared nowhere else. Evidently I had discovered ALL the solutions: A. Kramer, N. Kutuzov, J. Laptev, V. Podgorny, A. Safroshin, E. Smirnova, I. Strauch, and Dmitri Vasilyev. As evidenced by The Bacon Challenge, these folks eventually lead to Roman Polanski (2), Jack Nicholson (1), and Kevin Bacon (0). That page also documents the lamentable fact that I came in second to one Stefan Kahrs of the University of Edinburgh Laboratory for Foundations of Computer Science. I have since learned from Stefan that he has been instrumental in developing the IMDB over the past two years, introducing the first links between most of the Soviet Cinema and Hollywood. I feel less disappointed with my 'honorable mention' knowing that the winner did the original research on the longest path!

The story needn't end here, however. There are several intriguing open questions relating to the INDM network, for instance:

(a) What is the diameter of the Giant Component?

(b) What are the size and contents of the second largest component?

(c) Who is the most central performer (the REAL Kevin Bacon)?

A natural way to formulate (c) is in terms of minimal average distance to all members of the Giant Component. Last April, Claus Schotten determined that several other performers including Ralph Bellamy, James Mason and Orson Welles had smaller averages than Kevin, but he did not carry out an exhaustive analysis. Kahrs conjectures that a 'character actor' such as Michael Caine, Klaus Kinski, Donald Pleasence, or most likely John Carradine should be the true Center. Of course the database is quite dynamic since new films are made, and obscure old films are added, all the time. As is our custom, we'll report any late-breaking developments in a future recipe.

Take me higher...
Introduction to the PSK PSK Search Recent Additions CA Archive CA Links Feedback Appreciated !