April 21, 2004

Gatekeepers of the Net: Search Engines and Regulatory Implications

"Gatekeepers of the Web" is, unofficially, the "Berkman Center" panel here @ CFP: it includes Andrew McLaughlin, a long-time Berkman fellow, who is here representing Google, and Ben Edelman, a former fellow and affiliate @ Berkman, who'll discuss his filtering research.

Others on the panel: German researcher Dr. Marcel Macquill - who proposed the session, and Matthew Hindman of Harvard's Kennedy School.

Dr. Macquill begins with a description of a study he has conducted on search engines and attitudes toward regulating them:

Macquill: We did use surveys, explored market questions, did performance testing; also asked people about attitudes toward regulating search engines.

So: Over 90 percent of German people regularly use search engines. 69 percent of Germans used Google -- so we see predominance of one search engine.

What we found is cooperation behind the scenes among search engines. Google operates with other search engines.

Many people think search engines are neutral and objective. We try to show with this study that this is a myth. Results can be manipulated.

The term "spamming" is also applied to manipulating websites to achieve better ranking. What are the methods? "Google bombing" -- building up hundreds/thousands of websites that contain one link. Can also use words inappropriately. Can use "invisible text": robot reads it, increases rank. [...]

Opinion poll: people use search engines for different reasons. Biggest reason: works better/easier to use. Majority of those who don't use Google use their search engine because...they've always used it.

Only 11 percent of the population uses a second search engine. Fewer use a third.

People know very little about search engines. Comparable to the situation in the '60s with TV -- a mysterious black box. We had to learn to be critical.

We asked about regulation: one third said no one should regulate.

We did a lab experiment: examined "partner" links -- paid content v. normal content. Google is transparent about this; others are not.

(Shows how search can go terribly wrong -- eBay hijacks your search, porn sites do typo-squatting, etc.; shows a number one result @ Google for NSDAP, gets an anti-Semite site. Wonders whether this should be changed.)

Main challenges: Google monopoly, gatekeeper function of search engines, paid links, etc. These are classic questions of media policy/concentration, etc. Classical questions in a new light.

Moderator: What Marcel is saying is that "neutrality" may lead to problems; Ben's presentation will explore the problems with "activist" search engines.

Ben Edelman:

Three points -- country-specific omissions; attempts to take porn out of search engine, Google does best job, but error-filled; what's different about search engines v. other businesses.

Country-specific: "Stormfront" in U.S. Google, it's there; "Stormfront" in German Google, it's gone. Most people don't know to compare. (Provides lists of omissions.)

Google not entirely clear about how it does this "filtering" -- can't seem to find out in an official way.

How serious is this problem? Not terribly, in comparison to the problems w/filtering porn.

Google doesn't claim perfection -- and there are indeed some problems. What is the definition of adult content? Google waffles, but seems to over-exclude (with "Safesearch"). Search for "Library of Congress" using Safesearch--it's missing! Why? Could be any number of reasons.

Also can't get "Northeastern University" if you have "Safesearch" on. Basically, this thing doesn't work.

This must be a hard problem to fix, or Google would have got it right already.

Thoughts on transparency: the "black box" (secret sauce) problem. To some extent, this is a business secret. I understand this. On the other hand there are good reasons to try to fix things.

[...missed a bit...]

Matt Hindman:

Because of link/traffic patterns, we may be facing a situation in which all search engines are returning the same results.

How did we get here? How did we enter the age of Google? Mr. Page had the bright idea that links contain a lot of intelligence. In all of this, paying attention to links was the critical shift.
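To make the "links contain intelligence" idea concrete, here is a minimal sketch of link-based ranking by power iteration. It is a textbook simplification of the PageRank idea, not Google's production ranking and not something shown on the panel; the toy web, the damping value, and the function name are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank pages by inbound links using simple power iteration.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy web, "c" ends up with the highest score because three pages point to it, and "a" comes second only because the top page links to it.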

Structure of the Web -- inbound and outbound links -- "power law." Traffic patterns are power-law distributed as well. Eternal myth of openness prevents people from recognizing power law. My work finds these power laws in politics, etc. What we have is a fractally organized Web that is dominated by the power law.
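A rough way to see what a power law implies for concentration: assume, purely for illustration, that the site at rank r receives inbound links in proportion to r raised to a negative exponent. The exponent, the number of sites, and the function name below are assumptions, not figures from Hindman's research.

```python
def top_share(num_sites, top_k, exponent=1.0):
    """Fraction of all inbound links held by the top_k sites.

    Assumes the site at rank r gets links in proportion to r ** -exponent.
    An exponent of 0 means every site gets the same number of links.
    """
    weights = [rank ** -exponent for rank in range(1, num_sites + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Compare a power-law web with a perfectly even one (both hypothetical).
power_law_share = top_share(10_000, 10, exponent=1.0)
even_share = top_share(10_000, 10, exponent=0.0)
```

Under these assumptions the top ten of 10,000 sites hold about 30 percent of all links, against 0.1 percent if links were spread evenly.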

All roads lead to Rome; all search engines return same results.

I've been talking here almost exclusively about links. [...] What I would submit is that the problem is not Google -- it's the Web itself. Power-law structures are the real problem.

Andrew M.: Obviously, this stuff matters. It matters what gets excluded. Two points: 1) steps for increased transparency (not fully baked, but we're working on it), 2) "Safesearch" stuff Ben raised -- there is an answer.

Transparency: One of the things that Google does is turn our C&Ds over to Chilling Effects. We put a note and people can see the take-down requests. This is a great model. We'll expand this to all legal orders. Google.de exclusions would be covered by this. Say we get an order from a court -- we'll submit the document to Chilling Effects. "Something used to be here." Before too long, we'll have something stable for Google to publish.

Another thing: pages are sometimes pulled because of things like malicious script. Talking to engineers. This may take longer. But the goal is transparency.

Okay, so back to "Safesearch" -- it's intended to be conservative. If we haven't crawled the page, we don't label it. So we don't move it into the "green zone." So Ben pointed out Thomas. We haven't crawled the page and don't have a cached copy -- we can't tell whether it's safe, so we leave it out. Also, some have asked not to be indexed; we don't index them. If the page says, "Don't crawl this," we don't.

Our to-do is to describe this in an FAQ. No reason why not.
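The two rules Andrew describes (honor a site's "don't crawl" request, and leave out anything not positively labeled safe) can be sketched as follows. This is only an illustration of the logic as stated on the panel, not Google's implementation: the URLs, labels, crawler name, and function names are hypothetical. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical labels from an earlier crawl; None means "never crawled".
LABELS = {
    "http://example.org/news": "safe",
    "http://example.org/adult": "adult",
    "http://example.org/uncrawled": None,
}


def may_crawl(robots_lines, url, agent="ExampleBot"):
    """Return True only if the site's robots.txt rules allow this fetch."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


def safe_results(urls, labels):
    """Conservative filter: keep only pages positively labeled safe.

    Pages that were never crawled have no label, so they are left out
    even though nothing is known to be wrong with them.
    """
    return [url for url in urls if labels.get(url) == "safe"]


robots = ["User-agent: *", "Disallow: /private/"]
blocked = may_crawl(robots, "http://example.org/private/page")
allowed = may_crawl(robots, "http://example.org/public")
shown = safe_results(list(LABELS), LABELS)
```

The conservative default is what produces the over-exclusion Ben found: a harmless page that was never crawled is treated the same as one labeled adult.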

Ben: Andrew, I think the lag-time is a little silly.

Andrew: Okay.

Audience Q & A:


Q: It wouldn't take much engineering skill to put the "Something was here" notice at the top.

Andrew: It's a valid point.

Q: What's the problem w/convergence?

Matt: [...] We need to fight the myth of Internet openness. Eyeballs are actually more concentrated on the Web than in other media.

Q: Hasn't research shown that there is a relatively fluid power law?

Matt: No magic number you need to jump over. Important to bear in mind: well, everyone wants to be a pop star. There's "no real barrier" to becoming a star...but if you're a star on the Net, you're going to stay a star. In weblogs, for example -- top is very stable.

Macquill: What about my question: Should we do "human editing" of results, or not?

Andrew: Should we? Straw poll?

Audience mainly votes for "neutral" approach.

Andrew: Impossible to human-edit -- which to move up or down. Instead, let people know that they need to change their concept that the number one result is the most authoritative -- search engine no substitute for the human facility.

Posted by Donna Wentworth at April 21, 2004 02:27 PM