just a personal cut & paste page

Monday, May 12, 2003

How to Find Anything Online
source: PCMAG
May 27, 2003
By Sean Carroll


Searching through the petabytes of fact, fiction, and rumor that make up the World Wide Web is no mean task. It's like wandering through a library without a filing system or a card catalog. Portals like Yahoo! evolved as manageable entry points to this ever-growing repository. But even the best portals aren't doors into a virtual version of a library. They're doors out, from the safety of your information-barren home or office into a wilderness of pages, files, databases, and sites: too much ground for anyone to cover. That's where search engines come in.

While each search engine is different, they are more closely related than ever, as shown in the chart "Search Web." Classic search engines spider, or crawl, the Web, indexing and categorizing the data on each page they have access to or the metadata that describes it. Frequent crawling is important, given that the Web is growing fast. And the crawlers are getting smarter. Many can find and index at least some of the PDFs and other content types that make up much of the "Invisible Web," unseen by spiders' (and search engine users') eyes.
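The indexing half of that crawl can be pictured as an inverted index: a map from each word to the pages containing it. The sketch below is a toy version; the page texts and URLs are invented for illustration.

```python
# Toy sketch of the index a crawler builds: an inverted index mapping each
# word to the pages that contain it. The pages and URLs are invented.
def build_index(pages):
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(url)
    return index

pages = {
    "example.com/a": "search engines crawl the Web",
    "example.com/b": "spiders index Web pages",
}
index = build_index(pages)
print(sorted(index["web"]))  # ['example.com/a', 'example.com/b']
```

A real engine stores positions, weights, and metadata alongside each posting, but the lookup idea is the same: answering a query means intersecting the word lists rather than scanning every page.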


Given the vast amounts of information on any given subject, simply indexing is not enough. A search engine has to weight the pages so that the most commonly useful links come up first. There are several ways to do this, but the best-known is based on the popularity of each site, as represented by the number of other sites that link to it. This is a simple way of describing the techniques pioneered by Google and adopted in one form or another by many competitors.
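The link-popularity idea can be sketched in a few lines. This is a toy version in the spirit of, but far simpler than, the techniques Google pioneered; the link graph is invented, and real rankings combine many more signals.

```python
# Toy link-popularity ranking: each round, every page passes a damped share
# of its score along its outbound links, so heavily linked-to pages rise.
def rank(links, iterations=50, damping=0.85):
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += damping * score[page] / len(outs)
        score = new
    return score

# Page C is linked to by both A and B, so it should rank highest.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = rank(links)
print(max(scores, key=scores.get))  # "C"
```

The point of the iteration is that a link from an already-popular page counts for more than a link from an obscure one, which is what distinguishes this from simply counting inbound links.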

While some detractors point out that this strategy may ultimately drive popularity instead of following it, Google's success is undeniable: It's a household name. Who among us hasn't Googled?

Yet even Google has a long way to go. The latest challenge is the Deep Web, which represents data that can't be crawled, not because it's in pages that the spiders can't recognize, like PDFs, but because it doesn't exist in static page form (except as answers to database queries), or because it's hidden behind authentication screens. And this information is often the cream of the crop: magazines, books, peer-reviewed journals. To get to it, you need to research the database content yourself and then pay a hefty fee or, ironically, you need to go to a modern library (they're not so quaint and irrelevant, after all) that has access and, ideally, a reference librarian to help you get started.

In the pages that follow, we offer reviews of five popular general-purpose search engines. We also provide tips to help you get the most out of any search engine, a list of sites giving you entry into the Deep Web (see the sidebar "Niche Search"), and a kit of software and Web tools that will support you in your search for information.



Editors' Choice: Google
May 27, 2003









AlltheWeb

May 27, 2003



AlltheWeb's claim to fame is a large index, but bigger isn't always better, and a fast search can be a hasty search. Several of our attempts returned irrelevant sites with the keywords in their metadata. On the other hand, AlltheWeb's many advanced options and ability to refine search results are helpful for serious users.

The site is nicely customizable, with sophisticated search options such as searches for pages linked to a given page, searches within URLs or page titles, and searches limited by page size. There's a check box for exact-phrase searches, and one personalization option is to make certain advanced search options accessible by drop-down boxes in the main interface.


The advanced features don't always follow conventions. For example, you use parentheses to indicate the Boolean OR function on the main page (OR behaves normally on the advanced search page). Fortunately, the Help page is clear and extensive.

AlltheWeb also analyzes your search phrase, restructuring it with quotation marks and eliminating words like the for better results. You can disable this feature (via the handy Customize button), but we found it improved results considerably in natural-language queries.

AlltheWeb has specialized search tabs (a feature popping up everywhere) for News, Pictures, Video, Audio, and FTP. You can customize news searches by checking boxes marked international, U.S., local, business, and so on.

When AlltheWeb finds multimedia items in a Web search, it displays links at the bottom of your results page, though we never came across an audio link, even when searches under the Audio tab turned up plenty (for example, Michael Jackson).

AlltheWeb lets you add a search button to Internet Explorer, a sidebar to Netscape, or an AlltheWeb Hotlist panel to Opera (a browser that Google's toolbar can't accommodate).

Still, AlltheWeb's results aren't extraordinary. In our experience it was dismal at finding home pages. Natural-language queries were also a problem. Unusual words helped: A natural-language query asking who played Fegan Floop in the movie Spy Kids worked perfectly. In general, we got the best results with specific, multiword queries: Sacramento River Cats and North Korea -"nuclear weapons," for example, gave us perfect results.



AOL

May 27, 2003
By Cade Metz


If you're one of the 35 million people who access the Internet through America Online, you might want to use AOL search, because it's so conveniently incorporated into the client (we tested the integrated version within the client). AOL search tends to push you toward AOL content and AOL partner sites, however, and when searching the Web at large, it's not as powerful as some of the other engines reviewed here. AOL search is also available at AOL.com, but if you don't have an account with the online service, we recommend trying another Web-based search engine.

When you launch AOL, the AOL search window pops up conveniently on the left side of your desktop and helps you search not only the Web but also AOL itself. Query results fall into four groups: AOL content, sites recommended by AOL's editors, sponsored sites, and sites culled from a general Web search. Although the latter two groups, which yield the best results, are provided by Google, their results are often pushed far down the page.


Because the sponsorships are intrusive and often only loosely relevant, they reduce the effectiveness of searches. When we entered McDonald's +wireless, looking for information about the Wi-Fi hotspots being installed by the fast-food chain, the first three results were sponsored sites selling cell phones and wireless equipment. But this search also yielded three Google-provided sites with exactly what we wanted, on the first page of results.

Even these results aren't always as impressive as those you'd get from Google. AOL search doesn't let you use any of Google's advanced operators or anything like the tools available on Google's advanced search page. The only way to focus your search is to add simple operators, such as double quotes (to search on an exact phrase) or a minus symbol (to exclude a word or phrase).

The site provides two indexes you can browse: one covering just AOL, the other covering both AOL and the Web. Open Directory results are woven throughout. AOL search is far more effective than it was just a few years ago, but there will be times you need the power of a service devoted entirely to search.



Google

May 27, 2003
By Sarah Pike


Reviewers may overuse the word uncanny to describe Google's talent for finding what you're looking for. In testing, we found Google a consistently fine performer, which is unsurprising given that it provides so much content to so many other engines. We were surprised, however, that it no longer dramatically outperformed its competitors. Still, it stands ahead of the pack in search-enhancing features and for the most part returned excellent search results.

Customization options include language, content filtering, and number of results per page, but you can't add advanced search drop-downs or check boxes to the main interface. The advanced search page compensates with excellent, extensive options, including Boolean searches and searches by file format, date, and domain. You'll also find a few search types not listed among the specialized searches on the main page, such as a similar-page search and searches for terms found in the title, URL, or text for a page, or in other pages' links to a page.


The main specialized searches are Images, Groups (a huge repository of archived Usenet discussion forums), Directory (content organized by topic, à la Yahoo!), and News.

Google improves your search results by checking for typos and offering spelling help. It also ignores common words like what, of, and is. You can force it to acknowledge a word by prefixing the word with a plus sign. The Help page, which is almost too extensive to be quickly helpful, has tips for making searches more effective, including searching page titles or URLs, looking for related sites, even finding only cached pages.

In fact, nearly all of Google's search results include the option to view cached versions of the pages returned?useful when a link is broken or a site is no longer available. Google also provides translation for pages in some foreign languages (though not Slovene, as we found in our search on Republic of Slovenia).

In our searches, Google's performance was fairly even across the board. It did well at finding specific home pages and was excellent with complex, multiword queries. What makes its searches stand out is that the Web page results (the ones people care about) are pushed to the top, with very few exceptions. Sponsored links are either compact tinted boxes at the top of the results page or pushed to the side where they can be ignored, although their relevance is often quite high.



MSN

May 27, 2003
By Cade Metz


MSN is slowly gaining on our Editors' Choice: It can restrict searches to particular domains, file types, regions, and languages. It also automatically corrects spelling. And it's adept at natural-language queries. But MSN's results aren't always fresh, and you can't refine searches with specialized operators from the main search page. Nor does MSN offer cached pages or translations.

MSN's results come in five flavors: Popular Topics (common searches by MSN users), Featured Sites (recommended by MSN editors), Sponsored Sites, Web Directory Sites (from a Yahoo!-like Web index), and Web Pages (culled from the Web at large). Unlike AOL, MSN displays only relevant sponsored sites. When we tried McDonald's +wireless, looking for the chain's Wi-Fi hot spots, MSN didn't push us to sites selling wireless hardware.


Unfortunately, the categories are always displayed in the aforementioned order, with general Web sites buried under other results. We also object to MSN's overly subtle delineation of sponsored links. They're marked, but not clearly enough that you won't end up scanning the sponsored links to get to the others, a problem Google addresses more gracefully.

On natural-language queries, MSN actually performed better than Google-based engines. When we asked Where can I buy duct tape online?, MSN took us straight to a pair of sites that sold duct tape.

Still, MSN's searches aren't as up to date as Google's. Since we did our testing just after the death of everyone's neighbor, Fred Rogers, we expected to find an obituary when we typed "Mr. Rogers" +dead, but instead the first two results were a site about the death of Roy Rogers and one about Mr. T.

With most of our searches, at least one of the top ten results was a dead link. Even the live ones were occasionally questionable. When we tried "bed and breakfast" +"New England," MSN's summary for one result read, "Bernice Chesler's Bed & Breakfast in New England Web Site is Now Closed."

When keying words into the search box on MSN's home page, you can use double quotes to search for a specific phrase. But for other operators, such as minus signs (to exclude pages containing given words) or asterisks (for wildcard characters at the end of words), you have to visit the advanced search page. You'll often come here anyway to restrict searches by domain, file type, region, and language, but why not enable operators on the main page?



Yahoo!
May 27, 2003
By Cade Metz


Since we last reviewed the Yahoo! search engine, it has improved significantly. Yahoo! now provides many tools for defining the scope of your queries. Search results from the Web at large, supplied by Google, are now grouped on the same page as those from the Yahoo! Directory, the site's long-standing Internet index (in the past, you had to visit a separate page for Web results). If you enjoy browsing the Yahoo! Directory (displayed above the general Web results and often featuring more relevant results) as well as surfing through uncataloged sites, you might make this your primary search engine.

On the other hand, if you object to being pushed toward sponsor sites (as we do), you might think again. When displaying search results, Yahoo! doesn't start with sponsored sites (it begins with matching items from inside Yahoo! itself and relevant categories in the Yahoo! Directory), but it does place sponsored sites above the Web results from Google.



If you're looking for products and services, sponsor matches can be helpful. When we searched for George Foreman +grill, the results page began with four sponsor sites selling grills. But if you're simply looking for information, sponsors can get in the way. When we searched on the term 1394, looking for the home page of the 1394 Trade Organization, it did point us there, but only after listing three sites selling computers.

When we last reviewed Yahoo! Search ("In Search of...," December 5, 2000), we complained about the lack of tools for refining searches. Thanks to a new advanced-search page, you can now restrict your search to particular domains, languages, and regions. You can search for sites based on when they were last updated. And you can look for pages that use particular keywords in their URLs.

Yahoo! still has distance to make up, however. It lacks some search abilities we enjoyed in other engines, such as wildcards and the ability to perform Boolean searches directly from the basic search box. But as we went to press, Yahoo! announced that an updated version of its search engine was soon to be released.



Scorecard: Search Engines
May 27, 2003



All of these sites provide useful results, so we've rated them on ease of use, features, and flexibility. A high interface rating indicates a lack of unrelated ads and other extraneous distractions, plus easy-to-skim results and a helpful help system. A site with good search flexibility lets you specify what you want to find using such features as complex Boolean search, wildcards, and proximity search.

Some sites let you specify where you want to search: These targeted search options include limiting searches to specific domains or sites, searching within URLs or titles, and limiting results to specific file types or languages. We've also rated the sites on results options?like refining search results with new keywords, displaying cached versions of pages, and locating pages similar to a particular hit.





Search Better
May 27, 2003



Have a Backup
Although the world of search is increasingly interconnected (see "Search Web"), that doesn't mean all engines give the same results. Two engines that draw on the same data may respond differently because they don't use the same methods to weight the data. When you can't get to information you suspect is available, try a different engine. It takes only a minute, and you might be surprised at what you uncover. At the least, you'll keep abreast of changes at other search sites.


Get to Know Your Engine
Every engine has a bar in which you enter your searches. And just about every engine has many features and individual quirks. Once you've picked an engine, take the time to read the help pages and examine its advanced search features, its search refinement capabilities, and any other power user features. If you search with any regularity, you'll get a quick return on the time investment.

Even if, after reading our reviews, you decide to stick with a tried-and-true favorite, go back and check out these capabilities, if you haven't already. In any case, watch for new developments. The better engines are always making improvements.

Learn the Lingo
Just knowing Boolean query language can help you focus more clearly on what you're looking for. Some sites let you use Boolean operators on their main search bars; others may require you to click through to their advanced options.

You can always improve your search results by using combinations of the standard Boolean operators AND (both terms must be present), OR (either or both terms must be present), and NOT (the following term must not be present).

Besides using the above terms, you can often use parentheses and quotation marks to group items, as in mathematical equations. For example, if you're looking for David Copperfield (the book by Charles Dickens) but can't remember the author's name, you might type in "David Copperfield" AND book NOT magic.

Note that while AND, OR, and NOT are the classic Boolean terms, many search engines have their own spin on them. You may see ANDNOT instead of NOT, for example. Some actually have Boolean forms you can fill in, where you enter your terms in separate boxes connected by drop-down menus whose choices are Boolean operators. Check each site's help files for particulars.
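The logic an engine applies when it evaluates those operators can be sketched in a few lines. The document snippets below are made up for illustration; a real engine works over an index rather than scanning text.

```python
# Sketch of Boolean matching: AND means every required term is present,
# NOT means no excluded term is present. The documents are invented.
docs = {
    1: "david copperfield the book by charles dickens",
    2: "david copperfield magic show tickets",
    3: "dickens novels and biographies",
}

def matches(text, must=(), must_not=()):
    return all(t in text for t in must) and not any(t in text for t in must_not)

# "David Copperfield" AND book NOT magic
hits = [d for d, text in docs.items()
        if matches(text, must=["david copperfield", "book"], must_not=["magic"])]
print(hits)  # [1]
```

Note how the quoted phrase is treated as a single required term: document 2 contains "david copperfield" but is rejected by the NOT, and document 3 never matched the phrase at all.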

Advanced Searchers, Advanced Searches
Most search engines, conventional and specialized alike, offer alternative Web-searching methods. A listing of directories and a search bar are standard features on opening screens. Look for advanced search option links to define, limit, or expand your search terms. Many search engines will even walk you through the process in their help files.

You can use Boolean terms (with engines that support them) to do the minimal work yourself, or you can take advantage of special search forms.

Forms use Boolean terms translated into English; they may also let you enable filters such as stemming or truncating (like searching on color* to get color, colors, and colored).
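Truncation is easy to picture: a trailing asterisk stands for any suffix. Here is a small sketch of that matching against an invented vocabulary; real engines apply it against their full index.

```python
import re

# Sketch of trailing-wildcard (truncation) matching, as in a search on color*.
# The vocabulary list is invented for illustration.
def truncate_match(pattern, vocabulary):
    # Turn the trailing * into a regex that accepts any word-character suffix.
    regex = re.compile(re.escape(pattern[:-1]) + r"\w*$")
    return [w for w in vocabulary if regex.match(w)]

words = ["color", "colors", "colored", "colorful", "collar"]
print(truncate_match("color*", words))  # ['color', 'colors', 'colored', 'colorful']
```

"collar" shares four letters but not the full stem, so it is correctly left out; this is why truncation is a blunter tool than true linguistic stemming, which would also connect irregular forms.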

You can also refine your results or present them in a certain format. For example, AlltheWeb's advanced customized preferences let you use query rewrites to improve results by rephrasing your query, as well as auto-complete (to suggest ways to complete partial search phrases), and news integration. Google's advanced search functions are easy to master and allow word, domain/site, language, and parental-control filtering.

Spell It Right
Search engines, in the main, reflect the "garbage in, garbage out" principle. They look for the exact spelling of the term you enter, so be sure to double-check if you get strange results.

Spell It Wrong
The Web is full of misspellings. In some cases, it might not be a bad idea to try misspelling your terms. It depends on the kind of search you're doing, however. Searching for recombinant DAN might not provide the kind of results you want for an academic research paper on genetics, but searching for Avril Lavinge pix might help you satisfy your punk/pop/pictorial urges.

Be Exclusive
Exclude words. If you're overwhelmed by irrelevant results, take a minute to look at a few of the misses and look for a common keyword you can pull out to restrict the results. Many search engines let you exclude certain results: some through an advanced interface or Boolean language, others through means as simple as putting a minus sign in front of each keyword you want to exclude. There are several ways to handle this, so you'll have to do a little checking.
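The minus-sign convention is simple enough to sketch. This parser handles one common form (quoted phrases plus a leading minus for exclusion); actual syntax varies from engine to engine, so treat it as an illustration, not any particular site's grammar.

```python
import shlex

# Sketch of parsing the minus-sign convention: -term excludes, "..." groups
# a phrase, and everything else is required.
def parse_query(query):
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-"):
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded

req, exc = parse_query('"North Korea" -"nuclear weapons"')
print(req, exc)  # ['North Korea'] ['nuclear weapons']
```

shlex does the quote handling, so a quoted phrase survives as a single token whether or not it is being excluded.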

Try Different Forms
If your search on theoretical physics isn't turning up all the right stuff, try theory physics. For electrical, try also electric, electricity.

Cast Your Net Wide
If you're having trouble finding good results with your favorite engine, why not try a metasearch tool that looks at a bunch of results from a variety of engines?

Frankly, metasearch isn't our favorite way to go. Metas haven't come the long way that regular search engines have, and since they aggregate results from many search sites, they often present too many results in an unhelpful interface. More important, they typically can't translate complex query language into the specific format each site requires.

Although by and large we see individual search sites improving, no site is yet perfect, and no site comes close to indexing the whole Internet. So if your first- and second-string search sites can't come up with the goods, check out a metasearch site like Dogpile, IxQuick, MetaCrawler, or Mamma.com, or a tool like Copernic (see the review). HotBot, not generally perceived as a metasearch site, has a cool new interface that offers a sort of lateral metasearch. Without retyping your query, you can click radio buttons to get the results of your search in AlltheWeb, Google, Inktomi, or Teoma.

¿Dónde Está?
If your search is language-specific, or you need a translation, or you want to search locally, a variety of tools stand ready to perform the task.

Many search engines, such as Google, let you set your preferences to search in the language of your choice. There are country-specific search sites and filters for narrowing your search by language and country.

Roots

Looking for pages about your culture and language? There's a wealth of information out there. One of our favorite resources for this kind of search is www2.wheatoncollege.edu/Wallace/Instruction/Workshops/forlang.rev01/forlang.rev.html#gate. Here we found Search Engine Colossus (www.searchenginecolossus.com), which has a directory of 195 countries and 38 territories as well as your standard search bar. A search on Switzerland retrieved a few dozen links to search engines, including a Japanese-language site about the country.

Educators can find resources such as www.iecc.org to get in touch with students around the world.

Dive Deeper and See the Unseen
Many people think that their search engine can find any piece of electronic information. This isn't true.

There's a whole category of pages that lack links pointing to them or that aren't indexed because they consist of data types such as pictures, music, and PDF files, which spiders and robots won't index (although many conventional search engines have made some progress in this regard). The vast majority of the Deep Web, as it is known, consists of databases that spiders can't index. A spider can read the address of a database, but it can't decipher the contents, because the pages with valuable information are created only as temporary responses to queries from database users. Additionally, many databases require a user ID and password, which further complicates the work of spiders and bots.

You can find and access many of these databases through portals that specialize in providing access to the Deep Web. These include CompletePlanet (www.completeplanet.com) and InfoMine (http://infomine.ucr.edu).

Check out our list of specialized, or niche, search engines to get a taste of what is available, then follow these links to high-quality information.

Go Back to the Library
Library reference departments are still great sources of information, even in today's online environment. Many of the indexes, directories, and encyclopedias you used in print have been reformatted and are available online. Perhaps even more important, the magazines and peer-reviewed journals you no doubt used for academic research are also often available online. Many academic and large public libraries have been gradually converting collections from print to online access. In fact, the future of the printed versions of many of these publications is increasingly in doubt.

Just as libraries have paid for books and periodicals, they pay a price for the corresponding online versions. For this reason, access to these materials is restricted, and you won't be able to use them without permission. Most academic and public libraries provide remote access to their communities via IP address or user name and password. Usually users have to configure their browsers and provide their library ID numbers to access the databases from home. Some libraries let unaffiliated walk-in users access their online resources; others (especially academic libraries) restrict entry or usage to registered patrons. It's best to check before making the trip.

Most libraries have Web pages describing available online resources, giving you an idea of the subject areas their online collections cover. Once you get to a library, ask a reference librarian to help you get started, since each library's collections and policies are unique.

Investigation is likely to turn up resources covering news, IT, biography, and much more. Check our listings for JSTOR, LexisNexis Academic, Mergent Online, and Safari Tech Books Online in our review of niche search engines for examples of the databases you're likely to need to visit a library to access.

Remember that while you may find large quantities of information through regular search engines, there's no guarantee that it's quality information. Anyone can put up a Web page. Library databases are created by well-known publishers and are evaluated carefully by librarians before they are purchased.

Keep An Eye on the Future of Internet Searching
An Internet search engine performs its task in a purely mechanical way. It searches for text in a document, takes into account how often words are repeated, and then applies rules for ranking. A good search engine has an easy-to-use interface, documents a huge number of pages, and updates links frequently.

What a search engine can't do is understand your query on a more human level. It can't make associations between words and topics the way we can, so it loses precision in retrieval. That's why we've gotten used to doing vocabulary acrobatics, then settling for a few useful links swimming in a sea of thousands or even millions of irrelevant hits.

What if you could ask a search engine to retrieve documents on a topic and it would be smart enough to find valid pages without all the noise? A new search technology called Latent Semantic Indexing (LSI) may be the answer.

LSI employs a mathematical algorithm to calculate word associations. The resulting technology is similar to the information retrieval you might get at your local library through a card catalog or OPAC (online public access catalog) but without the structured-data constraints. Layered on top of current search technologies, LSI promises a higher level of understanding: it can tie together disparate database indexes under a single semantic model.

Like conventional search engines, LSI looks at each document's content words as opposed to commonly used words like "and" and "the." Documents with many of the same words are semantically close, and documents with few of the same words are semantically distant. LSI then maps the documents' "location" in a multidimensional space, with one dimension for each index word, while grouping semantically close documents near one another. Mathematical transformations make this space accessible, and the result is a search that finds related pages even if they don't share the same keywords.
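The vector-space step underlying LSI can be sketched directly: strip common words, turn each document into a vector of content-word counts, and measure closeness by the angle between vectors. Real LSI then applies a singular-value decomposition to collapse related terms into shared dimensions; that reduction step is omitted here, and the documents and stopword list are invented.

```python
import math
from collections import Counter

# Vector-space sketch: documents become content-word count vectors, and
# semantic closeness is the cosine of the angle between them.
STOPWORDS = {"and", "the", "a", "of"}

def vectorize(text):
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm

doc_a = "the spider crawls the web and indexes pages"
doc_b = "a crawler indexes web pages"
doc_c = "the chef indexes recipes"

sim_ab = cosine(vectorize(doc_a), vectorize(doc_b))
sim_ac = cosine(vectorize(doc_a), vectorize(doc_c))
print(sim_ab > sim_ac)  # True: the two web documents sit closer together
```

Notice that "crawls" and "crawler" don't match at all in this raw-keyword space; bridging exactly that kind of vocabulary gap is what the SVD step in full LSI is for.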

If you want to read more and keep abreast of this new development, follow these links:

- http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm
- http://lsi.research.telcordia.com/
- www.psych.nmsu.edu/~pfoltz/cois/filtering-cois.html

No One Searches Alone
While we try not to encourage paranoia, everyone should be aware that searches, like any other Internet or computer activity, can be traced. A determined tracker can reconstruct your Web searches from a variety of sources. Some, such as your history cache, can be easily cleared (though defeating determined searches may require more than simply clearing your cache). Other traces, such as those in your company's Web server logs (if you're searching from work), your ISP's logs, and the logs at the search engine company itself, are much more difficult to erase.

Libraries are facing an updated version of the old problem of what to do about law enforcement inquiries regarding patrons' borrowing habits: for instance, when law enforcement requests information about a patron's computer-assisted research. In some cases, servers have been seized from libraries by law enforcement officials. This isn't a problem for walk-in users who aren't required to provide any kind of identification, but if you use a password to log in remotely, you may want to consider what kind of personal information you're leaving behind.



What language?
May 27, 2003



Translation tools are more hit-or-miss than tools for finding content in other languages. Google's "translate this page" feature gives you an idea of what a page is about, but the results can make for challenging reading. AltaVista's venerable Babel Fish Translation (http://world.altavista.com) will translate a block of text, a Web page, or e-mails. The translations seem to be word-by-word, so we wouldn't take them as gospel, but they can certainly help you if that piece of information you need is available only in another language.




How Good Is Your Information?
May 27, 2003



Some surprising tools will reveal a lot about the information you've found, which in turn may help you decide its value.

To find out who owns the site your search engine retrieved for you, turn to a WhoIs site such as www.whois.sc. Or perhaps use a better-known example, like www.whois.org. Revealing the source is an invaluable trick, especially with matters such as health or financial information. Also consider fee-for-service searchbots such as BrightPlanet's DQM2 Deep Query Manager, which helps you find, classify, and manage information.




Don't Be Stingy
May 27, 2003



Nearly one-third of all searches are single-word searches. While search engines have gotten smart enough to nail many one-word searches, they tend to put the most popular choices first. For obscure subjects this might be a reasonable strategy, but few searches are as obscure as you might suspect.

In our testing, one of the one-word searches we tried on every engine was Godiva, looking for the chocolates. But what if you were trying to find, say, the original Anglo-Saxon name of Lady Godiva? Typing Godiva into Google yields results mainly for chocolates, but taking the time to type in Lady Godiva brings up a page with the answer (Godgifu).


Depending on the word in question, you might get anywhere from hundreds to hundreds of thousands of answers (or more) from a single-word query. While adding more keywords won't necessarily cut the results down to a number you can easily scan through, it's likely to push more relevant results to the top.



Niche Search
May 27, 2003


Asiaco http://search.asiaco.com
A searchable index of Asia-related topics on the Internet.

AskERIC www.askeric.org
The Educational Resources Information Center (ERIC), a national information system funded by the U.S. Department of Education.

Ask Jeeves Kids www.ajkids.com
Every kid should have his own butler to help search the (carefully screened) Internet.


Biography Resource Center www.galegroup.com/BiographyRC
Search for people by personal facts such as birth and death year, nationality, ethnicity, occupation, or gender.

CiteSeer http://citeseer.nj.nec.com/cs
Has anyone cited that obscure paper you wrote on nanotechnology? Find out here. Search not only indexed scientific documents but also the citations they contain.

Cool4Kids www.cool4kids.com
This kids-only search engine draws on the Kids and Teens Open Directory Project. With 17,549 links and counting.

eBizSearch http://gunther.smeal.psu.edu/index.html
Search the Web as well as academic and commercial articles for various aspects of e-business.

eLibrary http://ask.elibrary.com
Searchable archive of books, articles, newspapers, transcripts, pictures, and maps. After the seven-day trial, you still get abstracts free, but full files will cost you.

Philosophy Research Base www.erraticimpact.com
There's a little bit of everything philosophical here, but the heart is a search engine that finds books on philosophical topics in partnership with Amazon.com.

FindArticles www.findarticles.com
A useful free search engine indexing published articles from more than 300 sources (PC Magazine apparently not among them).

GPO Access www.access.gpo.gov/su_docs/multidb.html
All about the government. Access to multiple databases for statistics, publications, history, and more.

HighWire http://highwire.stanford.edu
Search over 12 million fully indexed articles in over 4,500 Medline journals; the abstracts, at least, are often free. About 360 of the site's journals offer free content (often in back issues) as well.

Hoover's Online www.hoovers.com
This paid subscription service makes more than 6,000 business publications available through Factiva, a Dow Jones & Reuters company. You can search by keyword, company name, and symbol, among other options.

IncyWincy www.incywincy.com
The Incy Wincy Spider crawls through the Invisible Web (as found on the Open Directory Project).

JSTOR www.jstor.org
Academic researchers, salivate. This archival collection contains full-image, full-run academic journals on everything from Botany to Business. You'll need to log on through an academic institution.

LexisNexis Academic http://web.lexis-nexis.com/universe
A for-pay, full-text database of news, business, legal, and government information.

Mergent Online www.mergentonline.com
An integration of databases, such as Moody's Industrial Manual, EDGAR (Electronic Data Gathering, Analysis, and Retrieval) filings, and Company Data Direct (U.S. and international).

NatureServe Explorer www.natureserve.org/explorer
What is Mustela nigripes, and will it be around much longer? Check here for searchable plant, animal, and ecological-community information for the U.S. and Canada.

The On-Line Encyclopedia of Integer Sequences www.research.att.com/~njas/sequences
This is a weird one. Enter a series of numbers and this site will tell you the rationale behind it. We tried "8 5 4 9 1 7 6 3 2 0" (the digits in alphabetical order). It got the right answer.
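If you're curious how that particular sequence arises, here's a quick sketch (my own illustration, not anything from the site): sort the digits 0 through 9 by the alphabetical order of their English names.

```python
# Sort the digits 0-9 by the alphabetical order of their English names,
# reproducing the "8 5 4 9 1 7 6 3 2 0" sequence tried above.
NAMES = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

digits_alphabetical = sorted(range(10), key=lambda d: NAMES[d])
print(" ".join(str(d) for d in digits_alphabetical))
# prints: 8 5 4 9 1 7 6 3 2 0
```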

PublicLibraries.com www.publiclibraries.com
Find your public, state, university, presidential, or national library online.

PubMed www.ncbi.nlm.nih.gov/entrez/query.fcgi
This service of the National Library of Medicine provides access to over 12 million Medline citations, going back to the mid-1960s.

O'Reilly Network Safari Bookshelf http://safari.oreilly.com
Electronic versions of hundreds of technical books from the venerable IT publisher, covering more than 20 categories, from Business Reference to XML.

S&P Netadvantage www.netadvantage.com
A great place to seek out information on investment trends, markets, and companies in play. But you'll need a log-on just to access the site.

SearcheBooks.com www.searchebooks.com
A full-text index of e-books. Enter "For this relief much thanks" and you're instantly given a link citing Hamlet, Act I scene 1.

SearchEdu.com www.searchedu.com
Search the .edu domain.

Search Engine Colossus www.searchenginecolossus.com
Search for search engines worldwide in the language of your choice.

SearchGov.com www.searchgov.com
Search the .gov domain.

SearchMil.com www.searchmil.com
Search the .mil domain.

SpeechBot http://speechbot.research.compaq.com
An HP Invent search-bot site that indexes 15,590 hours of broadcasts over the Web, both audio and transcripts.

10K Wizard www.tenkwizard.com
Search the SEC's EDGAR for real-time SEC filings.

VolunteerMatch www.volunteermatch.org
"Get out. Do good." The logo says it all. You can search by area, interest, and schedule.

World News Connection http://wnc.fedworld.gov
A for-pay foreign-news aggregation service maintained by the U.S. Department of Commerce. Want the latest on Ugandan cattle rustling? This is the place.

Yahooligans! www.yahooligans.com
One of the best kids-oriented search sites and a swell portal for youngsters as well.



Toolbox

May 27, 2003
By Cade Metz

Sunday, May 11, 2003

hackers and painters

May 2003

(This essay is derived from a guest lecture at Harvard, which incorporated an earlier talk at Northeastern.)

When I finished grad school in computer science I went to art school to study painting. A lot of people seemed surprised that someone interested in computers would also be interested in painting. They seemed to think that hacking and painting were very different kinds of work-- that hacking was cold, precise, and methodical, and that painting was the frenzied expression of some primal urge.

Both of these images are wrong. Hacking and painting have a lot in common. In fact, of all the different types of people I've known, hackers and painters are among the most alike.

What hackers and painters have in common is that they're both makers. Along with composers, architects, and writers, what hackers and painters are trying to do is make good things. They're not doing research per se, though if in the course of trying to make good things they discover some new technique, so much the better.



I've never liked the term "computer science." The main reason I don't like it is that there's no such thing. Computer science is a grab bag of tenuously related areas thrown together by an accident of history, like Yugoslavia. At one end you have people who are really mathematicians, but call what they're doing computer science so they can get DARPA grants. In the middle you have people working on something like the natural history of computers-- studying the behavior of algorithms for routing data through networks, for example. And then at the other extreme you have the hackers, who are trying to write interesting software, and for whom computers are just a medium of expression, as concrete is for architects or paint for painters. It's as if mathematicians, physicists, and architects all had to be in the same department.

Sometimes what the hackers do is called "software engineering," but this term is just as misleading. Good software designers are no more engineers than architects are. The border between architecture and engineering is not sharply defined, but it's there. It falls between what and how: architects decide what to do, and engineers figure out how to do it.

What and how should not be kept too separate. You're asking for trouble if you try to decide what to do without understanding how to do it. But hacking can certainly be more than just deciding how to implement some spec. At its best, it's creating the spec-- though it turns out the best way to do that is to implement it.



Perhaps one day "computer science" will, like Yugoslavia, get broken up into its component parts. That might be a good thing. Especially if it meant independence for my native land, hacking.

Bundling all these different types of work together in one department may be convenient administratively, but it's confusing intellectually. That's the other reason I don't like the name "computer science." Arguably the people in the middle are doing something like an experimental science. But the people at either end, the hackers and the mathematicians, are not actually doing science.

The mathematicians don't seem bothered by this. They happily set to work proving theorems like the other mathematicians over in the math department, and probably soon stop noticing that the building they work in says ``computer science'' on the outside. But for the hackers this label is a problem. If what they're doing is called science, it makes them feel they ought to be acting scientific. So instead of doing what they really want to do, which is to design beautiful software, hackers in universities and research labs feel they ought to be writing research papers.

In the best case, the papers are just a formality. Hackers write cool software, and then write a paper about it, and the paper becomes a proxy for the achievement represented by the software. But often this mismatch causes problems. It's easy to drift away from building beautiful things toward building ugly things that make more suitable subjects for research papers.

Unfortunately, beautiful things don't always make the best subjects for papers. Number one, research must be original-- and as anyone who has written a PhD dissertation knows, the way to be sure that you're exploring virgin territory is to stake out a piece of ground that no one wants. Number two, research must be substantial-- and awkward systems yield meatier papers, because you can write about the obstacles you have to overcome in order to get things done. Nothing yields meaty problems like starting with the wrong assumptions. Most of AI is an example of this rule; if you assume that knowledge can be represented as a list of predicate logic expressions whose arguments represent abstract concepts, you'll have a lot of papers to write about how to make this work. As Ricky Ricardo used to say, "Lucy, you got a lot of explaining to do."

The way to create something beautiful is often to make subtle tweaks to something that already exists, or to combine existing ideas in a slightly new way. This kind of work is hard to convey in a research paper.



So why do universities and research labs continue to judge hackers by publications? For the same reason that "scholastic aptitude" gets measured by simple-minded standardized tests, or the productivity of programmers gets measured in lines of code. These tests are easy to apply, and there is nothing so tempting as an easy test that kind of works.

Measuring what hackers are actually trying to do, designing beautiful software, would be much more difficult. You need a good sense of design to judge good design. And there is no correlation, except possibly a negative one, between people's ability to recognize good design and their confidence that they can.

The only external test is time. Over time, beautiful things tend to thrive, and ugly things tend to get discarded. Unfortunately, the amounts of time involved can be longer than human lifetimes. Samuel Johnson said it took a hundred years for a writer's reputation to converge. You have to wait for the writer's influential friends to die, and then for all their followers to die.

I think hackers just have to resign themselves to having a large random component in their reputations. In this they are no different from other makers. In fact, they're lucky by comparison. The influence of fashion is not nearly so great in hacking as it is in painting.



There are worse things than having people misunderstand your work. A worse danger is that you will yourself misunderstand your work. Related fields are where you go looking for ideas. If you find yourself in the computer science department, there is a natural temptation to believe, for example, that hacking is the applied version of what theoretical computer science is the theory of. All the time I was in graduate school I had an uncomfortable feeling in the back of my mind that I ought to know more theory, and that it was very remiss of me to have forgotten all that stuff within three weeks of the final exam.

Now I realize I was mistaken. Hackers need to understand the theory of computation about as much as painters need to understand paint chemistry. You need to know how to calculate time and space complexity and about Turing completeness. You might also want to remember at least the concept of a state machine, in case you have to write a parser or a regular expression library. Painters in fact have to remember a good deal more about paint chemistry than that.
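For what it's worth, the state-machine idea really is that small. Here's a toy illustration (mine, not the essay's): a hand-rolled state machine that recognizes optionally signed integers like "-42", the kind of thing you'd build many of inside a parser or a regular expression library.

```python
# A minimal finite state machine: start -> sign -> digits.
# It accepts strings like "42" or "-42" and rejects everything else.
def is_signed_int(s):
    state = "start"
    for ch in s:
        if state == "start":
            if ch in "+-":
                state = "sign"          # saw a leading sign
            elif ch.isdigit():
                state = "digits"        # saw the first digit
            else:
                return False
        elif state == "sign":
            if ch.isdigit():
                state = "digits"
            else:
                return False            # sign must be followed by a digit
        elif state == "digits":
            if not ch.isdigit():
                return False
    return state == "digits"            # must end having seen digits

print(is_signed_int("-42"))   # True
print(is_signed_int("12a"))   # False
```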

I've found that the best sources of ideas are not the other fields that have the word "computer" in their names, but the other fields inhabited by makers. Painting has been a much richer source of ideas than the theory of computation.

For example, I was taught in college that one ought to figure out a program completely on paper before even going near a computer. I found that I did not program this way. I found that I liked to program sitting in front of a computer, not a piece of paper. Worse still, instead of patiently writing out a complete program and assuring myself it was correct, I tended to just spew out code that was hopelessly broken, and gradually beat it into shape. Debugging, I was taught, was a kind of final pass where you caught typos and oversights. The way I worked, it seemed like programming consisted of debugging.

For a long time I felt bad about this, just as I once felt bad that I didn't hold my pencil the way they taught me to in elementary school. If I had only looked over at the other makers, the painters or the architects, I would have realized that there was a name for what I was doing: sketching. As far as I can tell, the way they taught me to program in college was all wrong. You should figure out programs as you're writing them, just as writers and painters and architects do.

Realizing this has real implications for software design. It means that a programming language should, above all, be malleable. A programming language is for thinking of programs, not for expressing programs you've already thought of. It should be a pencil, not a pen. Static typing would be a fine idea if people actually did write programs the way they taught me to in college. But that's not how any of the hackers I know write programs. We need a language that lets us scribble and smudge and smear, not a language where you have to sit with a teacup of types balanced on your knee and make polite conversation with a strict old aunt of a compiler.



While we're on the subject of static typing, identifying with the makers will save us from another problem that afflicts the sciences: math envy. Everyone in the sciences secretly believes that mathematicians are smarter than they are. I think mathematicians also believe this. At any rate, the result is that scientists tend to make their work look as mathematical as possible. In a field like physics this probably doesn't do much harm, but the further you get from the natural sciences, the more of a problem it becomes.

A page of formulas just looks so impressive. (Tip: for extra impressiveness, use Greek variables.) And so there is a great temptation to work on problems you can treat formally, rather than problems that are, say, important.

If hackers identified with other makers, like writers and painters, they wouldn't feel tempted to do this. Writers and painters don't suffer from math envy. They feel as if they're doing something completely unrelated. So are hackers, I think.



If universities and research labs keep hackers from doing the kind of work they want to do, perhaps the place for them is in companies. Unfortunately, most companies won't let hackers do what they want either. Universities and research labs force hackers to be scientists, and companies force them to be engineers.

I only discovered this myself quite recently. When Yahoo bought Viaweb, they asked me what I wanted to do. I had never liked the business side very much, and said that I just wanted to hack. When I got to Yahoo, I found that what hacking meant to them was implementing software, not designing it. Programmers were seen as technicians who translated the visions (if that is the word) of product managers into code.

This seems to be the default plan in big companies. They do it because it decreases the standard deviation of the outcome. Only a small percentage of hackers can actually design software, and it's hard for the people running a company to pick these out. So instead of entrusting the future of the software to one brilliant hacker, most companies set things up so that it is designed by committee, and the hackers merely implement the design.

If you want to make money at some point, remember this, because this is one of the reasons startups win. Big companies want to decrease the standard deviation of design outcomes because they want to avoid disasters. But when you damp oscillations, you lose the high points as well as the low. This is not a problem for big companies, because they don't win by making great products. Big companies win by sucking less than other big companies.

So if you can figure out a way to get in a design war with a company big enough that its software is designed by product managers, they'll never be able to keep up with you. These opportunities are not easy to find, though. It's hard to engage a big company in a design war, just as it's hard to engage an opponent inside a castle in hand to hand combat. It would be pretty easy to write a better word processor than Microsoft Word, for example, but Microsoft, within the castle of their operating system monopoly, probably wouldn't even notice if you did.

The place to fight design wars is in new markets, where no one has yet managed to establish any fortifications. That's where you can win big by taking the bold approach to design, and having the same people both design and implement the product. Microsoft themselves did this at the start. So did Apple. And Hewlett-Packard. I suspect almost every successful startup has.



So one way to build great software is to start your own startup. There are two problems with this, though. One is that in a startup you have to do so much besides write software. At Viaweb I considered myself lucky if I got to hack a quarter of the time. And the things I had to do the other three quarters of the time ranged from tedious to terrifying. I have a benchmark for this, because I once had to leave a board meeting to have some cavities filled. I remember sitting back in the dentist's chair, waiting for the drill, and feeling like I was on vacation.

The other problem with startups is that there is not much overlap between the kind of software that makes money and the kind that's interesting to write. Programming languages are interesting to write, and Microsoft's first product was one, in fact, but no one will pay for programming languages now. If you want to make money, you tend to be forced to work on problems that are too nasty for anyone to solve for free.

All makers face this problem. Prices are determined by supply and demand, and there is just not as much demand for things that are fun to work on as there is for things that solve the mundane problems of individual customers. Acting in off-Broadway plays just doesn't pay as well as wearing a gorilla suit in someone's booth at a trade show. Writing novels doesn't pay as well as writing ad copy for garbage disposals. And hacking programming languages doesn't pay as well as figuring out how to connect some company's legacy database to their Web server.



I think the answer to this problem, in the case of software, is a concept known to nearly all makers: the day job. This phrase began with musicians, who perform at night. More generally, it means that you have one kind of work you do for money, and another for love.

Nearly all makers have day jobs early in their careers. Painters and writers notoriously do. If you're lucky you can get a day job that's closely related to your real work. Musicians often seem to work in record stores. A hacker working on some programming language or operating system might likewise be able to get a day job using it. [1]

When I say that the answer is for hackers to have day jobs, and work on beautiful software on the side, I'm not proposing this as a new idea. This is what open-source hacking is all about. What I'm saying is that open-source is probably the right model, because it has been independently confirmed by all the other makers.

It seems surprising to me that any employer would be reluctant to let hackers work on open-source projects. At Viaweb, we would have been reluctant to hire anyone who didn't. When we interviewed programmers, the main thing we cared about was what kind of software they wrote in their spare time. You can't do anything really well unless you love it, and if you love to hack you'll inevitably be working on projects of your own. [2]



Because hackers are makers rather than scientists, the right place to look for metaphors is not in the sciences, but among other kinds of makers. What else can painting teach us about hacking?

One thing we can learn, or at least confirm, from the example of painting is how to learn to hack. You learn to paint mostly by doing it. Ditto for hacking. Most hackers don't learn to hack by taking college courses in programming. They learn to hack by writing programs of their own at age thirteen. Even in college classes, you learn to hack mostly by hacking. [3]

Because painters leave a trail of work behind them, you can watch them learn by doing. If you look at the work of a painter in chronological order, you'll find that each painting builds on things that have been learned in previous ones. When there's something in a painting that works very well, you can usually find version 1 of it in a smaller form in some earlier painting.

I think most makers work this way. Writers and architects seem to as well. Maybe it would be good for hackers to act more like painters, and regularly start over from scratch, instead of continuing to work for years on one project, and trying to incorporate all their later ideas as revisions.

The fact that hackers learn to hack by doing it is another sign of how different hacking is from the sciences. Scientists don't learn science by doing it, but by doing labs and problem sets. Scientists start out doing work that's perfect, in the sense that they're just trying to reproduce work someone else has already done for them. Eventually, they get to the point where they can do original work. Whereas hackers, from the start, are doing original work; it's just very bad. So hackers start original, and get good, and scientists start good, and get original.



The other way makers learn is from examples. For a painter, a museum is a reference library of techniques. For hundreds of years it has been part of the traditional education of painters to copy the works of the great masters, because copying forces you to look closely at the way a painting is made.

Writers do this too. Benjamin Franklin learned to write by summarizing the points in the essays of Addison and Steele and then trying to reproduce them. Raymond Chandler did the same thing with detective stories.

Hackers, likewise, can learn to program by looking at good programs-- not just at what they do, but the source code too. One of the less publicized benefits of the open-source movement is that it has made it easier to learn to program. When I learned to program, we had to rely mostly on examples in books. The one big chunk of code available then was Unix, but even this was not open source. Most of the people who read the source read it in illicit photocopies of John Lions' book, which though written in 1977 was not allowed to be published until 1996.



Another example we can take from painting is the way that paintings are created by gradual refinement. Paintings usually begin with a sketch. Gradually the details get filled in. But it is not merely a process of filling in. Sometimes the original plans turn out to be mistaken. Countless paintings, when you look at them in X-rays, turn out to have limbs that have been moved or facial features that have been readjusted.

Here's a case where we can learn from painting. I think hacking should work this way too. It's unrealistic to expect that the specifications for a program will be perfect. You're better off if you admit this up front, and write programs in a way that allows specifications to change on the fly.

(The structure of large companies makes this hard for them to do, so here is another place where startups have an advantage.)

Everyone by now presumably knows about the danger of premature optimization. I think we should be just as worried about premature design-- deciding too early what a program should do.

The right tools can help us avoid this danger. A good programming language should, like oil paint, make it easy to change your mind. Dynamic typing is a win here because you don't have to commit to specific data representations up front. But the key to flexibility, I think, is to make the language very abstract. The easiest program to change is one that's very short.
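To make the point concrete, here's a small sketch of my own (not an example from the essay): in a dynamically typed language you can swap out a data representation without first rewriting type declarations everywhere it's used.

```python
# describe() works for any object with x and y attributes --
# no declared types, so the representation is free to change.
def describe(point):
    return f"({point.x}, {point.y})"

class Point:                        # first sketch: a plain class
    def __init__(self, x, y):
        self.x, self.y = x, y

from types import SimpleNamespace   # later: a different representation
p1 = Point(1, 2)
p2 = SimpleNamespace(x=3, y=4)      # describe() needs no changes at all

print(describe(p1), describe(p2))   # (1, 2) (3, 4)
```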



This sounds like a paradox, but a great painting has to be better than it has to be. For example, when Leonardo painted the portrait of Ginevra de Benci in the National Gallery, he put a juniper bush behind her head. In it he carefully painted each individual leaf. Many painters might have thought, this is just something to put in the background to frame her head. No one will look that closely at it.

Not Leonardo. How hard he worked on part of a painting didn't depend at all on how closely he expected anyone to look at it. He was like Michael Jordan. Relentless.

Relentlessness wins because, in the aggregate, unseen details become visible. When people walk by the portrait of Ginevra de Benci, their attention is often immediately arrested by it, even before they look at the label and notice that it says Leonardo da Vinci. All those unseen details combine to produce something that's just stunning, like a thousand barely audible voices all singing in tune.

Great software, likewise, requires a fanatical devotion to beauty. If you look inside good software, you find that parts no one is ever supposed to see are beautiful too. I'm not claiming I write great software, but I know that when it comes to code I behave in a way that would make me eligible for prescription drugs if I approached everyday life the same way. It drives me crazy to see code that's badly indented, or that uses ugly variable names.



If a hacker were a mere implementor, turning a spec into code, then he could just work his way through it from one end to the other like someone digging a ditch. But if the hacker is a creator, we have to take inspiration into account.

In hacking, like painting, work comes in cycles. Sometimes you get excited about some new project and you want to work sixteen hours a day on it. Other times nothing seems interesting.

To do good work you have to take these cycles into account, because they're affected by how you react to them. When you're driving a car with a manual transmission on a hill, you have to back off the clutch sometimes to avoid stalling. Backing off can likewise prevent ambition from stalling. In both painting and hacking there are some tasks that are terrifyingly ambitious, and others that are comfortingly routine. It's a good idea to save some easy tasks for moments when you would otherwise stall.

In hacking, this can literally mean saving up bugs. I like debugging: it's the one time that hacking is as straightforward as people think it is. You have a totally constrained problem, and all you have to do is solve it. Your program is supposed to do x. Instead it does y. Where does it go wrong? You know you're going to win in the end. It's as relaxing as painting a wall.



The example of painting can teach us not only how to manage our own work, but how to work together. A lot of the great art of the past is the work of multiple hands, though there may only be one name on the wall next to it in the museum. Leonardo was an apprentice in the workshop of Verrocchio and painted one of the angels in his Baptism of Christ. This sort of thing was the rule, not the exception. Michelangelo was considered especially dedicated for insisting on painting all the figures on the ceiling of the Sistine Chapel himself.

As far as I know, when painters worked together on a painting, they never worked on the same parts. It was common for the master to paint the principal figures and for assistants to paint the others and the background. But you never had one guy painting over the work of another.

I think this is the right model for collaboration in software too. Don't push it too far. When a piece of code is being hacked by three or four different people, no one of whom really owns it, it will end up being like a common-room. It will tend to feel bleak and abandoned, and accumulate cruft. The right way to collaborate, I think, is to divide projects into sharply defined modules, each with a definite owner, and with interfaces between them that are as carefully designed and, if possible, as articulated as programming languages.



Like painting, most software is intended for a human audience. And so hackers, like painters, must have empathy to do really great work. You have to be able to see things from the user's point of view.

When I was a kid I was always being told to look at things from someone else's point of view. What this always meant in practice was to do what someone else wanted, instead of what I wanted. This of course gave empathy a bad name, and I made a point of not cultivating it.

Boy, was I wrong. It turns out that looking at things from other people's point of view is practically the secret of success. It doesn't necessarily mean being self-sacrificing. Far from it. Understanding how someone else sees things doesn't imply that you'll act in his interest; in some situations-- in war, for example-- you want to do exactly the opposite. [4]

Most makers make things for a human audience. And to engage an audience you have to understand what they need. Nearly all the greatest paintings are paintings of people, for example, because people are what people are interested in.

Empathy is probably the single most important difference between a good hacker and a great one. Some hackers are quite smart, but when it comes to empathy are practically solipsists. It's hard for such people to design great software [5], because they can't see things from the user's point of view.

One way to tell how good people are at empathy is to watch them explain a technical question to someone without a technical background. We probably all know people who, though otherwise smart, are just comically bad at this. If someone asks them at a dinner party what a programming language is, they'll say something like ``Oh, a high-level language is what the compiler uses as input to generate object code.'' High-level language? Compiler? Object code? Someone who doesn't know what a programming language is obviously doesn't know what these things are, either.

Part of what software has to do is explain itself. So to write good software you have to understand how little users understand. They're going to walk up to the software with no preparation, and it had better do what they guess it will, because they're not going to read the manual. The best system I've ever seen in this respect was the original Macintosh, in 1985. It did what software almost never does: it just worked. [6]

Source code, too, should explain itself. If I could get people to remember just one quote about programming, it would be the one at the beginning of Structure and Interpretation of Computer Programs.
"Programs should be written for people to read, and only incidentally for machines to execute."
You need to have empathy not just for your users, but for your readers. It's in your interest, because you'll be one of them. Many a hacker has written a program only to find on returning to it six months later that he has no idea how it works. I know several people who've sworn off Perl after such experiences. [7]

Lack of empathy is associated with intelligence, to the point that there is even something of a fashion for it in some places. But I don't think there's any correlation. You can do well in math and the natural sciences without having to learn empathy, and people in these fields tend to be smart, so the two qualities have come to be associated. But there are plenty of dumb people who are bad at empathy too. Just listen to the people who call in with questions on talk shows. They ask whatever it is they're asking in such a roundabout way that the hosts often have to rephrase the question for them.



So, if hacking works like painting and writing, is it as cool? After all, you only get one life. You might as well spend it working on something great.

Unfortunately, the question is hard to answer. There is always a big time lag in prestige. It's like light from a distant star. Painting has prestige now because of great work people did five hundred years ago. At the time, no one thought these paintings were as important as we do today. It would have seemed very odd to people at the time that Federico da Montefeltro, the Duke of Urbino, would one day be known mostly as the guy with the strange nose in a painting by Piero della Francesca.

So while I admit that hacking doesn't seem as cool as painting now, we should remember that painting itself didn't seem as cool in its glory days as it does now.

What we can say with some confidence is that these are the glory days of hacking. In most fields the great work is done early on. The paintings made between 1430 and 1500 are still unsurpassed. Shakespeare appeared just as professional theater was being born, and pushed the medium so far that every playwright since has had to live in his shadow. Albrecht Dürer did the same thing with engraving, and Jane Austen with the novel.

Over and over we see the same pattern. A new medium appears, and people are so excited about it that they explore most of its possibilities in the first couple generations. Hacking seems to be in this phase now.

Painting was not, in Leonardo's time, as cool as his work helped make it. How cool hacking turns out to be will depend on what we can do with this new medium. In some ways, the time lag of coolness is an advantage. When you meet someone now who is writing a compiler or hacking a Unix kernel, at least you know they're not just doing it to pick up chicks.




Notes

[1] The greatest damage that photography has done to painting may be the fact that it killed the best day job. Most of the great painters in history supported themselves by painting portraits.

[2] I've been told that Microsoft discourages employees from contributing to open-source projects, even in their spare time. But so many of the best hackers work on open-source projects now that the main effect of this policy may be to ensure that they won't be able to hire any first-rate programmers.

[3] What you learn about programming in college is much like what you learn about books or clothes or dating: what bad taste you had in high school.

[4] Here's an example of applied empathy. At Viaweb, if we couldn't decide between two alternatives, we'd ask, what would our competitors hate most? At one point a competitor added a feature to their software that was basically useless, but since it was one of few they had that we didn't, they made much of it in the trade press. We could have tried to explain that the feature was useless, but we decided it would annoy our competitor more if we just implemented it ourselves, so we hacked together our own version that afternoon.

[5] Except text editors and compilers. Hackers don't need empathy to design these, because they are themselves typical users.

[6] Well, almost. They overshot the available RAM somewhat, causing much inconvenient disk swapping, but this could be fixed within a few months by buying an additional disk drive.

[7] The way to make programs easy to read is not to stuff them with comments. I would take Abelson and Sussman's quote a step further. Programming languages should be designed to express algorithms, and only incidentally to tell computers how to execute them. A good programming language ought to be better for explaining software than English. You should only need comments when there is some kind of kludge you need to warn readers about, just as on a road there are only arrows on parts with unexpectedly sharp curves.

Thanks to Trevor Blackwell, Robert Morris, Dan Giffin, and Lisa Randall for reading drafts of this, and to Henry Leitner and Larry Finkelstein for inviting me to speak.


http://www.nynewsday.com/news/nyc-nasa0511,0,1770094.story?coll=nyc-topnews-short-navigation

From Orlando Sentinel


COLUMBIA INVESTIGATION

NASA Paying Civilian Members of Board Probing Shuttle
By Kevin Spear, Jim Leusner and Gwyneth K. Shaw
Sentinel Staff Writers

May 11, 2003

Civilian members of the board investigating the shuttle Columbia disaster -- outsiders who were added to reassure Congress and the public that the board would be fully independent of the space agency -- are actually being paid executive-level salaries by NASA.

The agency quietly put the five civilians on the National Aeronautics and Space Administration payroll, at pay rates of $134,000 a year, in order to take advantage of provisions that allow boards composed exclusively of "federal employees" to conduct their business in secret.

If the civilians had not been hired by NASA, a federal law would have required the investigating board to meet publicly, justify any closed-door sessions and keep transcripts and minutes that would ultimately become public records.

Each of the 13 board members is now classified as a federal employee. Besides the five civilians and chairman, other members include four active-duty military officers, two federal transportation officials and a NASA executive. And as a result, the board says it is legally permitted to meet in secret and promise "confidentiality" to NASA employees and others among the more than 200 individuals it has interviewed.

Last Tuesday, board Chairman Harold Gehman Jr. said that transcripts of these interviews will be kept secret from the general public, and even from Congress. Said Gehman, a retired Navy admiral who is being paid at the rate of $142,500 per year, "Those are never going to see the light of day."

Gehman, in a prepared statement Friday night, said the board's motive was not to withhold information from the public. He added: "The board determined it could provide a much deeper and richer review of NASA policies and procedures if it employed standard safety investigation procedures, which are incompatible with [open-government] provisions."

The statement did not respond to a question about his pay.

Gehman's insistence on confidentiality has rankled members of Congress, who say the board's report -- now expected in July -- must be accompanied by the documents that drove the conclusions. And public-policy critics say the salaries call into question whether the board is truly independent from the agency it is investigating.

"Three words -- conflict of interest," said Steven Aftergood, who heads the Project on Government Secrecy at the Federation of American Scientists. "The upshot is, we don't have an independent investigating board. This means NASA is investigating itself. This defeats the whole purpose of having an independent inquiry.

"What they did was hire outsiders and convert them into an internal board. It's just baffling."

Each of Gehman's five civilian board members insists that accepting money from NASA has in no way compromised the investigation. And indeed, board members and Gehman have been publicly critical of the space agency and its management "culture," questioning whether it has paid adequate attention to maintenance of the aging shuttle fleet and tolerated potentially unsafe conditions.

But one of those five, former astronaut Sally Ride, acknowledges that the public may see the board differently.

"I don't see it an issue for the Board members to be on the federal payroll -- this board, unlike most pro-bono government committees, is essentially a full-time job (for which people should receive some compensation)," Ride wrote in an e-mail to the Orlando Sentinel last week. "But one might ask whether it should be NASA's payroll."

But Ride added that President Bush did not step forward to appoint a special commission, as did President Reagan when the shuttle Challenger disintegrated in 1986.

"Since the White House hasn't picked up the mantle on this investigation, but rather has left it to NASA, I don't see an alternative payroll source -- or alternative source for funding the investigation itself," Ride wrote.

Still, the combination of the board operating in secret, with members being paid as much as $2,500 a week by NASA, may heighten a controversy that began more than three months ago when NASA Administrator Sean O'Keefe announced he was establishing the board.

"When you're investigating a tragedy of this magnitude, the only way to restore credibility is to be open about the investigation," said Jane Kirtley, a University of Minnesota media ethics and law professor, former executive director for the Reporters Committee for Freedom of the Press and open-government advocate.

Grumbling starts early

The Columbia Accident Investigation Board, as it is now known, was controversial almost from the moment of its creation: 10:30 a.m. EST on Feb. 1, 90 minutes after the shuttle Columbia disintegrated 200,000 feet above Texas, killing its crew of seven.

That's when O'Keefe activated the "Space Shuttle Mishap Interagency Investigation Board," part of a contingency plan created by NASA in 1995. The seven-member board was to consist of four top military aviation and safety officials, two civilians from the Federal Aviation Administration and Department of Transportation, and a NASA employee. O'Keefe announced on Feb. 2 the addition of Gehman as its chairman.

The 1995 plan spelled out how the board would operate. Related NASA procedures include granting a "privilege" of confidentiality to witnesses, meeting in private and seeking the cause of an accident -- not whom to blame. It was modeled after accident safety investigation procedures used by the military.

"If you really want to get to the bottom of an accident and its causes, you want a witness to feel like what they say is not going to be in a headline or a court of law," said Bryan O'Connor, a former astronaut who is now head of NASA's Office of Safety and Mission Assurance. "You want them to know it will only be used to prevent future mishaps, not to fire them."

O'Keefe's announcement came under immediate criticism, not because of the board's secrecy provisions but because of concerns it was too closely tied to NASA.

"The fact of the matter is, so far the commission that's there now was appointed by NASA, is staffed by NASA and reports back to NASA, and I'm afraid that's just not going to be credible," said Rep. Bart Gordon of Tennessee, the ranking Democrat on the House Science Committee's Space subcommittee, in comments echoed by members of both parties.

O'Keefe acted quickly, changing the board's charter so that it no longer reported directly to him. Then, Gehman began adding other civilian board members: first, Roger Tetrault, a retired military and energy services contractor executive, and Sheila Widnall, a former Air Force secretary and current Massachusetts Institute of Technology professor, by mid-February; and finally Douglas Osheroff, a Stanford University Nobel laureate; John Logsdon, director of the Space Policy Institute at George Washington University; and former astronaut Ride, now a professor on leave from the University of California at San Diego, on March 5.

NASA "has taken the necessary steps," O'Keefe told a congressional committee on Feb. 27, "to ensure the board's complete independence."

Public wasn't informed

But there was something that O'Keefe didn't tell Congress: what was being done to ensure the board would be able to conduct the military-style investigation envisioned in the NASA policy. Gehman was initially the board's only civilian appointee. He was put on the government's payroll Feb. 2 in the Office of Personnel Management, which calls itself "Federal Government's Human Resources Agency."

And Gehman, who chaired a military investigation into the 2000 terrorist bombing of the destroyer USS Cole, wanted the Columbia board to operate the same way. So every subsequent civilian appointee was put on NASA's payroll, again without any public mention.

The reason was an obscure law called the Federal Advisory Committee Act, which requires appointed boards and commissions to publicly advertise their meetings, whether open or closed; keep minutes of all sessions; and generally make their records available to the public.

But boards are exempt from that act if they are made up of full-time federal employees. And from the early days of the board, Gehman has acknowledged, he didn't want to operate under the act's rules.

That meant making sure every board member was a "federal employee." So NASA, dipping into a special $50 million congressional appropriation to fund the investigation into the Columbia tragedy, gave each of the new members a "NASA excepted service appointment" for up to one year, at a salary rate of $134,000, or about $500 per day worked.

Tetrault, who retired in 2000 as chairman of McDermott International Inc., said he initially offered to work for free -- but was told he had to be paid.

"As I recall, we had to be designated as a Safety Investigation to preserve witness privilege, which we thought was essential to getting individuals to open up to us," he wrote in an e-mail response to the Sentinel last week.

Eric Glitzenstein, a Washington, D.C., public-interest lawyer who has won several suits involving challenges to the advisory committee act, said the action "seems to be an obvious effort to subvert the [FACA] statute.

"You don't point to a board of full-time federal employees and say you're having an independent review, " he said. "What's problematic is not that they are receiving compensation. It's that they used that to get around the statute -- and public accountability."

NASA, however, insists that the decision to make board members government employees was merely to enable the board to carry out the 1995 NASA contingency plan.

In a statement issued by NASA chief spokesman Glenn Mahone on Friday, the agency declared: "Ignoring a fully functional federal employment structure seemed neither a timely or desirable solution -- especially in the face of a validated contingency plan for just such an emergent situation."

Differences over pay

It's not unusual for some government boards to pay their members. The National Commission on Terrorist Attacks Upon the United States, also known as the 9-11 Commission, offers its 10 members optional pay at a rate of $134,000 a year, plus expenses. A spokesman said members are not full-time employees and some decline the pay.

The legislation that set up that panel exempts it from FACA and allows private hearings, but it also requires it to hold public hearings and question witnesses under oath. The reason for the exemption: It's expected that the commission will interview CIA and FBI officials about national-security issues.

But many see the Gehman board as more comparable to the presidential commission ordered by then-President Reagan in 1986, when the shuttle Challenger broke up 73 seconds after liftoff.

The first sentence of Executive Order 12546 stated the Challenger panel was subject to the Federal Advisory Committee Act. It said the 11 civilian members "shall serve without compensation for their work on the Commission," other than travel expenses.

Eugene Covert, who headed MIT's aeronautics and astronautics department when he served on the commission, said he thinks taking a salary "would tend to bias what [board members] do."

"I just think in general, pro bono work should be pro bono work."

Unlike the Gehman board, the Challenger commission operated under a presidential charter, with high-profile chairman William Rogers who'd been secretary of state and attorney general. It held a series of public hearings at which NASA executives were questioned under oath about their decisions to allow the shuttle to fly despite known failures in the O-rings in the solid rocket boosters.

In the end, it won praise for exposing flaws in the solid rocket boosters -- and inadequate safety practices by NASA's management.

Challenger commission member Robert Hotz, the retired editor of Aviation Week, said the commission did most of its work in public -- with sworn testimony -- forcing witnesses to either be uncomfortably honest or commit perjury.

"And they chose to be uncomfortable," Hotz said. "Secret testimony is bull---- in an accident investigation. Space is not an in-house thing. It's a public thing, a non-military thing."

In fact, most of the Challenger Commission files -- a total of 108 feet -- are available to the public at the National Archives. Only the equivalent of four small boxes are exempt under national-security and privacy grounds, said Steve Tilley, chief of special access at the National Archives and Records Administration.

Aftergood, of the Federation of American Scientists, said he could not imagine one of NASA's toughest critics on the Challenger commission, the late Nobel physicist Richard Feynman, being asked to become an agency employee before joining the board.

"It's as if all of the members of the 9-11 Commission were hired as employees of the CIA," Aftergood said. "Their credibility would be totally shot if that happened. It would be outrageous and laughable."

Panel wins praise

Members of the Columbia board reject any implication that taking money from NASA has compromised them in any way.

"I could care less whether I am a government employee," Widnall wrote in an e-mail to the Sentinel last week. "I think it is just a matter of convenience. I do not compromise my independence in any way. I'm way beyond that -- anyone who knows me would agree."

Added Tetrault: "It is certainly my belief that we have acted independently from NASA and on occasion have been very critical of some of their practices."

In fact, no one has accused the board of pulling its punches, and members of Congress say they have been impressed with the panel's work and openness. On the other hand, the board has done very little work in public. It has had 13 media briefings, but only nine public hearings, all of which featured presentations by experts who are asked to "affirm" -- but not swear to -- their truthfulness.

The board has yet to publicly interview any senior NASA shuttle managers who were directly involved in making key decisions during the Columbia mission. Most of the time, the board and its staff work out of a Houston office building, or travel in small groups to various NASA facilities.

Board members, and investigators working for them, have interviewed more than 200 people. And led by Gehman, who has spoken out on the subject several times, they say confidentiality during those interviews is essential to candor.

"Certainly you have to appreciate the fact that people working for NASA will not feel comfortable making critical statements about their employer unless these are privileged," Osheroff said in a Sentinel interview last week. "This allows us as a board to gain much more insight into how NASA functions."

And Ride, who served on the Challenger commission, said it's not the money nor the secrecy that troubles her -- but what the public might think about the appearance of working for NASA.

"As you've already figured out, this is a pretty independent-minded and stubborn group of people on this Board, so the investigation won't be compromised," she wrote.

"But a vigorous response to the report would be better assured if the report were 'owned' by Congress or the White House."

Gehman has said the board's report will likely be released sometime in July, so Congress can read it during its August recess. That could produce a lively set of hearings this fall over how the board came to its conclusions.

Rep. Gordon, a persistent NASA critic, thought he had extracted a promise from Gehman earlier this month to provide Congress with transcripts of all interviews conducted by the board, minus only the names of the witnesses.

Last week, Gehman reversed himself, according to Gordon and several other committee sources. Congressional officials say they are still hopeful of getting the information they are seeking -- in some form.

"Clearly, our committee, to do our job properly, has got to have this information," Gordon said. "Otherwise, all we are going to do is have their final results without having the internal data to see where it's a logical result. Certainly there are appropriate confidentialities that ought to be protected, but that can be done consistent with providing this information."

He and others cite as a model the National Transportation Safety Board, which investigates major airplane, train and bus accidents. The agency holds all of its hearings in public; takes testimony under oath; and releases transcripts or summaries of interviews, though omitting the names of eyewitnesses.

"The really accurate model is the National Transportation Safety Board," Gordon said. "Why should we not have access to this kind of information? The 'everybody does it' argument is simply not accurate and not valid."

Gehman, however, is adamant about confidentiality. "There is a long, rich history between the executive branch and the legislative branch about accident investigations," he said in a brief interview in Houston after the board's media briefing last Tuesday, adding that the U.S. Supreme Court has upheld grants of confidentiality by the military in accident safety investigations.

"We are a member of the executive branch, and we will do whatever the protocol requires for that. But my offer [to Congress] does not include looking at privileged witness statements."

U.S. Rep. Dana Rohrabacher, a California Republican who chairs the House Science Committee's Space subcommittee, said he is confident both sides can reach agreement.

"There's no need for there to be a turf battle here between the executive and legislative branches," Rohrabacher said. "We respect Adm. Gehman; he's got our confidence, and any type of a fight over executive privilege to keep certain information from us will do nothing but destroy his and the whole commission's credibility."

Robyn Suriano contributed to this story. Kevin Spear may be reached at 407-420-5062 or kspear@orlandosentinel.com. Jim Leusner can be reached at 407-420-5411 or jleusner@orlandosentinel.com. Gwyneth K. Shaw can be reached at 202-824-8229 or gshaw@orlandosentinel.com.




Copyright © 2003, Orlando Sentinel