Category Archives: Search

Breaking: Wolfram Alpha now useful

If you didn’t think Wolfram Alpha was useful before, here’s your proof that it can answer everyday queries that everyone cares about:

Snicker.

Have you found any other amusing queries?

Real-time Twitter search: cool, but no Google threat

In a recent blog post, industry veteran Dave Winer opines that Google is falling behind by not paying attention to the search opportunities opened up by Twitter and other microblogging services.  His point boils down to:

Once Twitter is delivering the news search that Google can’t, it will be way too late. This is probably what the Google management doesn’t understand because they aren’t using Twitter themselves.

C’mon, Dave.  Someone’s been drinking the Twitter Kool-Aid for too long now.  Twitter is not the answer to everything.  Let’s get off the more-plugged-in-than-thou soapbox and look at real value propositions.

First of all, “news search” is not something Google is focused on because that’s not how they make their money.  If someone else were to come along and make a much better product than Google News (and one might argue several companies already do), Google’s management and shareholders wouldn’t notice a thing. 

Second, I’m doubtful (as is Valleywag) that Twitter could ever provide a usable news service.  When major events happen, there may be witnesses who are Twittering what they see, but how can Twitter parse the useful, factual tweets out of the millions of related tweets? It’s an impossible task.  The best Twitter can do is say “a plane crashed”, but they’d be hard pressed to say which tweets are authoritative.  (Google Trends tracks memes like this already.)  

To complicate Twitter’s job, spammers are already exploiting Twitter hashtags, so as soon as that #planecrash meme gets momentum and people start watching live Twitter search results for details, I guarantee a good number of those posts will look like “#planecrash Buy viagra here! http://supersmallurl.com/blah.”

Lastly, news searches probably account for less than 5% of searches across the web, and “real-time” news search surely represents far less than 1%.  Google can afford to ignore this segment because they rock at the other 95+% of searches, typically for more mundane topics like “cheap digital camera” or “paris hilton nude pics.”  And those digital camera searches will monetize much more effectively than news about the latest plane crash.

Twitter is a great trend- or meme-tracking tool, but it will never be a real news source, and even if it somehow becomes one, Google won’t care, nor should they.

Google Squared not a Wolfram killer, will kill PriceGrabber instead

Despite the over-hyped headlines over at TechCrunch, Wolfram Alpha and Google Squared will not compete in the same space.  Wolfram is for scientists and researchers, and Google Squared will be suited for real end-users.  Squared is built to compare large sets of “things” (dog breeds, roller coasters, digital cameras) that have specific, machine-parseable metadata, while Wolfram has a human-curated database that is built to give deep, specific data about one thing.

Here are a few queries I think Wolfram will answer, but Google Squared wouldn’t:

  • France’s GDP (graphed over time)
  • Bank of America stock price (graphed over time)
  • Where is the International Space Station right now?

All of these queries can be answered by Wolfram because it has a rich set of data around each of these objects.  However, Google Squared seems to be tailored to help you compare objects, making it suitable for queries like:

  • Finding digital cameras with X megapixels or Y focal length
  • Finding a USB hub with more than 5 ports
  • Finding a dog breed less than 15 pounds suitable for people with allergies
  • Finding a 50″-56″ LCD with at least 3 HDMI outs, a contrast ratio of > 1000:1, and a three-year warranty

The two products serve two markets, and should not be compared.  In essence, Wolfram will give you information about a thing, Google Squared will help you find that thing.
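To make Squared’s half of that concrete, here’s a minimal sketch (in Python, with invented camera records and field names) of the kind of attribute filtering Squared would perform once it has extracted structured metadata from the web.  The hard part, of course, is the automated extraction; the filtering itself is trivial:

```python
# Hypothetical, hand-entered records standing in for the metadata that Google
# Squared would extract automatically from pages across the web.
cameras = [
    {"name": "Camera A", "megapixels": 12, "focal_length_mm": 35, "price": 299},
    {"name": "Camera B", "megapixels": 8,  "focal_length_mm": 50, "price": 199},
    {"name": "Camera C", "megapixels": 14, "focal_length_mm": 35, "price": 449},
]

def find(items, **constraints):
    """Return items whose attributes satisfy every (attribute, predicate) pair."""
    return [item for item in items
            if all(pred(item.get(attr)) for attr, pred in constraints.items())]

# "Digital cameras with at least 10 megapixels and a 35mm focal length"
for camera in find(cameras,
                   megapixels=lambda mp: mp is not None and mp >= 10,
                   focal_length_mm=lambda fl: fl == 35):
    print(camera["name"], camera["price"])
```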

For the record, Google’s approach is much more technically impressive, and, I’d say, something of a Holy Grail of web spiders.  This is the first time a general web spider has been designed to actually figure out metadata about things that it finds on the web so that those things can be categorized and compared against each other.  Wolfram relies on human data entry to make sure it has this level of intelligence, but Google Squared is imitating human intelligence to automate the process.

Google Squared is certain to be integrated with Google’s Shopping Search in the near future, making it a serious competitor to PriceGrabber and other price comparison engines, because it will let users filter and sort by even more fields than they could before.  If you watch the video on the TechCrunch article, just imagine a column on the right side with prices.  Watch out, PriceGrabber. 

Wolfram Alpha is a feature, not a search engine

Update 4/29, 5:21pm: Wolfram updated his blog today and linked to his demo video, and the product does look as niche as I feared.  It is “smart answers” on steroids, and while it may complement regular search results nicely, it’s not moving the field of Internet search and indexing forward at all.  Perhaps it’s the press’s fault for pumping up Alpha as the next Google – clearly it is not, nor is it trying to be.  They’re tackling a relatively small problem (compared to indexing the entire Internet) and they appear to be targeting a small audience (academics and scientists), so we should probably stop discussing Alpha in the same breath as Google, Yahoo, and the rest.  Please move along, nothing to see here.

Much ado has been made lately about Wolfram Alpha, a new-fangled “search engine” due to release in May that promises to give answers to questions that are asked in plain English.  Predictably, it’s much ado about nothing.  TechCrunch responded today to leaked screenshots by sitting on both sides of the fence, saying it’s unlikely Wolfram has “something Google doesn’t or can’t build in a year,” while also saying that their own guest editor’s predictions of Wolfram’s search greatness are “persuasive.”  Which is it, guys?

Let me boil it down for you based on what I’ve read so far: Wolfram Alpha’s pitch is that their search engine is built to answer plain English “computational” questions, i.e. questions that have specific answers that can be calculated.  To do this, they are sucking in all the databases they can find – population stats, weather stats, census data, geographic data, and any other corpora that are readily available.  

Once they have all the data compiled, they make it mine-able using plain English queries.  In his TechCrunch guest article, Nova Spivack gives three sample queries that are supposed to show the awesome potential of Alpha:

  • What country is Timbuktu in?
  • How many protons are in a hydrogen atom?
  • What is the average rainfall in Seattle?

It’s great that Alpha can answer these, but did Spivack bother to try these queries in Google?  Google gives an answer to every single one in the summary of the top result.  I didn’t even have to click through.  Hopefully these are just shoddy examples from Spivack rather than an example of how lame Alpha actually is.

Here are a few query types I’m hoping Alpha can answer that Google cannot:

  • What was Bank of America’s stock price at close on September 11, 2001?
  • Is next year a leap year?
  • How have home prices in San Diego, CA changed in the last 5 years?

These are questions that have specific answers that can be calculated from readily available data, but (here’s the key) are unlikely to have been written about on the web in a way that would make them findable by Google.  These questions are so specific (long-tail), that Google just won’t have answers sitting around in its index.

I can hear your next question already: if Wolfram Alpha is only good for such long-tail questions, how can it possibly compete with Google?  The answer is: it can’t.  

For all the glowing talk, Alpha appears to be a large set of regular expressions that parse natural language so users can mine a massive database.  This is not dissimilar to how Ask.com worked in the late 90’s when they had a huge, human-built database of question templates allowing them to parse queries and provide links as answers.  Remember how well that worked?
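For what it’s worth, here is a minimal, purely hypothetical sketch of that template approach: a couple of hand-written regular expressions mapped onto a tiny curated fact table.  This isn’t Alpha’s actual implementation, but it shows why the approach only scales as fast as humans can write templates and enter data:

```python
import re

# Tiny hand-curated "database" of facts. A real system would have millions of entries.
FACTS = {
    ("country_of", "timbuktu"): "Mali",
    ("protons_in", "hydrogen"): "1",
}

# Hand-written question templates, each mapping one phrasing onto a relation.
TEMPLATES = [
    (re.compile(r"what country is (\w+) in\??", re.I), "country_of"),
    (re.compile(r"how many protons are in an? (\w+) atom\??", re.I), "protons_in"),
]

def answer(question):
    for pattern, relation in TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if match:
            return FACTS.get((relation, match.group(1).lower()),
                             "Recognized the question, but have no data for it.")
    return "No template matches that question."

print(answer("What country is Timbuktu in?"))              # Mali
print(answer("How many protons are in a hydrogen atom?"))  # 1
print(answer("Is next year a leap year?"))                 # No template matches that question.
```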

Alpha must be an acquisition play.  They must be developing this answer engine with an eye towards selling it to one of the big players (Google, Yahoo, MS, or even Ask.com) so they can beef up their search results.  All of the majors already have smart answers features (a la Ask.com) that give exact answers to a small set of templatic questions, so acquiring Alpha would make an existing smart answers feature more robust.

TechCrunch reported today that an Alpha “insider” leaked the screenshot below in an attempt to show how Alpha is so much cooler than Google’s smart answers:

Wolfram Alpha Leaked Screenshot

On the left, we have Alpha’s result for a search on “ISS.”  The result it gives is a map and technical details of the International Space Station’s orbit.  Wow.  That is a truly horrible result.  Why anyone would leak this to show the power of the engine is beyond me.  Here’s what’s wrong with it:

  • Who says I want information about the International Space Station?  Maybe I wanted Internet Security Systems or International Schools Services or info about the company ISS A/S out of Copenhagen.  How about a little disambiguation, guys?  Clearly Alpha is not trying to be a comprehensive search engine.
  • Who wants data like that?  If I want info about the International Space Station, I would probably rather see its homepage than some crazy technical data about where it is right this second.
  • What happened to the natural language queries, eh?  Showing that your engine can figure out what I meant by a search on “ISS” hardly shows any natural language parsing ability, and it conversely shows the complete lack of disambiguation I discussed above.
  • Lastly, a nitpicky Product Manager thing: at the top, it says that International Space Station is the “input interpretation” of ISS.  “Input interpretation”?  Really?  How many users would have any idea what you’re trying to say there?  This is a product made by nerds for nerds.  I’m a nerd, so I can say this.

On the right, we see Google giving a fantastic answer to the query “maine population”.  (I’ll assume that someone changed the text in the query box to read “california population” after looking up Maine first.)  Google 1, Alpha 0.

Ultimately, Wolfram Alpha is not a search engine, but rather a data mining language for answers about a relatively small set of known entities.  If you want to know about International Schools Services, use Google.  If you want to know where Timbuktu is, use Google.  If you want to know the inclination and orbital period of the International Space Station, then by all means, go ahead and use Wolfram Alpha. (Note: Google gives a pretty good result when searching for “International Space Station Inclination.”)  When Alpha finally sells to one of the majors, it will most likely settle in as a feature of a search engine, not a search engine itself.

Caveat: all conclusions I’ve drawn are from the information available now.  We’ll see what it’s actually capable of when it launches in the coming weeks.

The Google Paradox

How do you manage a business when the very thing that makes money for you hurts your profits at the same time? This is the dilemma that Google has been struggling with for several years now, and it’s not likely to go away soon.

Imagine if you will that you are a content publisher, say, an online magazine. Now imagine that you have your “A” articles that your writers spent a lot of time on and that are very in-depth, and that your readers love. Now imagine that you have your “C” articles that aren’t so good, but because they suck, your readers are much more likely to click on the ads on those articles. Believe it or not, if a page has lousy content, users are indeed much more likely to click on an ad. I’ve seen sites with lousy content get up to a 12% clickthrough rate, compared to the 1-2% CTR typical of high-quality content.

Now imagine that you get paid per click.

Which articles would you promote at the front of your magazine? The A articles or the C articles? The A’s are great, but they just don’t pay the bills, let alone pay for themselves. The C’s pay well, but get no repeat traffic.

This is exactly the paradox that Google faces every day. They can send their users to quality sites with great articles, like Creating Passionate Users or uncov, or they can send traffic to sites with questionable content, like Associated Content or Squidoo, where users are more likely to click on ads because there’s nothing else to do.
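The back-of-the-envelope arithmetic makes the incentive obvious. Here’s a quick sketch using the clickthrough rates above and an assumed (made-up) $0.50 average revenue per ad click:

```python
# Revenue per 10,000 page views under each content strategy.
# The $0.50 per-click figure is an assumption for illustration only.
page_views = 10_000
revenue_per_click = 0.50

for label, ctr in [("A articles (quality)", 0.015), ("C articles (filler)", 0.12)]:
    clicks = page_views * ctr
    print(f"{label}: {clicks:.0f} clicks -> ${clicks * revenue_per_click:.2f}")

# A articles (quality): 150 clicks -> $75.00
# C articles (filler): 1200 clicks -> $600.00
```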

Google runs their AdSense ads on virtually every content site on the web these days, so they make money when they send their traffic to the sites that have the best clickthrough rates.

So what is Google to do? They should certainly try to avoid any perception of impropriety. Apparently they’ve started penalizing pages on Squidoo in order to do just that. But going forward, how do they draw the line? How do they look impartial? How do they assure us that the brick wall between editorial (in their case, search) and sales that every good publisher must have is still in place? How do we know that they’re sending us to the best sites and not the ones that make the most money for them?

The answer is, we probably won’t know until someone starts giving better search results than Google.

Mahalo: No Thank-You

If you’re old enough, you may remember a little web directory called Yahoo. Yes, back before it was a search engine in the current sense of the word, and long before you could “google” anything besides your Math 1B T.A. Yahoo’s web directory was assembled by “surfers” who theoretically spent the day scouring the web for new and interesting web pages, and when they found them, they’d add them into an enormous taxonomy Yahoo created, so people could find them easily. It was great, and it made a lot of sense. Back then.

If you were paying attention to your history, you may remember that when Yahoo’s directory and its taxonomy became too big for human editors to maintain and human users to navigate, Yahoo’s search proved to be much more useful than their directory, so people stopped using the directory. Eventually, the directory even disappeared from Yahoo’s home page entirely. If directories were so great, we’d all be using dmoz instead of Google today.

The problem with directories is that they can’t live up to their promise if they’re only maintained by a small set of internal people. A Web Directory promises to be:

  • Comprehensive
  • Authoritative
  • Current

A small set of editors simply cannot keep on top of enough subjects on the web to keep this promise. They will either:

  • Not cover enough topics to satisfy users
  • Not have enough expertise to recommend the best links
  • Not be up-to-date

And most likely, all three of these will be true. It’s the nature of the beast.

And this makes me (and uncov) wonder why Mahalo.com has been brought into existence. Despite describing itself as “human-powered search”, it really is just a directory, just like the old Yahoo directory, like Dmoz, like the old Zeal.com, like the old Looksmart, or a dozen other directories that failed. Note the similarities in the screenshots of Mahalo and Yahoo circa 1997 below. At least Yahoo works in Firefox…

Mahalo will never be really important because it will never be comprehensive enough, it will never be authoritative enough, and it will never be current enough. They claim that their editors will cover tens of thousands of topics in the coming years (25,000 by 2008), but what good will that do? How many unique search queries do you think Google gets in a day? I’d bet it’s in the millions, and there’s no way Mahalo will ever cover millions of search queries in an authoritative manner. The long tail is what has made Google successful, and anyone who tries to compete in the search space has to serve the long tail just as faithfully as the most popular search terms, or people just won’t rely on it.

Mahalo claims that they let users suggest links for inclusion in the directory, but unless those links get automatically processed and vetted by the masses, why should we trust their editors to choose which links are useful and which are not? Can their team of 40 editors really know what the best links are for tens of thousands of topics? Despite TechCrunch’s positive spin job, I don’t think so.

Where have the lessons of Web 2.0 gone? While many Web 2.0 sites are just Ajaxy hype, the really good things to come out of this generation of web development are sites that take advantage of collective wisdom, but Mahalo has ignored this lesson and is trying to re-hash a model that failed a decade ago.

I mean Mahalo no ill will, but good luck, guys. And very smart move partnering with Techcrunch on the TechCrunch20 conference.


Yahoo Directory

Mahalo directory

The Future of Search is Social

Over the past ten years we have witnessed an evolution in web search. The first-generation search engines like AltaVista, Excite, and Yahoo all indexed the web and gave back results primarily based on the words that were on a web page. If you searched for “lemurs”, these engines would look for pages that had the word “lemur” on them, and return those to you.

This was all well and good until the spammers came. It wasn’t long before the spammers figured out that if they stuffed a page full of the phrase “lemur”, the search engines would send people who searched for lemurs to that page. So, if these spammers happened to be selling lemurs, they could use this method to drive a lot of people to their store even if that store had no information about lemurs or the lemurs they sold weren’t very good. They could get traffic if they just had the word “lemur” on their page enough times (this example is simplified for illustrative purposes).

Then along came second-generation search: Google. Google’s search was smarter because it looked at pages that linked to pages. If you owned a site about lemurs, Google would scan your site and know that it was about lemurs, but it would also look at other sites, and if they linked to your site with the word “lemurs” in the link, Google would figure that your site about lemurs was pretty important, so it should show up high in a Google search for lemurs.
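To make the generational difference concrete, here’s a toy sketch over an invented four-page web: the first-generation score is just keyword frequency, which the keyword-stuffed spam page wins easily, while the second-generation score is a simplified PageRank-style iteration that rewards the page other sites actually link to. (This is a cartoon of the idea, not Google’s actual algorithm.)

```python
# An invented four-page web for illustration.
pages = {
    "lemurs.org":      {"text": "lemur lemur lemur habitat diet", "links": ["zoo.example"]},
    "spam-store.biz":  {"text": "lemur " * 50,                    "links": []},
    "zoo.example":     {"text": "our lemur exhibit",              "links": ["lemurs.org"]},
    "nature-blog.net": {"text": "I love the lemur pages here",    "links": ["lemurs.org"]},
}

# First generation: raw keyword frequency. The spam page wins.
keyword_scores = {url: p["text"].split().count("lemur") for url, p in pages.items()}

# Second generation: a few iterations of a simplified PageRank.
damping = 0.85
rank = {url: 1.0 for url in pages}
for _ in range(20):
    rank = {
        url: (1 - damping) + damping * sum(
            rank[u] / max(len(p["links"]), 1)
            for u, p in pages.items() if url in p["links"]
        )
        for url in pages
    }

print("keyword scores:   ", keyword_scores)   # spam-store.biz scores highest
print("link-based scores:", {u: round(r, 2) for u, r in rank.items()})  # lemurs.org scores highest
```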

Again, this was fine and dandy until the spammers figured it out. The early spammers just set up a lot of cheap sites with links to their main site, and built authority in Google’s index that way. Over time, they had to get smarter, so they set up link exchanges among reputable sites, or started buying text links on reputable sites.

Then the spammers set up companies like PayPerPost.com that pay people to write something (anything) about lemurs and link to discountlemurs.com in their blogs. While any human would read these blogs and dismiss them as marketing hooey, to Google’s algorithm these blogs look perfectly valid and they lend authority to discountlemurs.com. Of course, these ersatz bloggers are actually just shills, writing marketing copy for a living – their blogs don’t get much traffic (or they’d have real advertisers), but by posting articles that look real, Google’s algorithm is fooled into thinking the sites they point to (discountlemurs.com) have some authority.

What people outside of the Search industry don’t realize is that Search is hitting a brick wall. The second-generation algorithms, including Google’s, are constantly struggling to stay one step ahead of the spammers. Just read through a few of the search industry sites (Webmasterworld.com, SearchEngineRoundup, SearchEngineWatch, SearchEngineLand) and you’ll see the trends soon enough. Google has to update their algorithm all the time to combat spammers, and it’s hard to say who’s winning.

My bet is on the spammers for one simple reason: people are still smarter than computers. If someone can program a search engine to give authority to webpages that meet certain criteria, someone else can figure out how to simulate those criteria. Only recently have computers started to beat the chess masters, and that’s a game with simple rules; the search optimization industry has no rules, so it will be a long time before computers can surpass human judgment when it comes to determining which sites are really important.

This is why I believe it is just a matter of time until Search becomes social, and Google Search starts to fade away. Some may argue that Google’s weighting of links from other sites is already social because those links are placed by people, but we’ve seen that this linking can easily be gamed and/or automated. A truly social search is one that takes user trends and preferences and uses those to tailor its results. It can look at user click behavior, or it can look at the ratings that users give to various sites, or, even better, a combination of the two. It is true that this can be gamed too, but if one puts the proper requirements and safeguards in place, abuse should be relatively easy to detect. If done correctly, it will be too costly for marketers to pay enough people to promote a site, or for someone to create enough bogus accounts to sway the indexing in their favor. And if a site somehow does promote itself artificially, the masses will vote it down immediately, promoting the sites they think are truly important. This is the only way for Search to evolve.
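As a rough illustration of what blending those signals might look like, here’s a hypothetical scoring sketch with invented sites, weights, and numbers; the minimum-raters threshold stands in for the “requirements and safeguards” mentioned above:

```python
# Invented click and rating data for two sites.
sites = {
    "lemurs.org":     {"clicks": 900, "impressions": 1000, "ratings": [5, 5, 4, 5, 4, 5]},
    "spam-store.biz": {"clicks": 30,  "impressions": 1000, "ratings": [5, 5]},  # two shill accounts
}

MIN_DISTINCT_RATERS = 5  # crude safeguard: ignore ratings until enough people weigh in

def social_score(site):
    click_rate = site["clicks"] / site["impressions"]
    if len(site["ratings"]) >= MIN_DISTINCT_RATERS:
        avg_rating = sum(site["ratings"]) / len(site["ratings"]) / 5.0  # normalize to 0..1
    else:
        avg_rating = 0.0
    return 0.5 * click_rate + 0.5 * avg_rating  # arbitrary 50/50 blend

for url, data in sites.items():
    print(url, round(social_score(data), 2))
# lemurs.org scores ~0.92; spam-store.biz scores ~0.01
```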

We’re already seeing versions of social search in smaller applications like Digg, Flickr, Delicious, Yelp, and other sites. It’s just a matter of time until someone adapts these techniques to Search itself, and unseats Google’s algorithm. Ask.com sure isn’t going to do it – who will?

Why Google doesn’t care about Search

Over the past couple of years, it has become increasingly clear that Google is no longer in the search business. Sure, Google.com is a search engine, but the real business at Google is no longer to provide the best search engine. Its mission is no longer “to organize the world’s information and make it universally accessible and useful.” Google has become a pure advertising network.

First, let’s look at their history. Google search was extremely innovative for its time and there’s little argument that it delivered the best results. But Sergey and Larry just couldn’t make any money with search until they plugged in a brilliant system to display ads alongside their search results. In a world that had been dominated by “CPM” ads that gave advertisers no guarantees of a return on their advertising dollar, AdWords required that advertisers only pay for clicks, and it weighted ads based on their performance so users saw the most relevant ads. AdWords revolutionized online advertising just as Google had revolutionized online search.

So what does a company do when they have such a brilliant ad serving tool? They syndicate it. Google released AdSense to let other websites benefit from the performance-based ads that were already doing so well on Google.com, and they took a cut of every click purchased through their system. Another brilliant move.

The ads then spread to Gmail. A few paranoids complained of privacy invasion, but eventually everyone got used to ads next to their (free) email, and everything went back to normal. Google had yet another channel in which to deliver their ads, and the cash flowed in faster than their Money-Counting department could keep track of it.

So what do you do when you have more money than you know what to do with, and your founders are true tech geeks? You develop or acquire companies that do anything you consider “cool.” Have you seen the list of products that Google offers lately? If not, check it out here. Item after item, most of these make no money and have no plans of making money. Gtalk, Reader, Catalog Search, Notebook, Co-op, Code, Calendar, Docs & Spreadsheets… and don’t get me started on YouTube.

But what a lot of people don’t see through the plethora of product releases and purchases is that Google’s only real business is ad serving, and they’re aggressively moving to dominate (I stop short of the word “monopolize” here) online advertising and expand their control of ad delivery into other mediums. Google is adding pay-per-action ads to AdWords, which is surely devastating news to other CPA affiliate programs like Commission Junction. With their recent purchase of display advertising market leader DoubleClick, Google has gobbled up even more territory in the online advertising landscape, and now they’re moving to do the same thing by extending their AdWords architecture offline to TV ads, radio ads, and newspaper ads (so far with little success).

One might ask, “Why is a search engine getting into TV, radio, and newspaper ads?” The answer is this: Google is no longer a search engine. They are an advertising network. Google search is simply a distribution channel for their advertising platform, just like the thousands of other sites that use AdSense. The other Google projects like Gmail, Reader, Maps, and the rest, are all either channels for Google advertising or tools to build brand loyalty.

Is this a Bad Thing? No, of course not. There’s nothing more American than trailblazing and profiting from it. But Google’s mission simply is no longer to organize the world’s information. Selling ads on TV networks, in papers, and on the radio surely has little to do with organizing information. Google’s real, updated mission is to provide the best value to advertisers and the best experience to consumers. This is a noble, or at least honorable, mission in itself, if not as philanthropic as their original one.

And yes, this means that all those brilliant MBAs and PhDs they’ve shipped in to Mountain View are pretty much just working either on new ways to deliver advertising or new venues on which to serve it. If you could figure out a way to make PhDs return 100x their salary as revenue, you’d hire them, too.

It’s a very smart move, really. Google surely realizes that their search may not always dominate. For the past few years it seems that their only updates have been to defeat Google-spammers and aggressive SEO techniques. They know someone will beat their search eventually, particularly as social search gets better. But Google knows that it doesn’t matter. They aren’t in the Search business. As long as they control the delivery of advertising, they make money. Every company they buy and every technology they develop is either directly tied to creating ad inventory or to building loyalty to Google.

Mind you, I don’t fault Google for any of this. A company has to make money, particularly a public one. But let’s just call an ad network an ad network.

Payperpost.com to challenge Google

Just when you thought the Internet couldn’t become any more cynical, along comes Payperpost.com, a site that pays “professional bloggers” to pimp products and services. Here’s the deal: if you have a blog, you can pick from a list of products and services to write about, and each one has a bounty that you will earn once your article is posted and approved by the company that’s paying for it.

For those of you who don’t have a background in journalism, there is supposed to be a huge, big, massive, ginormous brick wall between Editorial and Sales. That means that editorial should never be influenced by the people who pay the publication, but Payperpost flies right in the face of convention and does exactly that.

After running along completely without ethics for their first few months, they have now added a disclosure requirement thanks to FTC regulations, so their bloggers have to disclose that they were paid to post their articles. Okay, so it’s okay to be a slut as long as you disclose that you’re a slut, I suppose.

What really interests me, however, is the possible implications for search engines, and in particular, Google, who put a lot of weight on inbound links. A key point to think about here is that despite their happy, shiny marketing copy, Payperpost will not be used by professional bloggers. Real bloggers live and die by their reputation for honesty and impartiality, so they could never afford to put a disclosure on their site saying that they were paid for an article – their reputation would be shot, and it would be all over. Payperpost isn’t hiring professional bloggers, they’re hiring paid bloggers.

Sluts

So, what good does it do to hire someone to write about your product if their blog isn’t a big, popular one? I’ll tell you: if you hire enough of them, the collective weight of their links to your site will give you more weight in Google. Although it isn’t written anywhere on their site, Payperpost’s purpose seems to be to help companies increase their Google rankings, so they can drive cost-effective traffic to their sites.

With dozens or hundreds of bloggers writing about your product, it shouldn’t be too hard to build a high relevancy rating at Google. Just tell the bloggers to link to your site using the text “Green widgets” and to talk about green widgets in their articles a lot, and before long you’ll show up on Google when people search for “green widgets.”

So what is a search engine to do? Google can try to give less weight to these paid blogs, but it’s essentially impossible for a machine-driven search engine to tell which blogs are paid and which aren’t, so Google will doubtlessly be fooled, and may end up full of spammy links thanks to little old Payperpost.

The only definite solution is to go social. People (as a whole) know what’s good and what isn’t, and will filter out the garbage that’s being linked to from a hundred paid blogs. Jimmy Wales is apparently working on a new social search, but it’s quite a ways off from being relevant. Perhaps Social Q&A will step up to fill the void?