COMMENTS
Hi JM,
Yep, the numbers do differ. But actually the larger numbers (e.g. XXX,000,000) don't make sense, IIRC. All that matters in any search are the hits within the first 100 to 1,000 or so. When I search, I get different counts for the same location. You may try using a virtual proxy and get even more varied results.
Also, you might notice that in searches with fewer hits (e.g. 50 to 60), you'll never get to read all of them on Google in most cases. It will say "repeated results" and reduce the list to a fraction.
This has something to do with the way search engines report results. The counts are not precise, and I don't believe they could ever count that many links in 70 milliseconds or so (a typical response time).
Regards,
Senaka
N.B. I got here through the thread on the CC group.
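To illustrate the point about imprecise counts, here is a minimal sketch in Python. It assumes, purely for illustration, that a hit count is extrapolated from a random sample of the index instead of being counted exactly; nothing in this thread confirms that any real search engine works this way. The names `estimate_hits` and `corpus` are hypothetical.

```python
import random


def estimate_hits(doc_matches, sample_size, seed):
    """Estimate the number of matching documents from a random sample.

    doc_matches: list of booleans, True where a document matches the query.
    sample_size: how many documents to inspect (assumed much cheaper than all).
    seed: stands in for whatever makes one server's sample differ from another's.
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_matches, sample_size)
    matching_fraction = sum(sample) / sample_size
    # Scale the sampled fraction up to the size of the whole index.
    return round(matching_fraction * len(doc_matches))


# Hypothetical index: 100,000 documents, exactly 10,000 of which match.
corpus = [i % 10 == 0 for i in range(100_000)]

for seed in (1, 2, 3):
    # Each "server" inspects only 1,000 documents, so the estimates
    # scatter around the true count.
    print(seed, estimate_hits(corpus, 1_000, seed))
```

Under this assumption, two servers answering the same query would report slightly different totals, and only inspecting every document would give the exact figure.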
That the count differs at small levels makes sense. It seems that the actual content of the first 10-100 critical results varies as well, since the content is distributed across multiple locations without Atomicity, Consistency, Isolation, Durability. Running applications in production mode on Google App Engine will not work.
Well, I don't think that any transaction's ACID properties are violated. Google uses a Shared Nothing system. But I don't see a real reason why results should vary so significantly. As you say, it may be true for something very new that hasn't been picked up by one location yet. But for something mature enough, the results shouldn't vary.
If new items in a shared-nothing system aren't identified (e.g. a credit card payment, purchase order, or instant message), the value of the transaction is close to worthless. For search, I agree it may not matter for the average user, but some folks take search results very seriously; I believe there is a whole SEO business built around it. It would be interesting to see a search engine with the same or better result quality as Google, with consistency across all queries regardless of location.
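The scenario discussed above, where a new item has reached one location but not another, can be sketched as follows. This is a toy model under stated assumptions, not a description of Google's architecture: each `Replica` owns its own index (shared nothing), and updates are applied only when that replica runs its own `sync`. The names `Replica` and `publish` are hypothetical.

```python
class Replica:
    """One location in a shared-nothing setup: it shares no state with peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # term -> set of document ids, private to this node
        self.pending = []  # updates received but not yet applied

    def receive(self, term, doc_id):
        # Updates are queued, not applied immediately (asynchronous sync).
        self.pending.append((term, doc_id))

    def sync(self):
        # Apply every queued update to the local index.
        for term, doc_id in self.pending:
            self.index.setdefault(term, set()).add(doc_id)
        self.pending.clear()

    def search(self, term):
        return sorted(self.index.get(term, set()))


def publish(replicas, term, doc_id):
    """Send a newly indexed document to every replica's queue."""
    for replica in replicas:
        replica.receive(term, doc_id)


us, eu = Replica("us"), Replica("eu")
publish([us, eu], "acid", "doc-1")
us.sync()  # only one location has caught up so far
print(us.search("acid"), eu.search("acid"))
eu.sync()  # after the second location syncs, both agree
print(us.search("acid"), eu.search("acid"))
```

Between the two `sync` calls the same query returns different results depending on which location answers it. For search that window may be tolerable; for a payment or purchase order it would not be.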
JM,
Yes, I agree. In the case of a search it won't matter. But otherwise it would. In a search the policy is "best results at best speed", so it is natural to see some compromise on another factor.
You can have SN systems with a sync protocol as well. If someone can find an O(1) algorithm that does this sync, yes, that'd be great :)... It is always a challenge.
Oh, btw, heuristic-based ordering of search results is another aspect (e.g. PageRank).
Regards,
Senaka
"Best results at best speed" has been successfully accomplished at Google, with consistency compromises.
Give me 10 Blue Gene/Ls and the search engine algorithm will be accurate. Not to mention the green aspect of the energy savings.
http://www.nytimes.com/2008/06/09/technology/09petaflops.html?_r=2&ref=science&oref=slogin&oref=slogin
Well yes, you can argue, but SN has still proven to be better... :)...
To me, an SN system fed by accurate and scalable sync policies should still be the way to go.
Anyway, good discussion here. :)
Regards,
Senaka
Anytime Sir, take care.
JM