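Google is estimated to earn nearly $500 million a year from companies and individuals who register deceptive web addresses.
At issue is a form of cybersquatting known as “typosquatting”: registering misspelled versions of popular domain names in order to profit from them. An operator might, for example, register “newscientsist.com” to catch users who meant to visit “newscientist.com”.
If newscientsist.com attracts enough visitors, its owner can make money by placing ads on the site. Google's ad network can place ads automatically, based on the page content or on keywords supplied by the site's owner, which makes such a site easier to monetize.
When that happens, Google takes a cut as well. Tyler Moore and Benjamin Edelman of Harvard University have analysed how much Google might earn this way.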
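Typos
Using Alexa.com rankings, Moore and Edelman identified the 3,264 most popular .com websites and, by applying the most common spelling mistakes, built a list of likely typo domains. They estimate that each of these sites is targeted by about 280 typo domains on average.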
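The candidate-generation step can be sketched as follows. The article says only that the researchers applied "the most common spelling mistakes"; the four edit operations used here (insert, delete, substitute, swap adjacent characters) are an assumption made for illustration, and `typo_candidates` is a hypothetical helper, not the authors' code.

```python
import string

# Characters allowed in a domain label (assumption: ASCII-only names).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def typo_candidates(domain):
    """Return every name one single-character edit away from `domain`.

    Assumes `domain` looks like "name.tld"; only the part before the
    last dot is misspelled.
    """
    name, dot, tld = domain.rpartition(".")
    variants = set()

    # Insertion: an extra character at any position.
    for i in range(len(name) + 1):
        for ch in ALPHABET:
            variants.add(name[:i] + ch + name[i:])

    for i in range(len(name)):
        # Deletion: one character dropped.
        variants.add(name[:i] + name[i + 1:])
        # Substitution: one character replaced.
        for ch in ALPHABET:
            variants.add(name[:i] + ch + name[i + 1:])

    # Transposition: two adjacent characters swapped.
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Drop the correct spelling and labels that cannot be registered.
    variants.discard(name)
    return {
        v + dot + tld
        for v in variants
        if v and not v.startswith("-") and not v.endswith("-")
    }


candidates = typo_candidates("newscientist.com")
print("newscientsist.com" in candidates)
```

A list built this way only names possible typo domains. Deciding which of them are actually registered, and how they are used, needs a separate lookup and crawl.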
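To find out how these typo domains make money, they then used software to crawl 285,000 of the more than 900,000 domains.
If the top 100,000 websites are targeted at the same rate as the sites Moore and Edelman studied, the typo domains could be receiving more than 68 million visits a day. The pair estimate that about 60% of the typo sites use Google's advertising service.
They conclude that, if Google earns from typo sites at the rate their study suggests, its revenue from them could reach $497 million a year.
Google's total revenue in 2009 was $23 billion, 97% of it from advertising.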
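As a rough consistency check on these figures, the sketch below backs out the revenue per visit that the estimate implies. It rests on assumptions the article does not state: that visits are spread evenly over 365 days, and that the 60% share of typo sites showing Google ads applies equally to visits. The result is our own back-of-envelope figure, not one reported by the researchers.

```python
# Figures quoted in the article.
DAILY_TYPO_VISITS = 68_000_000         # "more than 68 million" visits a day
GOOGLE_AD_SHARE = 0.60                 # about 60% of typo sites use Google ads
TYPO_REVENUE_USD = 497_000_000         # estimated annual revenue from typo sites
GOOGLE_2009_REVENUE_USD = 23_000_000_000


def implied_revenue_per_visit(daily_visits, google_share, annual_revenue):
    """Annual revenue divided by annual visits to Google-monetized typo sites.

    Assumption: the share of sites showing Google ads equals the share
    of visits that reach such sites.
    """
    annual_google_visits = daily_visits * 365 * google_share
    return annual_revenue / annual_google_visits


per_visit = implied_revenue_per_visit(
    DAILY_TYPO_VISITS, GOOGLE_AD_SHARE, TYPO_REVENUE_USD
)
share_of_total = TYPO_REVENUE_USD / GOOGLE_2009_REVENUE_USD

print(f"implied revenue per visit: ${per_visit:.4f}")
print(f"share of 2009 revenue: {share_of_total:.2%}")
```

Under these assumptions the estimate works out to about 3.3 cents per visit, and $497 million is about 2.2% of Google's 2009 revenue.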
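Removing ads
A Google spokesperson said the company will remove ads from the typo domains concerned if the owner of the affected site complains, but declined to comment on the details of the study.
Typosquatting misleads consumers and can impose needless costs on the sites it targets, say Moore and Edelman. Some companies feel forced to buy ads on typo sites that prey on their own domains, because they fear that otherwise they would be handing business to their competitors.
Edelman has previously criticised Google for placing ads on deceptive sites. He is currently advising on a legal case seeking damages from Google, in which the plaintiffs are sites targeted by typo domains that carried Google ads. He says his involvement in the case does not affect his research findings.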
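In court
“I didn't take up this research to make money,” says Edelman. “The impact is too serious.”
Moore and Edelman say their analysis found that some individuals hold thousands of different typo domains. In their view, this means Google and other online advertising networks should be able to identify the operators of such sites.
Moore and Edelman presented their findings last month at the Financial Cryptography and Data Security conference in Tenerife, Spain. A more detailed appendix to the paper is available online.
Author Jim Giles is on Twitter: @jimgiles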
Measuring Typosquatting — Online Appendix
Tyler Moore & Benjamin Edelman* – Web Appendix to Measuring the Perpetrators and Funders of Typosquatting
Abstract: We describe a method for identifying “typosquatting”, the intentional registration of misspellings of popular website addresses. We estimate that at least 938,000 typosquatting domains target the top 3,264 .com sites, and we crawl more than 285,000 of these domains to analyze their revenue sources. We find that 80% are supported by pay-per-click ads, often advertising the correctly spelled domain and its competitors. Another 20% include static redirection to other sites. We present an automated technique that uncovered 75 otherwise legitimate websites which benefited from direct links from thousands of misspellings of competing websites. Using regression analysis, we find that websites in categories with higher pay-per-click ad prices face more typosquatting registrations, indicating that ad platforms such as Google AdWords exacerbate typosquatting. However, our investigations also confirm the feasibility of significantly reducing typosquatting. We find that typosquatting is highly concentrated: Of typo domains showing Google ads, 63% use one of five advertising IDs, and some large name servers host typosquatting domains as much as four times as often as the web as a whole.
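The concentration figure in the abstract (63% of Google-monetized typo domains using one of five advertising IDs) is a top-k share. A minimal sketch of that calculation is below, assuming the crawl produced one advertising ID per typo domain. The function name and the sample IDs are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def top_k_share(ad_ids, k=5):
    """Fraction of domains whose ad ID is among the k most common IDs.

    `ad_ids` holds one advertising ID per crawled typo domain.
    """
    counts = Counter(ad_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Made-up sample: ten typo domains spread over four ad IDs.
sample_ids = ["pub-A"] * 4 + ["pub-B"] * 3 + ["pub-C"] * 2 + ["pub-D"]
print(top_k_share(sample_ids, k=2))
```

A high share for a small k means a handful of ad accounts monetize most typo domains, which is the pattern the abstract cites as evidence that typosquatting could be significantly reduced.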
Paper Contents: Introduction – Structure and Strategy of the Domaining Business – Measuring Typosquatting – How Typosquatting Domains are Used – Do Pay-Per-Click Ads Promote Typosquatting? – Countering Typosquatting – Conclusions
This online appendix lists specific typosquatting domains we found using the search process detailed in Measuring the Perpetrators and Funders of Typosquatting. We built automated systems to classify the revenue sources of each typosquatting domain, as detailed in section 3. In the links that follow, we present specific victims and perpetrators of typosquatting.
Most Popular Websites – This page details selected popular sites that are highly targeted by typosquatting.
Self-Advertising on Typo Domains – Many typosquatting domains display pay-per-click links promoting the same merchants that are targeted by typosquatting. This page lists popular websites suffering high rates of self-advertising on typo domains.
Self-Advertising on Typo Domains – Screenshots – This page presents screenshots of specific typo domains that prominently present ads for the same sites users attempted to visit. This page also notes the ad platform and partner IDs that profit from these typo domains.
Top Targets of Redirects to Competing Domains – Some typosquatting domains redirect users to competitors’ sites. That is, if a user mistypes one site’s address, the user might end up at a competitor’s service. This listing provides specific examples.
Large Name Servers Resolving Many Typo Domains – We list selected large name servers resolving many typo domains, along with example typo domains and their revenue services. The frequent typosquatting on these large servers indicates that the problem of typosquatting is concentrated on certain hosts.
Small Name Servers Resolving Many Typo Domains – We list selected smaller name servers resolving a high proportion of typo domains, along with example typo domains and how they are used where available. While these servers host fewer domains, they have the highest rates of typosquatting — raising questions of how and why these servers came to host such a high proportion of typosquatting domains.
Large Name Servers Resolving Few Typo Domains – These name servers reflect particular success, by the corresponding server operators, at avoiding typosquatting — raising questions of why other name servers were so much less successful.
Google PPC Ad Client IDs Widely Used in Typosquatting – This page lists Google PPC ad partner IDs that appear widely on typosquatting domains.
Estimating exposure to typosquatting
Estimating Visitors and Advertising Costs of Typo Domains – Using Alexa data on popularity of popular sites and their typosquatting knock-offs, we estimate the total number of visitors reaching typosquatting domains, and the associated costs to advertisers.
* One of the authors (Edelman) previously served as co-counsel in litigation against Google, arising out of Google’s use of typosquatting domains to display advertising. See Vulcan Golf, LLC, et al. v. Google, et al., N.D.Ill., Case No. 1:2007cv03371.
Posted: February 17, 2010