My Failed Attempt at Discovering Different Domain Data

March 20, 2021 | Category: Domain Name Experience

Original title: My Failed Attempt at Discovering Different Domain Data
Original URL: https://uniregistry.com/blog/post/my-failed-attempt-discovering-different-domain-data
Original author: Jamie Zoch (DotWeekly.com)
Published: March 9, 2021
Translated by: 雲將 (dotwiki.com)

We often hear success stories, so today I want to share something a little different! Ha, strictly speaking it really wasn't a failure, because I learned a lot, but I hope you see my point.

What follows is my thinking and my attempts to explore "different" sources of domain data, to help me find good, distinctive domain names rather than the run-of-the-mill ones.

Since most domain investors look at much the same data (think ExpiredDomains.net), my goal was to change that for myself and see whether I could find some different sources that would let domains shine which had been buried because they didn't fit the metrics on the popular sites.

My idea did work, but it needed a witch's brew of ingredients to work better.

Research:

I started by looking at popular, mostly user-generated-content websites. My goal was to find data that reflected popularity and was tied to a term; think "likes" and hashtags. Some sites would have worked well, LinkedIn for example, but for me and my project, actually obtaining the data was a completely different task, and that became the point of failure.

After a few weeks of scouting around the web, I settled on three sources that not only had the kind of data I was looking for, but whose data could also be obtained by scraping or through an API.

1. Instagram was the best. It offered both hashtag and username data.
2. Twitter was a close second.
3. A business directory (I won't name the site, but it starts with an M).

None of these three sources required a "login" to get the data I needed, which was very important. There were other sources, but in each case something got in the way of obtaining the data.

How I did it:

I would take the raw list of domains expiring that day at GoDaddy Auctions. From it I filtered down to .com domains no longer than 12 characters; you can design your own filters however you like. I would then run a bulk WHOIS scan on that set via DomainIQ to get the domain age, and keep only domains that were 5 years old or older. That gave me my final target list (normally around 2,000 domains), which became the basis for gathering the additional data. I often used dictionary tools to help filter the lists; that helped, but it was never perfect.
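
To make that filtering step concrete, here is a minimal Python sketch of it. The sample domains, the creation dates, and the helper name are placeholders of my own, not the author's actual tooling; only the filters themselves (.com only, at most 12 characters before the dot, an age of 5 years or more taken from a bulk WHOIS lookup) come from the description above.

```python
from datetime import datetime

MAX_LENGTH = 12       # longest label (the part before ".com") to keep
MIN_AGE_YEARS = 5     # minimum domain age, taken from the bulk WHOIS data

def keep_domain(domain: str, created: datetime, today: datetime) -> bool:
    """Apply the article's filters: .com only, short label, 5+ years old."""
    label, _, tld = domain.lower().rpartition(".")
    if tld != "com" or not 0 < len(label) <= MAX_LENGTH:
        return False
    age_years = (today - created).days / 365.25
    return age_years >= MIN_AGE_YEARS

# Example input: expired domains paired with their WHOIS creation dates.
raw = {
    "kidsfestival.com": datetime(2010, 6, 1),
    "averyverylongdomainname.com": datetime(2008, 1, 15),
    "freshstore.net": datetime(2012, 3, 9),
    "newishname.com": datetime(2020, 11, 2),
}

today = datetime(2021, 3, 9)
targets = [d for d, created in raw.items() if keep_domain(d, created, today)]
print(targets)  # ['kidsfestival.com']
```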

Scraping:

I had never scraped a website before, and I had no idea how to do it. I chose ParseHub for the job; it took a little tinkering and a month's subscription fee, but it worked.

Part of the process was converting every domain on the target list into a URL that could be scraped. I worked out ways to do it with Excel and Notepad, though that was certainly time-consuming!
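
The post doesn't say exactly what those scrapeable links looked like, so the following Python sketch is only an assumption: it strips the .com and points each remaining term at an Instagram hashtag page of the form https://www.instagram.com/explore/tags/<term>/, which is how hashtag pages were commonly addressed at the time.

```python
def domain_to_hashtag_url(domain: str) -> str:
    """Strip the TLD and treat the remaining label as the hashtag term."""
    term = domain.lower().rsplit(".", 1)[0]
    return f"https://www.instagram.com/explore/tags/{term}/"

targets = ["KidsFestival.com", "GreenCleaner.com", "StoryMakers.com"]
for domain in targets:
    print(domain_to_hashtag_url(domain))
# https://www.instagram.com/explore/tags/kidsfestival/
# https://www.instagram.com/explore/tags/greencleaner/
# https://www.instagram.com/explore/tags/storymakers/
```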

My focus was on hashtags, because Instagram displays each specific hashtag together with a count. This was perfect data: hashtags are generally words, and Instagram adds them up and provides a real number for each term.
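
Pulling that number out of a scraped page depends entirely on markup that Instagram has changed many times, so treat this as an illustration only: assuming the saved page text still shows a figure such as "12,345 posts" next to the hashtag, a small regex pass like the one below can recover the counts and rank the terms.

```python
import re

# Assumed snippets of scraped hashtag-page text; the real Instagram markup
# differs and has changed repeatedly, so this is a stand-in only.
pages = {
    "kidsfestival": "#kidsfestival 12,345 posts",
    "greencleaner": "#greencleaner 3,210 posts",
}

POSTS_RE = re.compile(r"([\d,]+)\s+posts", re.IGNORECASE)

counts = {}
for term, text in pages.items():
    match = POSTS_RE.search(text)
    counts[term] = int(match.group(1).replace(",", "")) if match else 0

# Rank the terms, most-used hashtag first.
for term, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(term, count)
```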

Domains were jumping out at me, each one showing its popularity right alongside it! But I wasn't looking for the best of the best (although those were highlighted too); I was looking for domains that might have flown under the radar. As a small example of the kind of domains that caught my eye, all .com: EverydayCounts, KidsFestival, GreatestGifts, GreenCleaner, FreshStore, StoryMakers, TheEffects, FeedYourHunger, and many more.

The Instagram hashtag data was very useful, especially for catchy marketing terms and common terms. It was great, until it stopped!

Domain investing requires a "mix" of inventory: common terms are good, and so are business/brand names. This is where I used the business directory that starts with the letter M. Again I used ParseHub, and my focus was on a search URL that displays a total number of "results" for the searched term. That would help me rank every term I searched, and the cream would rise to the top.

The business directory searches involved some deeper work on my part, plus a few domain tools. I needed to turn each domain on my target list into a "search term" by splitting it into keywords (keeping the space), and then turn all of that into a URL. It was challenging, but I managed it. Keep in mind that these lists ran 1,500 to 2,000 domains, so there was no way I could do this by hand every day.
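
As a rough sketch of that conversion in Python: the tiny word list, the greedy splitter, and the example search URL (with its hypothetical q= parameter) are all assumptions of mine; the post only says each domain was split into spaced keywords and packed into a search URL whose results total could then be scraped and ranked.

```python
from urllib.parse import quote_plus

# Tiny stand-in dictionary; a real run would load a full word list.
WORDS = {"kids", "festival", "green", "cleaner", "fresh", "store",
         "story", "makers", "feed", "your", "hunger"}

def split_keywords(label: str) -> str:
    """Greedy longest-match split of a domain label into spaced keywords."""
    words, i = [], 0
    while i < len(label):
        for j in range(len(label), i, -1):   # try the longest candidate first
            if label[i:j] in WORDS:
                words.append(label[i:j])
                i = j
                break
        else:                                # no dictionary word found here
            words.append(label[i:])
            break
    return " ".join(words)

def directory_search_url(domain: str) -> str:
    """Build a (hypothetical) directory search URL for the spaced keywords."""
    label = domain.lower().rsplit(".", 1)[0]
    return "https://directory.example/search?q=" + quote_plus(split_keywords(label))

for d in ["KidsFestival.com", "FeedYourHunger.com"]:
    print(directory_search_url(d))
# https://directory.example/search?q=kids+festival
# https://directory.example/search?q=feed+your+hunger
```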

Again, the business directory produced some nice results, but this time the results were more business names or business-related terms. That was also good, until it stopped.

What happened?

Instagram changed its API and began blocking all scraping attempts. I could no longer access the data at all. I still can't, and I don't know how to get it, even though I'd be willing to pay for it.

The business directory I had chosen kept blocking ParseHub. It wasn't worth trying any more. I sent the company several emails hoping to make a deal where I would pay for the type of data I needed, but they never replied.

Overall:

It was a fun experiment. The Instagram hashtag data helped the most, but again, it mainly highlighted common phrases and popular terms. Both often make for good domains, though! Seeing how often these terms were used was also very helpful.

Twitter was similar to Instagram, but it's a different site, and hashtags aren't used there in quite the same way.

The data from a business directory was a necessary ingredient. I really thought there would be many such business directories, but I had a hard time finding one I could actually get data from. I don't feel I should have to scrape the data, but getting it directly from the companies seemed really hard, expensive, or simply impossible.

If you could obtain this kind of data from popular sites such as LinkedIn, for example, it would be very helpful.

There are datasets around the web that can help you look for domain names in different ways, and I really don't think this has been tapped into much. For people who are more familiar with this kind of data and how to obtain it, it would be interesting to put to use!

I hinted at this process in another post, Look At Domain Names with a Different Perspective (https://uniregistry.com/blog/post/look-at-domain-names-with-a-different-perspective).

Jamie Zoch [email protected] Jamie Zoch is the founder of DotWeekly.com, which provides professional domain name consulting services, and he has been involved in the domain name industry since 2006. Jamie is passionate about domain names, the domain industry, and helping others learn and succeed. Jamie began his career at The Procter & Gamble Company, but after discovering domain names his horizons widened and his passion for them grew ever deeper.

My Failed Attempt at Discovering Different Domain Data

Date: Tuesday, March 9, 2021
URL: https://uniregistry.com/blog/post/my-failed-attempt-discovering-different-domain-data

We often hear success stories, so today I wanted to share the opposite! Ha, it’s really not a failure because I learned a lot, but you get my point.

The following is my thought process and attempts at discovering “different” data sources to help me find good and different domain names than the masses.

Since most domain investors all look at the very same data (think ExpiredDomains.net), my goal was to change that for myself and see if I could find some different sources to make domains shine that didn’t already shine because of metrics on popular sites.

My idea somewhat worked, but it required a witch’s brew of sources to make it work better.

Research:

I started looking at popular, mainly user-generated websites. My goal was to find data related to popularity and tied to a term. Think "likes" and "hashtags". Some sites would work well, like LinkedIn for example, but obtaining the data was a totally different task and the fail point for me and my project.

After a few weeks of looking around the web, I was set on 3 sources that not only had the type of data I was looking for, but also data that was accessible via web scraping or API.

1. Instagram was the best. Both for hashtag and username data.
2. Twitter was closely behind.
3. Business directory (I won't name the site but it started with an M.)

None of the above required a "login" to obtain the data I was looking for, which was important. There were other sources, but in each case something hindered the process of obtaining the data.

How I did it:

I would take a raw list of expired domains for the day at GoDaddy Auctions. I filtered the full list down to .com domains only and reduced the length to 12 characters. You could filter any way you wish. Then I would do a bulk WHOIS scan via DomainIQ to have domain age with the domains. I would then take only the domains that were 5 years or older. This would be my final list (normally around 2,000 domains) to obtain additional data on. I often used dictionary tools to help filter the lists, which was helpful but not always perfect.

Scraping:

I never scraped a website before and I had no idea how to do it. I picked ParseHub to do this part and it took a little bit of playing around and a paid subscription for a month, but it worked.

It was a process to convert all the domains from my list into links to be scraped, but I figured out ways to do it with Excel and Notepad. It all took time, that's for sure!

My focus was on Hashtags because Instagram would display those specifically and a count with each. This was perfect data because hashtags are words and Instagram would add them up and provide a real number with the terms.

Domains were jumping out at me, all showing popularity right along with them! I wasn’t looking for the best of the best (although they were highlighted), I was looking for domains that may fly under the radar. For a tiny example of what was being highlighted to me, all .com: EverydayCounts, KidsFestival, GreatestGifts, GreenCleaner, FreshStore, StoryMakers, TheEffects, FeedYourHunger and so many more.

Instagram hashtag data was very helpful. Really helpful on catchy type marketing terms and common terms. It was good, until it stopped!

Domain investing requires a “mix” of inventory and the common terms are good but so are business/branding names. This is where I used the business directory that started with an M. Again, I used ParseHub and my focus was on a search term URL that showed a “results” total, for the searched term. This would help me rank each term that I searched and the cream would rise to the top.

The business directory search involved some deeper work by me and more fancy domain related tools. I needed to convert my domain list into a "search term" with split keywords (keeping the space) and make all of that into a URL string. It was challenging but I was able to do it. Keep in mind that these lists were 1,500-2,000 long, so there was no way for me to manually do this every day.

Again, the business directory was producing some nice results but this time, the results were more business name or business related terms. This was also good, until it stopped.

What happened?

Instagram changed its API and really started blocking all scraping attempts. I simply couldn't get access to the data any longer. I currently still can't, and I don't know how to get it, even by paying them for it.

The business directory that I was using would constantly block ParseHub. It wasn’t worth trying any more. I sent the company several emails to make a deal that I could pay for the type of data I was looking for but they never replied.

Overall:

It was a fun experiment. The Instagram hashtag data helped the most but again, it mainly highlighted common phrases and popular terms. Both often make for good domains though! Seeing how often these terms were used, was also very helpful.

Twitter was similar to Instagram but it’s a different site and hashtags are used differently on a photo website compared to Twitter.

The business directory was a needed set of data to mix things up a bit. I really felt there would be many of these business directories, but I had a hard time finding one that I could get data from. I didn't feel I should have to scrape data, but obtaining it directly from the companies seemed really hard or expensive, or there was simply no way to do it.

If you can obtain this data from popular sites like LinkedIn for example, that would be helpful.

There are datasets around the web that can help you look at domain names differently. I really do not think this has been tapped into much. For people that are more familiar with this kind of data and how to obtain it, it would be interesting to use!

I hinted at this process in a different blog post called: Look At Domain Names with a Different Perspective (https://uniregistry.com/blog/post/look-at-domain-names-with-a-different-perspective)

Jamie Zoch [email protected] Jamie Zoch is the founder of DotWeekly.com, which provides professional domain name consulting services and has been involved in the domain name industry since 2006. Jamie is deeply passionate about domain names, the domain industry and helping others learn and succeed with them. Jamie started his professional career at The Procter & Gamble Company at an early age, but after discovering domain names, his eyes widened and his passion for them grew deep.