We index the blogosphere so you don't have to!
The blogosphere publishes - we index, interpret, filter, and cleanse. All you have to do is tune in and listen.
Spinn3r taps into this worldwide conversation. We are positioned directly in the flow of the conversational web, and have been for over three years. Every hour we monitor over 100 thousand pieces of unique content published by more than 20 million weblogs - a publishing pace that accelerates every single day as more individuals publish their unique views and perspectives online.
Ready for Mission Critical Applications
Simple to Use
You can be up and running with Spinn3r in less than an hour. We ship a standard reference client that integrates directly with your pipeline.
If you're running Java, you can get up and running in minutes. If you're using another language, you only need to parse out a few XML files every few seconds.
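As a rough illustration of the "parse out a few XML files" step, here is a minimal Python sketch. The element names (`item`, `title`, `link`) and the sample payload are assumptions made for the example, not the actual Spinn3r response format, and the fetch step is left out because the endpoint is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload: the element names below are assumptions,
# not the actual Spinn3r response format.
SAMPLE_XML = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items
```

A client in another language would typically call something like `parse_items` inside a loop that fetches the next XML file, hands the results to its own pipeline, and sleeps for a few seconds before repeating.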
Real Time Indexing
Spinn3r is tied into the blog ping network provided by Google, Blogger, Ping-o-Matic, WordPress, FeedBurner, and many other content management systems.
When a new blog post is published, we receive direct notification and add this weblog to the top of our queue. A crawler is then sent out to index the RSS and HTML of this weblog, and the indexed content is immediately published to our customers.
We've developed complex spam prevention technology to keep spam out of our index. Due to the nature of spam, we're never able to completely eliminate the problem, but we're able to minimize it significantly. At any point in time, we believe there's less than 1% spam within Spinn3r.
Ultra Reliable Infrastructure
Spinn3r is hosted in a world class data center. We have over 30 servers and store more than 21TB of content. Every piece of our infrastructure is redundant with additional hardware on standby in case of a failure.
Spinn3r is monitored 24/7 for any potential error in the system. We're so confident in our infrastructure that we back our service with a top-notch SLA so you can sleep well at night.
The bandwidth costs alone for running a crawler can break the bank. The cost of running Spinn3r on a monthly basis is about 10x what we charge our clients.
We're able to do this because we amortize our costs across multiple customers and only serve up a highly compressed index.
Every post indexed by Spinn3r is classified by language. We compute this from the raw text of the post based on our own proprietary mathematical modeling. It's highly effective. With 200 bytes of content we can predict the correct language with 98% accuracy.
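To give a feel for how language can be inferred from raw text alone, here is a toy sketch based on character trigram profiles and cosine similarity. This is a common textbook technique chosen for illustration; it is not Spinn3r's proprietary model, and the two tiny training samples and all function names are assumptions made for the example.

```python
from collections import Counter
import math

# Tiny illustrative training samples; a real model would use far more text.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and this is the way "
          "that people write when they are writing in the english language",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esta es la "
          "manera en que la gente escribe cuando escribe en el idioma español",
}


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    text = " " + " ".join(text.lower().split()) + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in TRAINING.items()}


def guess_language(text):
    """Return the language code whose trigram profile is closest to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

With samples this small the sketch only separates inputs that resemble its training text; reaching high accuracy on short snippets would need much larger profiles per language.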
Making Your Job Easier
Indexing All Flavors of RSS
The great thing about standards is that there are so many to choose from.
RSS is no exception. There are more than nine different flavors of RSS and two additional flavors of Atom. All are used in production systems and deployed in the wild.
We believe strongly in the robustness principle. Normal RSS parsers are fragile - not ours. If there are small errors in the source file, we transparently correct them to make sure you get the content you need.
With Spinn3r you only have to deal with one simple API - ours!
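The idea of hiding several feed flavors behind one interface can be sketched as follows. This hypothetical example maps only RSS 2.0 items and Atom 1.0 entries onto a single dictionary shape; the function name and output fields are assumptions, other flavors (such as namespaced RSS 1.0) are not handled, and Python's strict XML parser does not perform the transparent error correction described above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def normalize_feed(xml_text):
    """Map RSS 2.0 <item> and Atom 1.0 <entry> elements onto one dict shape."""
    root = ET.fromstring(xml_text)
    posts = []
    # RSS 2.0: un-namespaced <item>, with <link> holding the URL as text.
    for item in root.iter("item"):
        posts.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    # Atom 1.0: namespaced <entry>, with the URL in the href attribute of <link>.
    for entry in root.iter(ATOM_NS + "entry"):
        link = entry.find(ATOM_NS + "link")
        posts.append({
            "title": (entry.findtext(ATOM_NS + "title") or "").strip(),
            "link": link.get("href", "") if link is not None else "",
        })
    return posts
```

Whichever flavor comes in, the caller sees the same `title` and `link` keys, which is the point of exposing one API instead of many formats.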
Tags have been heavily requested by our customer base and are included directly in our API.
Along with RSS we also index the full HTML of every blog post. We have an additional set of crawlers that index every URL we discover.
Every weblog in our index is assigned a ranking based on the number of inbound links from other weblogs in our index. We also have our Social Media Rank, which provides more advanced ranking information including authority, actual rank position, etc.
This information is published within our API so you can make informed decisions about the content you index.
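A ranking based on inbound links can be sketched in a few lines. The version below simply counts distinct linking weblogs per target; the function names and the tie-breaking rule are assumptions for illustration, and it does not attempt to reproduce Social Media Rank.

```python
from collections import defaultdict


def inbound_link_counts(links):
    """Count distinct source weblogs linking to each target weblog.

    `links` is an iterable of (source, target) pairs; self-links are ignored
    and repeated links from the same source count once.
    """
    sources_by_target = defaultdict(set)
    for source, target in links:
        if source != target:
            sources_by_target[target].add(source)
    return {target: len(sources) for target, sources in sources_by_target.items()}


def rank_weblogs(links):
    """Return weblogs with at least one inbound link, highest count first."""
    counts = inbound_link_counts(links)
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts, key=lambda blog: (-counts[blog], blog))
```

For example, `rank_weblogs([("a", "c"), ("b", "c"), ("c", "b")])` places `c` ahead of `b`, since two distinct weblogs link to `c` and only one links to `b`.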
The world doesn't revolve around blogs. Mainstream media sites like the NY Times, CNN, and MSNBC also publish a great deal of content on an hourly basis.
Spinn3r indexes over 600 mainstream news sites, which we've identified using our proprietary ranking and indexing technology.
A good many A-list bloggers use summary RSS, which doesn't include the full content of their blog posts.
We're smart enough to adapt. Spinn3r has developed content extraction technology that pinpoints the actual text on a page while excluding sidebar and navigation content.
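One simple way to approximate this kind of extraction is a link-density heuristic: split the page into blocks and keep the block with the most text that is not inside links, since navigation and sidebars are mostly link text. The sketch below is a generic illustration under that assumption, not Spinn3r's extraction technology; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks and track how much of each is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of [text, link_char_count]
        self.current = ["", 0]
        self.link_depth = 0     # > 0 while inside an <a> element
        self.skip_depth = 0     # > 0 while inside <script> or <style>

    def flush(self):
        """Close the current block, keeping it only if it has visible text."""
        if self.current[0].strip():
            self.blocks.append(self.current)
        self.current = ["", 0]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self.link_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if self.skip_depth:
            return
        self.current[0] += data
        if self.link_depth:
            self.current[1] += len(data)


def extract_main_text(html):
    """Return the block with the most non-link text, or '' if none is found."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    best = max(parser.blocks, key=lambda b: len(b[0]) - b[1], default=["", 0])
    return " ".join(best[0].split())
```

Returning a single block keeps the sketch short; a post spread across many paragraphs would need neighbouring high-scoring blocks merged, which is where real extractors spend most of their effort.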