Social media content from across the web


Extract critical metadata


Both firehose and search APIs

Social media firehose and search APIs

Every hour we index over 4 million unique posts published by more than 200 million URLs, a publishing pace that accelerates every single day as more individuals publish their unique views and perspectives online.

Easy to use API

You can be up and running with Spinn3r in less than an hour. We ship a standard reference client that integrates directly with your pipeline. If you're running Java, you'll be able to start collecting data in minutes. If you're using another language, you only need to parse out a few JSON files every few seconds.
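To give a feel for what "parsing a few JSON files" can look like outside Java, here is a minimal Python sketch. The payload shape and the field names (`items`, `permalink`, `title`, `lang`) are illustrative assumptions, not Spinn3r's documented schema, and `parse_batch` is a hypothetical helper rather than part of the reference client.

```python
import json

# Hypothetical batch payload. The field names are assumptions made for
# illustration only; consult the real API documentation for the actual schema.
SAMPLE_BATCH = """
{"items": [
  {"permalink": "http://example.com/a", "title": "First post", "lang": "en"},
  {"permalink": "http://example.com/b", "title": "Second post", "lang": "de"}
]}
"""


def parse_batch(raw):
    """Return (permalink, title) pairs from one JSON batch.

    Items without a permalink are skipped so downstream code can rely on
    every pair having a usable link.
    """
    doc = json.loads(raw)
    posts = []
    for item in doc.get("items", []):
        link = item.get("permalink")
        if link is None:
            continue
        posts.append((link, item.get("title", "")))
    return posts


if __name__ == "__main__":
    for link, title in parse_batch(SAMPLE_BATCH):
        print(link, title)
```

In a real pipeline the raw string would come from whatever you fetch every few seconds. The parsing step itself stays about this small.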

Built on web standards

Spinn3r is built from the ground up to index raw HTML5, including HTML metadata such as microformats and microdata, which is how Google and other search engines index content. We don’t stop there. We also index RSS and Atom (including all 9 different RSS variants). Normal RSS parsers are fragile. Ours is not: if there are small errors in the source file, we transparently correct them so that you get the content you need.
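For readers unfamiliar with microdata, the following Python sketch shows the general idea of pulling `itemprop` values out of raw HTML using only the standard library. It is a simplified illustration, not Spinn3r's parser: the `MicrodataProps` class is hypothetical, and it ignores nested `itemscope` items, `href`/`src` values, and other rules in the microdata specification.

```python
from html.parser import HTMLParser


class MicrodataProps(HTMLParser):
    """Collect itemprop name/value pairs from a single, flat microdata item."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._pending = None  # itemprop name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="datePublished" content="2015-01-01">
            self.props[name] = attrs["content"]
        else:
            # The value is the element's text, captured in handle_data.
            self._pending = name

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.props[self._pending] = text
            self._pending = None


SAMPLE_HTML = """
<article itemscope itemtype="http://schema.org/BlogPosting">
  <h1 itemprop="headline">Hello world</h1>
  <meta itemprop="datePublished" content="2015-01-01">
  <span itemprop="author">Jane</span>
</article>
"""


def extract_props(html):
    parser = MicrodataProps()
    parser.feed(html)
    return parser.props


if __name__ == "__main__":
    print(extract_props(SAMPLE_HTML))
```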

Source discovery

Spinn3r is constantly crawling the web and finding new social media sources. If it publishes in real time and updates often, you can bet that we index it. Our integrated discovery engine actively patrols the web looking for new high-quality content.

Reliable infrastructure

Our infrastructure is state of the art and designed to scale. We’re hosted on ultra-fast SSDs. We store data in both Cassandra and Elasticsearch, and everything runs on a horizontally scalable Java crawling platform that we’ve developed over the last 8 years.

We have over 40 servers and store more than 10TB of content. Every piece of our infrastructure is designed with triple redundancy, and additional hardware is on standby in case of a failure.

Spinn3r is monitored 24/7 for any potential error in the system. We're so confident in our infrastructure that we back our service with a top-notch SLA so you can sleep well at night.

Mainstream news

The world doesn't revolve only around blogs. Mainstream media sites also publish a great deal of content on an hourly basis. Spinn3r indexes over ten thousand mainstream news sites, which we've identified using our proprietary ranking and indexing technology.

A filtered firehose

Our firehose API supports filters with arbitrary boolean logic. We can filter by language, publisher type, domain, etc. This allows us to get you the exact content that you need. No more. No less.
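To make "arbitrary boolean logic" concrete, here is a small Python sketch of composable AND/OR/NOT filters over post metadata. It models the idea only. The `Post` fields and the helper names (`field_is`, `all_of`, `any_of`, `negate`) are assumptions for illustration and do not represent the firehose API's actual filter syntax.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Post:
    # Illustrative fields mirroring the filter dimensions named above.
    lang: str
    publisher_type: str
    domain: str


Predicate = Callable[[Post], bool]


def field_is(name: str, value: str) -> Predicate:
    """Match posts whose attribute `name` equals `value`."""
    return lambda post: getattr(post, name) == value


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over any number of predicates."""
    return lambda post: all(p(post) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over any number of predicates."""
    return lambda post: any(p(post) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Boolean NOT of a predicate."""
    return lambda post: not pred(post)


def filter_posts(posts: List[Post], pred: Predicate) -> List[Post]:
    return [p for p in posts if pred(p)]


# lang == "en" AND (publisher_type == "news" OR domain == "example.com")
#              AND NOT domain == "spam.example"
wanted = all_of(
    field_is("lang", "en"),
    any_of(field_is("publisher_type", "news"), field_is("domain", "example.com")),
    negate(field_is("domain", "spam.example")),
)
```

Because each combinator returns another predicate, expressions can be nested to any depth, which is the property "arbitrary boolean logic" implies.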

Assign tags to sources

Assign arbitrary tags to your sources then filter and search through these tags in our index. For example, this would allow you to tag specific sources for your customers and then audit each batch of sources individually within our analytics dashboard.
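The tagging workflow described above can be pictured with a short Python sketch. The `SourceTags` class, its method names, and the example tags are hypothetical and exist only to illustrate per-customer tagging and lookup. They are not Spinn3r's API.

```python
from collections import defaultdict


class SourceTags:
    """In-memory mapping from source to its set of arbitrary tags."""

    def __init__(self):
        self._tags_by_source = defaultdict(set)

    def tag(self, source, *tags):
        """Attach one or more tags to a source."""
        self._tags_by_source[source].update(tags)

    def sources_with(self, tag):
        """Return the sources carrying `tag`, sorted for stable output."""
        return sorted(
            source
            for source, tags in self._tags_by_source.items()
            if tag in tags
        )


if __name__ == "__main__":
    registry = SourceTags()
    registry.tag("blog.example.com", "customer-a")
    registry.tag("news.example.org", "customer-a", "priority")
    registry.tag("forum.example.net", "customer-b")
    # Audit the batch of sources belonging to one customer.
    print(registry.sources_with("customer-a"))
```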

Collect data from any source

Because Spinn3r isn't limited to RSS feeds or APIs, we're able to index any arbitrary source that publishes new content. This means we're uniquely positioned to go after content that is difficult or impossible for other data providers to index.