
Overview

API calls in both JSON and curl are included in this pane.  

Spinn3r provides APIs for social media, weblogs, news, video, and live web content to our customers in any language and in large volumes.

We provide several main APIs for accessing this content, as well as a number of other secondary APIs.

Search API

Our full-text search API is based on Elasticsearch and provides advanced search facilities on top of a high quality content index.

If you’re getting up and running with Spinn3r for the first time, you probably want to start with our Search API.

This API allows you to search for arbitrary text strings, search with complex boolean logic, apply filters, and use other advanced features like aggregations. The results are then returned as ordinary JSON documents.
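As a rough sketch of what such a request body might look like, the Python snippet below combines boolean logic, a filter, and an aggregation. It assumes the standard Elasticsearch query DSL (`bool`, `match`, `term`, `aggs`); the exact syntax accepted may depend on the Elasticsearch version behind the API. The helper name `build_search_body` is hypothetical, and the field names `main`, `lang`, and `source_publisher_type` are taken from the sample document shown in the Content section.

```python
import json


def build_search_body(must_terms, exclude_terms, lang, size=10):
    """Build a search body with boolean text matching, a filter, and an aggregation.

    Hypothetical helper; assumes the standard Elasticsearch query DSL.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Every term listed here has to match the main content.
                "must": [{"match": {"main": t}} for t in must_terms],
                # Documents matching any of these terms are excluded.
                "must_not": [{"match": {"main": t}} for t in exclude_terms],
                # Filters narrow the result set without affecting scoring.
                "filter": [{"term": {"lang": lang}}],
            }
        },
        # Count the matching documents per publisher type.
        "aggs": {
            "by_publisher_type": {"terms": {"field": "source_publisher_type"}}
        },
    }


body = build_search_body(["firefox"], ["chrome"], lang="en", size=5)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the request body of a search call in place of the simpler bodies shown in the curl examples.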

Classifier API

The Classifier API allows developers to submit text (or URLs) and receive labels for this content from our machine learning platform. For example, if you submit a news story about the US Presidential election, you would get back labels for the candidates or other topics representing that article.

Parser API

The Parser API provides ad hoc parsing and metadata handling of arbitrary URLs on the web. Additionally, we perform data augmentation of the metadata including gender detection, sentiment detection, etc.

Firehose API

Our Firehose API is designed for bulk access to massive amounts of content, on the order of 200-500GB per day. We support some basic filtering on the content, but this would still generate a lot of data for new applications.

If you’re just getting started it might make sense to use the full-text search API and then graduate to the firehose API once you’re indexing more than 10M posts per day.

Support

support@spinn3r.com

If you have any questions on this document, you can send inquiries to support@spinn3r.com or create a ticket by visiting our website.

It’s our goal to get you up and running ASAP and sometimes text documents may not explain all issues.

Authentication

Run a basic full-text search now, using your credentials, which will return a single document:

curl -XPOST 'http://{{vendor}}.elasticsearch.spinn3r.com/content*/_search' -H "X-vendor: {{vendor}}" -H "X-vendor-auth: {{vendor_auth}}" -d '{
    "size": 1,
    "query" : {
        "term" : { "main" : "firefox" }
    }
}'

Spinn3r uses simple HTTP headers for authentication in all our APIs.

When a new account is created we send your credentials in your account creation email.

Your authentication headers are:

Header          Value
X-vendor        {{vendor}}
X-vendor-auth   {{vendor_auth}}

If your keys aren’t shown, please log in (or register) to Spinn3r for new keys.

Authentication is performed via HTTP headers provided in the request.

When provisioned you’ll be given a vendor code and authentication code that need to be specified for each request.
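For clients not using curl, the same two headers can be attached in code. The following is a minimal Python sketch using only the standard library. The helper names `auth_headers` and `build_search_request` and the placeholder credentials are hypothetical, and the explicit `Content-Type` header is an assumption that does not appear in the curl example.

```python
import json
import urllib.request


def auth_headers(vendor, vendor_auth):
    """Return the two authentication headers required on every request."""
    return {"X-vendor": vendor, "X-vendor-auth": vendor_auth}


def build_search_request(vendor, vendor_auth, body):
    """Build (but do not send) a POST request mirroring the curl example."""
    url = "http://%s.elasticsearch.spinn3r.com/content*/_search" % vendor
    headers = auth_headers(vendor, vendor_auth)
    # Assumption: state the JSON content type explicitly; the curl example omits it.
    headers["Content-Type"] = "application/json"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder credentials; substitute the values from your account creation email.
request = build_search_request(
    "my-vendor",
    "my-vendor-auth",
    {"size": 1, "query": {"term": {"main": "firefox"}}},
)
# To send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to confirm that both headers are present before any network traffic occurs.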

Failed authentication

We include a JSON body in the response with a human readable message string on authentication failure:

{
  "success" : false,
  "message" : "Please check your vendor code.  It may be invalid.  Contact support@spinn3r.com if you would like a new one."
}
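A client can check for this body before treating a response as search results. The sketch below is one way to do that in Python. It assumes a failure always carries `"success": false` as in the sample above, and the function name `check_auth_response` is hypothetical.

```python
import json


def check_auth_response(raw_body):
    """Parse a response body and raise if it is an authentication failure.

    Assumes failures look like the sample above ("success": false plus a
    "message" string). Any other JSON body is returned parsed and unchanged.
    """
    payload = json.loads(raw_body)
    if isinstance(payload, dict) and payload.get("success") is False:
        raise PermissionError(payload.get("message", "authentication failed"))
    return payload


sample_failure = '{"success": false, "message": "Please check your vendor code."}'
try:
    check_auth_response(sample_failure)
except PermissionError as error:
    print("Authentication failed:", error)
```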

If HTTP authentication fails we will return either:

Connectivity

High-throughput access to Spinn3r is required for your application to be performant.

This applies to our firehose API, which can fetch hundreds of gigabytes per day, but also to other APIs including search. A single request might be small, but a properly configured network can yield a 2-3x performance boost.

Speed Test

To run the speed test just run:

wget --output-document=/dev/null http://api.artemis.spinn3r.com/speed-test

Outputs:

--2014-09-13 21:44:16--  http://api.artemis.spinn3r.com/speed-test
Resolving api.artemis.spinn3r.com (api.artemis.spinn3r.com)... 108.168.183.21
Connecting to api.artemis.spinn3r.com (api.artemis.spinn3r.com)|108.168.183.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 819200000 (781M)
Saving to: `speed-test'

100%[==========================================================>] 819,200,000 101.5M/s   in 66s     

2014-09-13 21:45:21 (101.9 MB/s) - `speed-test' saved [819200000/819200000]

Having a reliable network connection to our datacenters is critical to solid API performance.

Fortunately, we have a speed test URL that you can use to quickly measure your connection speed without having to worry about the latency of database or API calls.

Required Throughput

In this example, we’re transferring data at about 100MB/s, or 800Mbit per second.

When using the firehose API, this will allow you to keep up with data in real time and also catch up quickly should your client fall behind.

For the search API this will allow you to retrieve documents much faster.
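The arithmetic behind these figures can be checked with a short Python sketch. It uses the 100MB/s figure from this section and the 200-500GB per day firehose volume from the overview, and assumes decimal units (1 GB = 1000 MB). The function names are hypothetical.

```python
def mbytes_to_mbits(mbytes_per_second):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_second * 8


def sustained_mbytes_per_second(gigabytes_per_day):
    """Average transfer rate needed to move a daily volume, in MB/s."""
    seconds_per_day = 24 * 60 * 60
    return gigabytes_per_day * 1000 / seconds_per_day


print(mbytes_to_mbits(100))  # 800

for volume in (200, 500):
    rate = sustained_mbytes_per_second(volume)
    print("%d GB/day is about %.1f MB/s (%.0f Mbit/s) sustained"
          % (volume, rate, mbytes_to_mbits(rate)))
```

Under these assumptions the sustained average works out to roughly 2.3 to 5.8 MB/s, so a 100MB/s link leaves ample headroom for catching up after an outage.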

Request latency

You can measure the datacenter latency by running the following command, which reports the “TTFB”, or time to first byte, of the connection:

curl -o /dev/null -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} \n" http://api.artemis.spinn3r.com/speed-test

Request latency depends on your datacenter location.

If you want to improve performance for full-text searches you can use persistent HTTP connections to keep your connections active to the server.
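A minimal Python sketch of a persistent connection is shown below, using the standard library `http.client`, which keeps an HTTP/1.1 connection open between requests by default. The class name `SearchClient`, the timeout value, and the placeholder credentials are assumptions for illustration; the host and path follow the curl examples in this document.

```python
import http.client
import json


class SearchClient:
    """Reuse a single TCP connection across many searches (hypothetical helper)."""

    def __init__(self, vendor, vendor_auth, timeout=30):
        self.headers = {
            "X-vendor": vendor,
            "X-vendor-auth": vendor_auth,
            # Assumption: explicit JSON content type, not shown in the curl examples.
            "Content-Type": "application/json",
        }
        # HTTPConnection connects lazily, on the first request.
        self.conn = http.client.HTTPConnection(
            "%s.elasticsearch.spinn3r.com" % vendor, timeout=timeout
        )

    def search(self, body, index="content_*"):
        """Send one search over the shared connection and return the parsed JSON."""
        payload = json.dumps(body).encode("utf-8")
        self.conn.request("POST", "/%s/_search" % index, body=payload,
                          headers=self.headers)
        response = self.conn.getresponse()
        # The body must be read completely before the connection can be reused.
        return json.loads(response.read().decode("utf-8"))

    def close(self):
        self.conn.close()


# Placeholder credentials; no network traffic happens until search() is called.
client = SearchClient("my-vendor", "my-vendor-auth")
# results = client.search({"size": 1, "query": {"term": {"main": "firefox"}}})
client.close()
```

Issuing several `search` calls on the same client avoids paying the connection setup time measured above on every request.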

Improving throughput

On Linux you can run the following to increase the TCP buffer size. Please cat the files beforehand to record the default values.

cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem

echo "16384 1048576 33554432" > /proc/sys/net/ipv4/tcp_rmem
echo "16384 1048576 33554432" > /proc/sys/net/ipv4/tcp_wmem

If you’re not receiving optimal throughput, we suggest increasing the OS TCP buffer size.

This is most needed when your datacenter is far from ours. If packets are dropped or heavily reordered, TCP has to slow down to recover, and larger buffers let it keep more data in flight over a high-latency link.
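A rough way to judge whether the buffer is large enough is the bandwidth-delay product: the amount of data that must be in flight to keep a link busy at a given round-trip time. The Python sketch below compares it with the 33554432-byte maximum from the commands above. The round-trip times are hypothetical examples, and the usable TCP window can be smaller than the configured buffer, so treat the result as an estimate.

```python
def bandwidth_delay_product(bytes_per_second, rtt_ms):
    """Bytes that must be in flight to sustain a rate at a given round-trip time."""
    return bytes_per_second * rtt_ms // 1000


TARGET_RATE = 100 * 1000 * 1000  # 100 MB/s, the figure from the speed test section
MAX_BUFFER = 33554432            # maximum set by the echo commands above (32 MiB)

for rtt_ms in (10, 80, 200, 400):  # hypothetical round-trip times
    needed = bandwidth_delay_product(TARGET_RATE, rtt_ms)
    verdict = "fits within" if needed <= MAX_BUFFER else "exceeds"
    print("RTT %d ms: %d bytes in flight, which %s the maximum buffer"
          % (rtt_ms, needed, verdict))
```

With these inputs, a 32 MiB buffer covers 100 MB/s up to a round-trip time of a little over 330 ms.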

Content

{
  "bucket" : 0,
  "resource" : "http://cnn.com/2014/10/15/health/texas-ebola-outbreak",
  "date_found" : "2014-06-22T01:08:52Z",
  "index_method" : "PERMALINK_TASK",
  "html" : "<html><body><p>Full HTML of the content</p></body></html>",
  "html_length" : 57,
  "source_hashcode" : "COH0cFU4G1sMlRHd9gEvS-n3FFI",
  "source_resource" : "http://cnn.com",
  "source_link" : "http://cnn.com",
  "source_publisher_type" : "MAINSTREAM_NEWS",
  "source_publisher_subtype" : "MAINSTREAM_NEWS",
  "source_date_found" : "2014-06-22T01:08:52Z",
  "source_update_interval" : 900000,
  "source_setting_update_strategy" : "CYCLICAL",
  "source_setting_index_strategy" : "DEFAULT",
  "source_title" : "CNN",
  "permalink" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
  "canonical" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
  "main" : "<p>Full HTML of the content</p>",
  "main_length" : 31,
  "main_checksum" : "I7QyvW_g9AjGg3vWjmcxwo7wjXs",
  "main_format" : "HTML",
  "summary_text" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
  "title" : "CDC: Nurse with Ebola should not have traveled",
  "publisher" : "CNN",
  "section" : "Technology",
  "description" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
  "tags" : [ "nurse", "outbreak", "ebola" ],
  "published" : "2014-06-22T01:08:52Z",
  "author_name" : "Holly Yan",
  "author_link" : "https://twitter.com/HollyYanCNN",
  "lang" : "en"
} 

Nearly all Spinn3r APIs strive to produce the same schema and core set of fields. This includes the search, firehose, and parser APIs.

Core fields

The schema returned by Spinn3r has a large number of fields produced by our indexing system.

This ranges from basic fields like title, link, description, and article body, to author information, and all the way to NLP analysis including near duplicates, gender, language, etc.

You may want to review our schema for the full list of fields.

Basic fields

All posts will have a permalink. Most will have a title except for MICROBLOG and some PHOTO publisher types.

If a summary_text is available, you may wish to use it for displaying content, along with the optional extract or main content.

You may want to review our content cards for how to display content in a UI.
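The fallback order described above can be sketched in Python as follows. The function names are hypothetical, and `extract` is assumed to be the field name of the optional extract mentioned in the text. Note that `main` may contain HTML, which this sketch does not strip.

```python
def display_title(doc):
    """Use the title when present; otherwise fall back to the permalink."""
    return doc.get("title") or doc["permalink"]


def display_text(doc, max_chars=280):
    """Pick the best available text field, preferring summary_text.

    The "extract" field name is an assumption; "main" may contain HTML that
    the caller should sanitize or strip before display.
    """
    for field in ("summary_text", "extract", "main"):
        value = doc.get(field)
        if value:
            text = value.strip()
            if len(text) > max_chars:
                # Truncate long text and mark the cut with an ellipsis.
                return text[:max_chars - 3].rstrip() + "..."
            return text
    return ""


post = {
    "permalink": "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
    "title": "CDC: Nurse with Ebola should not have traveled",
    "summary_text": "Another nurse who contracted Ebola after caring for a man "
                    "who died of the virus was on a flight from Cleveland to Dallas.",
}
print(display_title(post))
print(display_text(post, max_chars=60))
```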

Content Cards

Here are some basic examples for displaying content. How this is accomplished in practice is entirely up to your web designer.

SUMMARY

[Image: summary card]

SUMMARY_LARGE_IMAGE

[Image: summary card with large image]

PLAYER

[Image: player card]

Spinn3r includes a number of text fields that include the full post of the content. This can be used to build out full-text search applications or build NLP models for classification of content.

However, it’s not always ideal for presenting to users.

  1. The full HTML is generally too long for displaying more than 1 or 2 posts within a web application.

  2. It’s unclear where to start displaying content. Usually one would want to build a summary of the content, but this is easier said than done (there’s a whole branch of machine learning dedicated to document summarization).

Content cards can help with this issue.

Spinn3r supports ‘cards’ which allow you to present content to your user in a rich format. We properly handle extracting image, video, and text metadata so you don’t have to.

Fields

card

The card field specifies how you can include content within your web application.

We support the following types of cards:

Name                  Description
SUMMARY               Basic summary of the content with optional image
SUMMARY_LARGE_IMAGE   Basic summary of the content using a large image
PHOTO                 The content is a photo and is the primary content
GALLERY               The content is a photo gallery with multiple images
PLAYER                The content is an embedded video player

Main fields

The title, permalink, and description fields provide the main content for a card. These will provide the core backbone of your content. When the card field is set, it can be assumed that the title and description will ALWAYS be available for displaying to the user.

Image

When set, this is an image representing the content. It can optionally have an image_width and image_height.

Player

When set, this is a video URL designed to be used within an iframe. This only applies when the card is set to PLAYER.
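To make the card fields concrete, here is a minimal Python sketch that turns a document into an HTML fragment. It assumes the image and player values live in fields named `image` and `player` (inferred from the headings above); the markup, CSS class names, and the function name `render_card` are purely illustrative.

```python
import html


def render_card(doc):
    """Render a document as a small HTML fragment based on its card type.

    Assumes field names "image" and "player"; returns "" when no card is set.
    """
    card = doc.get("card")
    if card is None:
        return ""
    # title and description are expected to be present whenever card is set.
    parts = ['<h3><a href="%s">%s</a></h3>'
             % (html.escape(doc["permalink"]), html.escape(doc["title"]))]
    if card == "PLAYER" and doc.get("player"):
        # The player URL is intended for use inside an iframe.
        parts.append('<iframe src="%s"></iframe>' % html.escape(doc["player"]))
    elif doc.get("image"):
        # image_width and image_height are optional.
        size = "".join(
            ' %s="%d"' % (attr, doc["image_" + attr])
            for attr in ("width", "height")
            if doc.get("image_" + attr)
        )
        parts.append('<img src="%s"%s>' % (html.escape(doc["image"]), size))
    parts.append("<p>%s</p>" % html.escape(doc["description"]))
    return '<div class="card card-%s">%s</div>' % (card.lower(), "".join(parts))


example = {
    "card": "SUMMARY",
    "title": "CDC: Nurse with Ebola should not have traveled",
    "permalink": "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
    "description": "Another nurse who contracted Ebola was on a flight.",
}
print(render_card(example))
```

All values are escaped before being placed in the markup, since titles and descriptions come from arbitrary web content.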

Shared Content

Many social networks and blog publishing platforms support the concept of ‘shared content’, whereby a user can easily share a piece of content with their followers if they deem it noteworthy.

Spinn3r supports this by flagging content with a shared field which is true when the content is a piece of shared content.

Additionally, we add a few more fields including:

shared_author_link
shared_author_name
shared_author_user_id
shared_identifier
shared_permalink
shared_author_handle

See the documentation for the content schema for these fields.

Note that if you would like to use our search API to find nested/shared content, this is easily accomplished by taking an identifier and then searching for it in the shared_identifier field to fetch everyone who has shared an article.

Additionally, no like or share values are present on shared content, as this content can’t itself be liked or shared.
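The lookup of everyone who shared an article could be expressed as a search body along these lines. This Python sketch assumes a standard Elasticsearch `term` query on the `shared_identifier` field; the function name `shares_of` and the example identifier value are hypothetical.

```python
import json


def shares_of(identifier, size=10):
    """Build a search body matching posts that share the given identifier.

    Assumes a standard Elasticsearch term query on shared_identifier.
    """
    return {
        "size": size,
        "query": {"term": {"shared_identifier": identifier}},
    }


# Hypothetical identifier value, taken from a previously fetched document.
body = shares_of("example-identifier", size=25)
print(json.dumps(body))
```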

Search

Here’s an example of searching for the term ‘Obama’ using a query string.
We provide this as a raw curl command for ease of use. The examples after this will use JSON. You can simply use this curl command as a template to execute the examples individually.

curl -XPOST 'http://{{vendor}}.elasticsearch.spinn3r.com/content_*/_search?pretty=true' \
     -H "X-vendor: {{vendor}}" \
     -H "X-vendor-auth: {{vendor_auth}}" \
     -d '{
          "size": 1,
            "query": {
                "query_string" : {
                    "query" : "main:Obama"
                }
            }  
        }
        '

JSON result for this query. You can see the full schema for the resulting JSON in our content schema.

{
  "bucket" : 1459618800097,
  "sequence" : 1459618891000015420,
  "sequence_range" : 9297,
  "hashcode" : "cry8h2SbQBlpllWRnDo1NBwgBJE",
  "resource" : "http://politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
  "date_found" : "2016-04-02T17:41:31Z",
  "index_method" : "PERMALINK_TASK",
  "detection_method" : "SOURCE",
  "version" : "5.1.684",
  "source_hashcode" : "bFiShGib138jwIeU8ryJQWH7CFA",
  "source_resource" : "http://politico.com",
  "source_link" : "http://www.politico.com/",
  "source_publisher_type" : "MAINSTREAM_NEWS",
  "source_date_found" : "2016-03-13T04:44:59Z",
  "source_last_updated" : "2016-04-02T17:26:28Z",
  "source_last_published" : "2016-04-02T17:41:28Z",
  "source_last_posted" : "2016-04-02T17:11:28Z",
  "source_update_interval" : 900000,
  "source_http_status" : 200,
  "source_content_length" : 244718,
  "source_content_checksum" : "cQrtB2o9SVvDK1TACcGvSDKQ8tw",
  "source_assigned_tags" : [ "#7ZdROb4JaZaz4uGuG_9csN3QDlg" ],
  "source_setting_update_strategy" : "CYCLICAL",
  "source_setting_index_strategy" : "DEFAULT",
  "source_title" : "@politico",
  "source_description" : "POLITICO covers political news with a focus on national politics, Congress, Capitol Hill, lobbying, advocacy, and more.  POLITICO's in-depth coverage includes video features, regular blogs, photo galleries, cartoons, and political forums.",
  "source_feed_href" : "http://www.politico.com/rss/politicopicks.xml",
  "source_feed_title" : "POLITICO - TOP Stories",
  "source_feed_format" : "RSS",
  "permalink" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
  "permalink_redirect" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
  "permalink_redirect_domain" : "politico.com",
  "permalink_redirect_site" : "politico.com",
  "canonical" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
  "domain" : "politico.com",
  "site" : "politico.com",
  "main" : "<div> \n <div> \n  <div> \n   <section> \n    <div> \n     <div> \n     </div> \n    </div> \n   </section> \n  </div> \n </div> \n</div> \n<div> \n <div> \n  <div> \n   <section> \n    <section> \n     <div> \n      <ul> \n       <li> \n        <article>  \n         <div> \n          <a href=\"http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349\"><img alt=\"160329_ted_cruz_ap_1160.jpg\" title=\"160329_ted_cruz_ap_1160.jpg\" /></a>\n         </div>  \n         <div>  \n          <h3> <a href=\"http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349\">Ted Cruz’s evangelical problem </a></h3>  \n         </div> \n        </article> </li> \n       <li> \n        <article>  \n         <div> \n          <a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\"><img alt=\"160330_donald_trump_gty_1160.jpg\" title=\"160330_donald_trump_gty_1160.jpg\" /></a>\n         </div>  \n         <div>  \n          <h3> <a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\">How the North Dakota GOP is freezing out Trump</a></h3>  \n         </div> \n        </article> </li> \n       <li> \n        <article>  \n         <div> \n          <a href=\"http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784\"><img alt=\"160401.jpg\" title=\"160401.jpg\" /></a>\n         </div>  \n         <div>  \n          <h3> <a href=\"http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784\">9/11: What Would Trump Do? 
</a></h3>  \n         </div> \n        </article> </li> \n       <li> \n        <article>  \n         <div> \n          <a href=\"http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486\"><img alt=\"GettyImages-518602178.jpg\" title=\"GettyImages-518602178.jpg\" /></a>\n         </div>  \n         <div>  \n          <h3> <a href=\"http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486\">Obama goes radioactive on Trump</a></h3>  \n         </div> \n        </article> </li> \n       <li> \n        <div> \n        </div> </li> \n       <li> \n        <article>  \n         <div> \n          <a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\"><img alt=\"160401_hillary_clinton_ap_1160.jpg\" title=\"160401_hillary_clinton_ap_1160.jpg\" /></a>\n         </div>  \n         <div>  \n          <h3> <a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\">Sanders gets under Clinton's skin in New York</a></h3>  \n         </div> \n        </article> </li> \n      </ul> \n     </div> \n    </section> \n   </section> \n  </div> \n </div> \n</div> \n<div> \n <div> \n  <div> \n   <section> \n    <div> \n     <div> \n      <div>\n       <div> \n        <div> \n         <a href=\"http://www.politico.com/playbook\"><img alt=\"Playbook\" /></a>\n        </div> \n        <div> \n         <div>  \n          <p> Politico</p> \n          <h2> <a href=\"http://www.politico.com/playbook\">Playbook</a></h2> \n          <p>Mike Allen's must-read briefing on what's driving the day in Washington </p>  \n         </div> \n         <div>     \n          <b></b> Subscribe   \n         </div> \n        </div> \n       </div> \n      </div>\n     </div> \n    </div> \n   </section> \n  </div> \n </div> \n</div> \n<div> \n <div> \n  <article> \n   <section> \n    <div> \n     <div> \n      <div> \n       <aside> \n        <ul> \n         <li>  Shares </li> \n         <li> <a 
href=\"http://api.addthis.com/oexchange/0.8/forward/facebook/offer?pco=tbx32nj-1.0&amp;url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&amp;pubid=politico.com\"> <b></b> Facebook </a> </li> \n         <li> <a href=\"http://api.addthis.com/oexchange/0.8/forward/twitter/offer?pco=tbx32nj-1.0&amp;url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&amp;pubid=politico.com&amp;text=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster\"> <b></b> Twitter </a> </li> \n         <li> <a href=\"http://api.addthis.com/oexchange/0.8/forward/googleplus/offer?pco=tbxnj-1.0&amp;url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&amp;pubid=politico.com&amp;title=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster\"> <b></b> Google + </a> </li> \n        </ul> \n        <ul> \n         <li> <a href=\"mailto:?subject=POLITICO Playbook, presented by the Embassy of the United Arab Emirates: BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith 
Webster&amp;body=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\"> <b></b> Email </a> </li> \n         <li> <a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#superComments\"> <b></b> Comment </a> </li> \n         <li> <a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#\"> <b></b> Print </a> </li> \n        </ul>\n       </aside> \n      </div> \n     </div> \n    </div> \n   </section> \n   <section> \n    <div> \n     <div> \n      <div></div> \n      <div> \n       <div>  \n        <h1>BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</h1>   \n        <p>04/02/16 01:32 PM EDT</p>  \n       </div> \n      </div> \n      <p><b>By Mike Allen </b>(@mikeallen; <a href=\"mailto:mallen@politico.com\">mallen@politico.com</a>)<b> and Daniel Lippman </b>(@dlippman; <a href=\"mailto:dlippman@politico.com\">dlippman@politico.com</a>)</p> \n      <p><b>Happy Saturday! </b>Obama alumnus Brent Colburn asked that we include this in celebration of his birthday: Today “is World Autism Awareness Day, and in celebration of my nephew Cordis and his amazing parents, Brian &amp; Andrea Colburn, I’d like to encourage every Playbooker to take a minute and learn more about ... 
the autism community.” <a href=\"http://www.autismspeaks.org\">www.autismspeaks.org</a></p>\n      <p>Story Continued Below</p> \n      <div> \n       <div> \n        <div> \n        </div> \n       </div> \n      </div> \n      <p><b>INSIDE THE CAMPAIGNS – “Trump campaign shrinks Lewandowski’s role: </b>Despite the billionaire’s staunch defense, his embattled campaign manager is losing clout,” by Ben Schreckinger and Ken Vogel, with Hadas Gold: “Trump’s just-named convention manager, Paul Manafort, is expected to take a leading role not just in the selection of delegates, but in the remaining primaries themselves. ... [A] person involved in Trump’s campaign [said:] ... ‘Mr. Trump’s listening to other people now. The crew’s expanding.’ ... </p> \n      <p><b>“[T]his winter,</b> ... National Political Director Michael Glassner ... was [promoted] to deputy campaign manager ... On March 2, the campaign promoted Stuart Jolly ... to national field director, giving him primary authority over ... hiring ... field staff. ... <b>Manafort has quickly taken charge of his own fiefdom in Washington, and is planning to hire a team of his own,</b> which is likely to include several veterans of the 1976 Republican National Convention – the party’s last convention at which the presidential nomination was contested.” <a href=\"http://politi.co/22YORlU\">http://politi.co/22YORlU</a></p> \n      <p><b>**SUBSCRIBE to Playbook</b>: <a href=\"http://politi.co/1M75UbX\">http://politi.co/1M75UbX</a></p> \n      <p><b>CHASER – “Trump touts his loyalty in defending campaign manager</b>,” by AP’s Jill Colvin in Appleton, Wis.: “Trump [said in a phoner Thu. evening that] his decision to stand behind his campaign manager ... is a sign of loyalty — a trait that Trump has displayed, for better or worse, through much of his career.” <a href=\"http://apne.ws/1RS7xLX\">http://apne.ws/1RS7xLX</a> </p> \n      <p><b>ALEX BURNS, who turns 3-0 tomorrow, </b>coins a memorable phrase on N.Y. Times p. 
A9, “G.O.P. Fears Trump as <b>Zombie Candidate: Damaged but Unstoppable”:</b> “Republicans who once worried that Mr. Trump might gain overwhelming momentum ... are now becoming preoccupied with a different grim prospect: that Mr. Trump might become a kind of zombie candidate — damaged beyond the point of repair, but too late for any of his rivals to stop him.” <a href=\"http://nyti.ms/1ZTJn6E\">http://nyti.ms/1ZTJn6E</a></p> \n      <p><b>VICE PRESIDENT BIDEN </b>and Dr. Jill Biden will be at tonight’s NCAA Final Four semifinal games in Houston to promote the It’s On Us campaign to end sexual assault on campus. The two will appear for a pregame interview on TBS. They return to D.C. on Sunday.</p> \n      <p><b>--Other Washingtonians at the Final Four:</b> Jonathan Martin, who’s a birthday boy tomorrow, and Betsy Fischer Martin; John Feinstein; and Danielle and Jeff Jones.</p> \n      <p><b>AP for SUNDAY PAPERS – “Clinton’s frustration grows, as primary race drags on,”</b> by Lisa Lerer in Syracuse and Ken Thomas in N.Y.: “Hillary Clinton snapped at a Greenpeace protester. She linked Bernie Sanders and tea party Republicans. And she bristled with anger when nearly two dozen Sanders supporters marched out of an event near her home outside New York City ... After a year of campaigning, months of debates and 35 primary elections, Sanders is finally getting under Clinton’s skin ... Clinton has spent weeks largely ignoring Sanders and trying to focus on ... Trump. Now, after several primary losses and with a tough fight in New York on the horizon, Clinton is showing flashes of frustration with the Vermont senator ...</p> \n      <p><b>“According to Democrats close to Hillary</b> and former President Bill Clinton, both are frustrated by Sanders' ability to cast himself as above politics-as-usual even while firing off what they consider to be misleading attacks. The Clintons are even more annoyed that Sanders' approach seems to be rallying ... young voters by his side. 
While Hillary Clinton's team contends her lock on the nomination as ‘nearly insurmountable,’ the campaign frequently grumbles that Sanders hasn't faced the same level of scrutiny ... Her aides complain about Sanders' rhetoric, claiming he's broken his pledge to avoid character attacks ... </p> \n      <p><b>“Clinton hopes that big victories</b> in New York on April 19 and five Northeastern states a week later will allow her to wrap up the nomination by the end of the month. But aides acknowledge that Sanders ... is unlikely to feel significant political or financial pressure to drop out of the race, even if it becomes clear he cannot win ... Sanders must win 67 percent of the remaining delegates and uncommitted superdelegates ... through June to be able to clinch the Democratic nomination. So far he's only winning 37 percent. </p> \n      <p><b>“Joel Benenson,</b> Clinton’s chief strategist, said: ‘We’re going to get to a point at the end of April where there just isn’t enough real estate for him to overcome the lead that we’ve built.’ Still, any kind of truce is probably weeks, if not months, away. ... Sanders is costing Clinton significant time, money and political capital [and] is drawing sizable crowds in New York.” <a href=\"http://apne.ws/1pUb1Uo\">http://apne.ws/1pUb1Uo</a></p> \n      <p><b>FRIENDS NOW CALLING SPICER “Mr. Chairman” ... “Backstage maneuvering begins in wide-open GOP chairman’s race,” by The Hill’s Scott Wong:</b> “Two ... RNC senior officials also have been mentioned as potential Priebus successors: John Ryder, the RNC’s general counsel, and Sean Spicer, the RNC’s chief strategist and communications director.<b> </b>... 
[and] the RNC’s top communicator since 2011.” <a href=\"http://bit.ly/1UyiRAk\">http://bit.ly/1UyiRAk</a> <br /> <b>--@seanspicer</b>: “Sunday add @GOP’s @Reince to list of ppl that have done ‘full Ginsburg’ @ThisWeekABC @meetthepress @FoxNewsSunday @FaceTheNation @CNNSotu”</p> \n      <p><b>--TELLY LOVELACE</b> named RNC’s national director of African American initiatives and media -- Release: “Telly joins the RNC from IR+ Media ... where he served as Managing Director. Previously, Telly served as a senior member of Maryland Governor Larry Hogan’s communications team.”</p> \n      <p><b>** A message from the Embassy of the United Arab Emirates:</b> The UAE stands with the US and President Obama in a shared commitment to stopping the proliferation of nuclear weapons. This is one of many areas where the UAE and US work together to strengthen stability and security in the Mideast and around the world. Learn more: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p> \n      <p><b>PIC DU JOUR: </b>Colleagues of Jen Friedman in the White House press office played an April Fool’s joke on her yesterday by putting her birthday in Playbook. That led to dozens of happy birthday emails to her, including from senior staff, even though the deputy press secretary's real birthday is Nov. 7. They also decorated her office with a “Happy Birthday” banner and a balloon. <b><i>Pic of her decorated desk <a href=\"http://bit.ly/21XVFdO\">http://bit.ly/21XVFdO</a></i></b></p> \n      <p><b>LIFE ONLINE – “Snapchat’s Ultimate Goal Isn’t Just Chat—It’s Total Media Domination,” by Fortune’s Mathew Ingram:</b> “The latest iteration came this week with the addition of new features including video calling, audio and video messaging, GIFs, and stickers. 
Unlike a lot of other messaging apps, all of the new features are blended together—users can seamlessly toggle between video and audio, send short notes, and draw on top of shared photos.” <a href=\"http://for.tn/1VjJ7gB\">http://for.tn/1VjJ7gB</a> </p> \n      <p><b>TOMORROW’S TIMES TODAY -- “Navy SEALs Split Over Members’ Benefiting From Hard-Earned Brand,” by Nicholas Kulish, Christopher Drew, and Sean D. Naylor: </b>“[F]ormer members ... are increasingly giving paid speeches, sounding off on politics on Fox News and stamping the force’s name on hats, backpacks, vitamins ... [A] half-dozen books are scheduled to roll off the presses in coming months, adding to the 100-plus published by former SEALs since 2001.<b> </b>... Far more SEALs have gone public than their more reticent Army counterparts in Delta Force and the Rangers ...</p> \n      <p><b>“One author, Matt Bissonnette, </b>earned millions for ‘No Easy Day,’ a firsthand narrative of the Bin Laden raid, but had to forfeit the profits for failing to submit it for Pentagon review of classified information.” <a href=\"http://nyti.ms/1RS6BqQ\">http://nyti.ms/1RS6BqQ</a></p> \n      <p><b>WHITE HOUSE DEPARTURE LOUNGE:</b> Noah Schwartz left the National Security Council on Friday where he was advisor to the deputy national security advisor for int’l econ; he’s headed to the Office of the Secretary of Defense, where he’ll be working on South and Southeast Asia policy. He emails friends: “It has been a great privilege to serve on the NSC staff these past few years, and I will always be grateful for the experience. 
Special thanks to everyone (past and present) on the international economics team.”</p> \n      <p><b>POLITICO MAGAZINE FRIDAY COVER – “9/11: What Would Trump Do?”:</b> “Politico Magazine asked foreign policy and counterterrorism experts, historians, Trump biographers, even psychologists to take a serious guess at how he’d handle the days after a terrorist attack in the United States—all based on what they know about Trump the candidate and what he’ll be facing if he gets elected.” <i>With hot takes from Jacob Heilbrunn, Ian Bremmer, Amb. Dennis Ross, Aaron David Miller, Andrew Bacevich and more</i> <a href=\"http://politi.co/1UJ0n0k\">http://politi.co/1UJ0n0k</a> </p> \n      <p><b>STUFF TRUMP SAYS -- “How Donald Trump sees himself,” by CNN’s Scott Glover and Maeve Reston with a video by Brenna Williams:</b> “He considers himself a member of ‘the lucky sperm club.’<b> </b>He trusts no one, and places a premium on revenge. (‘If you do not get even, you are just a schmuck!’)<b> </b>He treats every decision he makes ‘like a lover,’ sometimes thinking with his head, other times with other parts of his body, because it reminds him to ‘keep in touch with my basic impulses.’<b> </b>And to make creative choices, he writes: ‘I try to step back and remember my first shallow reaction. The day I realized it can be smart to be shallow was, for me, a deep experience.’” <a href=\"http://cnn.it/1SGvhCL\">http://cnn.it/1SGvhCL</a> ... <i><b>2-min. video</b></i> <a href=\"http://cnn.it/1SsWF4w\">http://cnn.it/1SsWF4w</a> </p> \n      <p><b>SUBJECT LINE DU JOUR – </b>Trump’s menacing campaign email sent Friday:<b> </b>“We’re Coming For You Wisconsin!” <i><b>Text of his email</b></i> <a href=\"http://bit.ly/25D0qOL\">http://bit.ly/25D0qOL</a> </p> \n      <p><b>VIDEO DU JOUR – “Donald Trump’s Love/Hate Relationship With Women”</b> – Politico Magazine – <i><b>2-min. 
video </b></i><a href=\"http://politi.co/25Bwouu\">http://politi.co/25Bwouu</a><i><b> </b></i></p> \n      <p><b>CLICKERS – “The nation’s cartoonists on the week in politics,” edited by Matt Wuerker – </b><i>11 keepers</i> <a href=\"http://politi.co/22WoUUd\">http://politi.co/22WoUUd</a> ... <i><b>Matt’s thirteen March cartoons</b></i> <a href=\"http://politi.co/1RRWsxo\">http://politi.co/1RRWsxo</a> </p> \n      <p><b>GREAT WEEKEND READS,</b> curated by Daniel Lippman:</p> \n      <p>--“<b>Jesus of Nazareth, Whose Messianic Message Captivated Thousands, Dies at About 33,” by Sam Roberts in Vanity Fair: </b>“Roberts, an obituary writer for The New York Times, imagines how, given the facts available then, his predecessors might have reported the aftermath of an execution in the Middle East one Friday two millennia ago.” <a href=\"http://bit.ly/1UZby4C\">http://bit.ly/1UZby4C</a> </p> \n      <p>--<b>“The Men Who Gave Trump His Brutal Worldview,” by Michael D’Antonio, </b>author of “Never Enough: Donald Trump and the Pursuit of Success,” on Politico Magazine: “Tutored by his fiercely ambitious father and tough-as-nails high school coach, the GOP frontrunner has only one ethical code: life is combat.” <a href=\"http://politi.co/1LYiY5C\">http://politi.co/1LYiY5C</a><b><i> </i></b><i>...<b> $15.16 on Amazon</b></i> <a href=\"http://amzn.to/1g8Rlak\">http://amzn.to/1g8Rlak</a> </p> \n      <p><b>--“This Professor Knows Why You Hate Ted Cruz’s Face” --Washingtonian Staff</b>: “(He read the other candidates’ faces, too.)” <a href=\"http://bit.ly/1M1RDiU\">http://bit.ly/1M1RDiU</a></p> \n      <p><b>--“Emma Smith on The Best Plays of Shakespeare” – interviewed by Beatrice Wilford on FiveBooks.com:</b> “In the first of a series marking the 400th year since the playwright’s death, we ask Shakespearean scholar Emma Smith to pick her five favourite plays.” <a href=\"http://bit.ly/1MHUeyx\">http://bit.ly/1MHUeyx</a></p> \n      <p><b>--“Murder in Mayfair,” by Peter Pomerantsev 
in The London Review of Books,</b> reviewing “A Very Expensive Poison: The Definitive Story of the Murder of Litvinenko and Russia’s War with the West,” by Luke Harding: “As he lay dying Alexander Litvinenko ... found it increasingly hard to open his mouth to talk, as he became yellow and shrivelled, he cursed himself for letting his guard down: he had assumed he was safe after receiving asylum and citizenship in the UK.” <a href=\"http://bit.ly/1MHUqhg\">http://bit.ly/1MHUqhg</a> ... <i><b>$12.93 on Amazon</b></i> <a href=\"http://amzn.to/1RS7lfM\">http://amzn.to/1RS7lfM</a> (h/t TheBrowser.com)</p> \n      <p><b>--“The Strange Case of a Nazi Who Became an Israeli Hitman,” by Forward’s Dan Raviv and Yossi Melman in Haaretz:</b> “Otto Skorzeny, one of the Mossad’s most valuable assets, was a former lieutenant colonel in Nazi Germany’s Waffen-SS and one of Adolf Hitler’s favorites.” <a href=\"http://bit.ly/1RRUhds\">http://bit.ly/1RRUhds</a> </p> \n      <p><b>--“How to Hack an Election,” by Jordan Robertson, Michael Riley, and Andrew Willis on Bloomberg Businessweek’s</b> international cover: “Andr&eacute;s Sep&uacute;lveda rigged elections throughout Latin America for almost a decade. He tells his story for the first time.” <a href=\"http://bloom.bg/1RRUqxm\">http://bloom.bg/1RRUqxm</a> ... <i><b>The cover</b></i> <a href=\"http://bit.ly/1qkEUOn\">http://bit.ly/1qkEUOn</a></p> \n      <p><b>--“How Meryl Streep Battled Dustin Hoffman, Retooled Her Role, and Won Her First Oscar,” by Michael Schulman </b>on the cover of April’s Vanity Fair, in an adaptation of his upcoming biography “Her Again: Becoming Meryl Streep” (out April 26):<b> </b>“At 29, Meryl Streep was grieving for a dead lover, falling for her future husband, and starting work on Kramer vs. Kramer, the movie that would make her a star and sweep the 1980 Oscars. ... 
Schulman recounts the struggles—physical, emotional, and intellectual—that launched Streep’s legend.” <a href=\"http://bit.ly/1RRUw8l\">http://bit.ly/1RRUw8l</a> ... <i><b>The cover </b></i><a href=\"http://bit.ly/1RuAL4x\">http://bit.ly/1RuAL4x</a> ... <i><b>$20.35 pre-order on Amazon </b></i><a href=\"http://amzn.to/1pUgx9P\">http://amzn.to/1pUgx9P</a> (h/t Longform.org)</p> \n      <p><b>--“#Jihad: Why ISIS is winning the social media war,” by Brendan I. Koerner in Wired: </b>“The group’s closest peers are not just other terrorist organizations, then, but also the Western brands, marketing firms, and publishing outfits—from PepsiCo to BuzzFeed—who ply the Internet with memes and messages in the hopes of connecting with customers.” <a href=\"http://bit.ly/1Y4E0jF\">http://bit.ly/1Y4E0jF</a> ... <i><b>Video of</b></i> <i><b>Koerner on &quot;CBS This Morning: Saturday” </b></i> <a href=\"http://cbsn.ws/1Y6qImE\">http://cbsn.ws/1Y6qImE</a> </p> \n      <p><b>--“The Longform Guide to the Dark Side of Hollywood”: </b>8 pieces on “Corruption, venality, and tragedy: a collection of picks on what lies beneath the glitter.” <a href=\"http://bit.ly/1M7kPVN\">http://bit.ly/1M7kPVN</a><b> </b></p> \n      <p><b>--“Crowd Source,” by Davy Rothbart in California Sunday Magazine</b>: “Inside the company that provides fake paparazzi, pretend campaign supporters, and counterfeit protesters.” <a href=\"http://bit.ly/1SGsWYC\">http://bit.ly/1SGsWYC</a> (h/t Longreads.com)</p> \n      <p><b>MEDIAWATCH – “Andrew Sullivan joins New York Magazine,” by Hadas Gold:</b> “Sullivan is joining New York Magazine [as] a contributing editor, the publication’s editor-in-chief Adam Moss announced on Friday. Sullivan ... will write features throughout the year and cover politics, including the 2016 Democratic and Republican National Conventions. In a note on Facebook, Sullivan said his first piece is on Donald Trump. ... 
Sullivan’s standalone blog, The Dish, stopped publishing in February 2015, with Sullivan citing financial and personal difficulties. ... Prior to his career as a blogger, Sullivan was editor of The New Republic and a writer for New York Times Magazine.” <a href=\"http://politi.co/1RRWYvm\">http://politi.co/1RRWYvm</a> ... <i><b>His letter to “Dishheads” </b></i><a href=\"http://bit.ly/1Y4FDh8\">http://bit.ly/1Y4FDh8</a><i><b> </b></i></p> \n      <p><b>FINAL FOUR --<i> </i>“‘One Shining Moment’ gets team-specific twist with Ne-Yo” –AP/Houston</b>: “‘One Shining Moment’ is getting a new Grammy-winning voice and some team-specific highlights at the end of the NCAA Final Four. The song has been the backdrop for the highlight piece to wrap up NCAA Tournaments for three decades, and the version by the late Luther Vandross will continue to be used for the national broadcast on TBS. For TNT and truTV, a rendition by three-time Grammy Award winner Ne-Yo will be used after the first-ever team-specific broadcasts of the national championship game. Ne-Yo’s performance will accompany team-centric highlights of the schools being featured in the Team Stream presentations, following their quest leading up to and during the title game.” <a href=\"http://yhoo.it/1qb6FsM\">http://yhoo.it/1qb6FsM</a> <i>...<b> Last year’s NCAA highlight piece</b></i> <a href=\"http://bit.ly/21ZpB9r\">http://bit.ly/21ZpB9r</a> ... <i><b>Charles Barkley’s 30-second rendition </b></i><a href=\"http://bit.ly/1pUoSKC\">http://bit.ly/1pUoSKC</a> </p> \n      <p><b>BIRTHWEEK (was yesterday):</b> Noah Schwartz … Susan Pisano</p> \n      <p><b>BIRTHDAYS:</b> Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... 
Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) … Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ... </p> \n      <p><b>... Joe Hack</b>, Sen. Deb. Fischer’s chief of staff and one of the youngest (if not the youngest) chiefs on the Senate side, is 29 (h/t colleague Brianna Puccini) ... Michael David Morgan, former deputy campaign manager for WA State Sen. Michael Baumgartner and current operations analyst for Optimus (h/t Cara Mathis) ... Sean Long, son of two former staffers, the pride of Clarendon, VA and an up and coming star at the DOJ, is 23 (h/t colleague John Sheehy) ... John McCauley, deputy comms. director at the Truman National Security Project ... Jim O’Grady, WNYC reporter and professional storyteller ... Nikhil Joshi, OFA and Obama WH alum now senior manager for biz ops and strategy at Lending Club in San Fran ... Emily Hartmann of Global Strategy Group ...</p> \n      <p><b>... Danny Kanner, </b>now of NBA comms and alum of DGA and OFA ... Dan Reilly, NEA’s senior campaigns and elections specialist and a Teamsters alum … Margo McNabb … Christy Agner … Greg Boatright … Rep. Chellie Pingree (D-Me.) (h/ts Teresa Vilmain) … Abe Dyk … Kimberly Woodard … Sarah Fenn ... Kristina Clara Konrad-Williams ... Clare Osdene Schapiro ... Carole Chouinard ... Alex Rosenwald ... singer Emmylou Harris is 69 ... social critic and author Camille Paglia is 69 ... Jesse Carmichael (Maroon 5) is 37 ... Lee Dewyze (“American Idol”) is 30 ... 
Aaron Kelly (“American Idol”) is 23 (h/ts AP)</p> \n      <p><b>MATT MACKOWIAK,</b> who for years has faithfully provided Playbookers with Sunday TV listings each weekend, is a candidate for the Man of the Year Campaign benefitting the Leukemia &amp; Lymphoma Society (LLS), whose mission is to fund cutting-edge research and eradicate blood cancers. <i><b>Please thank Matt by checking out his fundraising page. </b></i><a href=\"http://bit.ly/1SuSlSs\">http://bit.ly/1SuSlSs</a> </p> \n      <p><b>And here are THE SHOWS,</b> which @MattMackowiak filed from Austin:</p> \n      <p>--<b>NBC’s “Meet the Press”</b>: Hillary Clinton ... Reince Priebus; Ron Johnson; roundtable: WTMJ-TV’s Charles Benson, David Brooks, Helene Cooper and Amy Walter</p> \n      <p>--<b>ABC’s “This Week”</b>: John Kasich; Bernie Sanders; Reince Priebus; roundtable: Donna Brazile, Matthew Dowd, Hugh Hewitt and Juan Williams</p> \n      <p>--<b>CBS’s “Face the Nation”</b>: Donald Trump; Reince Priebus; roundtable: Peggy Noonan, Ed O’Keefe, Mark Leibovich and Ruth Marcus; new results from the CBS News 2016 Battleground Tracker Poll from Wisconsin, Pennsylvania and New York with CBS News’ Anthony Salvanto</p> \n      <p>--<b>“Fox News Sunday”</b>: Donald Trump; Reince Priebus; roundtable: George Will, Julie Pace, Stephen Dinan and Charles Lane; “Power Player of the Week” with John Ficklin, the 10<sup>th</sup> member of his family to work in the White House</p> \n      <p>--<b>CNN’s “State of the Union” </b>(9am ET / 12pm ET): Reince Priebus; Bernie Sanders; roundtable: Bakari Sellers, Amanda Carpenter, Nina Turner and Andre Bauer</p> \n      <p>-<b>-Univision’s “Al Punto”</b> (SUN 10am ET / 1pm PT) Jorge Casta<b>&ntilde;</b>eda; Venezuelan opposition leader and wife of political prisoner (Leopoldo L<b>&oacute;</b>pez) Lilian Tintori; Republican political analysts Helen Aguirre and Adryana Boyne; FARC leader Rodrigo Londo<b>&ntilde;</b>o Echeverri “Timochenko”; Peruvian presidential candidate 
(Peruvians for Change) Pedro Pablo Kuczynski; Peruvian presidential candidate (Popular Action) Alfredo Barnechea (substitute anchor: Enrique Acevedo)</p> \n      <p>--<b>CNN’s “Inside Politics” with John King </b>(SUN 8am ET): Roundtable: Jackie Kucinich, Ed O’Keefe, Jeff Zeleny and Abby Phillip</p> \n      <p>--<b>Fox News’ “Sunday Morning Futures” </b>(10am ET / 9am CT): Peter King; Allianz chief economic adviser Mohamed El-Erian; actor Scott Baio; roundtable: WSJ’s Jon Hilsenrath, Ed Rollins and Scott Brown</p> \n      <p>--<b>CNN’s “Reliable Sources”</b>: (SUN 11am ET): Author Ariana Huffington (“The Sleep Revolution”); Connie Chung and Maury Povich; Matthew Dowd; former People Magazine managing editor Larry Hackett</p> \n      <p>--<b>Fox News’ “MediaBuzz” </b>(SUN 11am ET / 10am CT): Trump campaign spokesperson Katrina Pierson; Andrea Tantaros; Meghan McCain; Kennedy; Sandra Smith; Susan Ferrechio; Amy Holmes; Kirsten Powers; tech analyst Shana Glenzer</p> \n      <p>--<b>CNN’s “Fareed Zakaria GPS”</b>: (SUN 11am ET): National security roundtable: RAND’s Seth Jones, NYPD’s John Miller and former CIA and FBI counterterrorism analyst Phil Mudd; roundtable: E.J. Dionne, author and Republican Main Street Partnership’s Geoffrey Kabaservice (“The Downfall of Moderation and the Destruction of the Republican Party”), author and Princeton University’s Nell Irvin Painter (“Southern History Across the Color Line”) and David Remnick</p> \n      <p><b>--C-SPAN</b>: <b>“The Communicators”</b> (SAT 6:30pm ET): American Cable Association president &amp; CEO Matthew Polka and American Cable Association board chair Robert Gessner ... <b>“Newsmakers”</b> (SUN 10am ET): Club for Growth president David McIntosh, questioned by Washington Examiner’s Gabby Morrongiello and Politico’s James Hohmann ... <b>“Q&amp;A”</b> (SUN 8pm &amp; 11pm ET): Discussion with high school students attending the U.S. Senate Youth Program</p> \n      <p>--<b>MSNBC’s “PoliticsNation with Rev. 
Al Sharpton”</b>: (SUN 8-9am ET): Ben Carson; Clinton campaign national political director Amanda Renteria; Reuters’ Erin McPike; Amy Holmes; author Michael D’Antonio (“Never Enough”); One Wisconsin Now executive director Scot Ross</p> \n      <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 9-10am ET): Capital Times’ Jessie Opoien; The Boston Globe’s James Pindell; Trump campaign senior advisor Barry Bennett; Politico’s Gabe Debenedetti (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p> \n      <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 10am-12pm ET): MSNBC’s Irin Carmon; Clinton campaign national spokeswoman Karen Finney; Michael Steele; Public Religion Research Institute’s Robert Jones; Michael Eric Dyson; Nina Turner; John Dean; Wisconsin-based radio host Charlie Sykes; The Nation Magazine’s John Nichols; One Wisconsin Now’s Scot Ross (hosted by MSNBC’s Joy Reid live from New York)</p> \n      <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 12-2pm ET): Wisconsin-based radio host Charlie Sykes; Elise Jordan; Howard Dean; Kasich supporter Jack Voight; USA Today’s Paul Singer; RCP’s Rebecca Berg; State Rep. Melissa Sargent (D-WI); former Romney and Rubio advisor Avik Roy; author and Milwaukee Journal-Sentinel’s Jason Stein (“Dropping the Bomb: Scott Walker, Unions and the Fight for a State”) (hosted by MSNBC’s Alex Witt live from New York)</p> \n      <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 3-4pm ET): The University of Wisconsin’s Mordecai Lee; former Wisconsin Lt. Gov. 
Barbara Lawton; WaPo’s Robert Costa; Milwaukee Mayor Tom Barrett (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p> \n      <p>--<b>PBS’s “To the Contrary” with Bonnie Erb&eacute;</b>: Roundtable: Eleanor Holmes Norton; former Judge and federal prosecutor Debra Carnahan; Washington Examiner columnist Ashe Schow; Republican strategist Rina Shah Bharara and Concerned Women for America’s Penny Nance</p> \n      <p>--<b>SiriusXM’s “No Labels Radio” </b>(SAT 10am ET &amp; 6pm ET, SUN 1PM ET): Host Jon Huntsman, with co-hosts No Labels co-founder Stuart Holliday and Politico’s Daniel Lippman, will discuss the Puerto Rico debt crisis with Puerto Rico’s lead restructuring advisor Jim Millstein, the 2016 election cycle with pollster Frank Luntz and the Wisconsin primary with Ron Johnson. <i><b>Pic of the hosts this weekend</b></i> <a href=\"http://bit.ly/1qdGxOa\">http://bit.ly/1qdGxOa</a> </p> \n      <p>--<b>Sinclair’s “Full Measure” with Sharyl Attkisson</b> (SUN 9:30am ET on WJLA and airing on Sinclair stations nationwide): the dangerous underground world of smuggling, of both drugs and humans; correspondent Scott Thuman examines how Morocco, one of the most beautiful countries on Earth, is being associated with the most recent terrorist attacks. </p> \n      <p><b>** A message from the Embassy of the United Arab Emirates:</b> Together, the UAE and US are working to make the Middle East more secure and prevent the proliferation of nuclear weapons.</p> \n      <p>The UAE is the only country in the Arabian Gulf to have a civilian nuclear energy accord (also known as a 123 Agreement) with the US. Called the “gold standard” by nonproliferation experts and government leaders, the UAE has boldly pledged not to enrich uranium or extract plutonium.</p> \n      <p>Next year, the UAE’s first nuclear energy station will begin generating safe, clean electricity. 
A leader in developing new technologies for renewable and sustainable energy, the UAE is also home to the International Renewable Energy Agency and the Masdar Institute.</p> \n      <p>Learn about how the UAE and US are united for a better future: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p> \n      <p><b>SUBSCRIBE</b> to the Playbook family: <b>POLITICO Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331\">http://politi.co/1M75UbX</a> ... <b>New York Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba\">http://politi.co/1ON8bqW</a> ... <b>Florida Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0\">http://politi.co/1JDm23W</a> ... <b>New Jersey Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337\">http://politi.co/1HLKltF</a> ...<b> Massachusetts Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453\">http://politi.co/1Nhtq5v</a> ... <b>Illinois Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc\">http://politi.co/1N7u5sb</a> ... <b>California Playbook</b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543\">http://politi.co/1N8zdJU</a> ... 
<b>Brussels Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef\">http://politi.co/1FZeLcw</a></p>\n      <div>  \n       <div> \n        <a href=\"http://www.politico.com/tipsheets/playbook/archive\">&laquo; View Archives</a>\n       </div>  \n      </div> \n      <div> \n       <div> \n        <div> \n         <div> \n         </div> \n        </div> \n        <aside> \n         <div>  \n          <h2></h2>  \n          <div> \n           <a href=\"http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487\"><img width=\"4047\" height=\"2194\" /></a>\n          </div> \n          <div> \n           <ol> \n            <li> \n             <article> \n              <div>  \n               <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487\">Trump campaign shrinks Lewandowski's role</a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-tennessee-gop-delegates-221489\">Tennessee GOP delegate fight erupts ahead of party meeting</a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3><a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\">Sanders gets under Clinton's skin in New York</a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\">How the North Dakota GOP is freezing out Trump</a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n   
            <h3><a href=\"http://www.politico.com/story/2016/04/hillary-clinton-fbi-strategy-emails-221435\">Clinton aides unite on FBI legal strategy</a></h3>  \n              </div> \n             </article> </li> \n           </ol> \n          </div> \n         </div> \n        </aside> \n        <aside> \n         <div>  \n          <h2><a href=\"http://www.politico.com/tipsheets/playbook/archive\">Playbook - POLITICO Archive</a></h2>  \n          <div> \n           <ul> \n            <li> \n             <article> \n              <div>  \n               <h3> <a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\">Saturday,  4/2/16 </a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3> <a href=\"http://www.politico.com/playbook/2016/04/trump-would-be-most-unpopular-major-party-nominee-in-32-years-playbook-breakfast-with-white-house-counsel-neil-eggleston-and-senior-adviser-brian-deese-livestreams-7-57-am-213524\">Friday,  4/1/16 </a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3> <a href=\"http://www.politico.com/playbook/2016/03/how-will-bill-take-trump-attacks-on-hillary-playbook-breakfast-on-the-court-tomorrow-white-house-counsel-neil-eggleston-senior-adviser-brian-deese-remembering-jennifer-frey-213503\">Thursday,  3/31/16 </a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3> <a 
href=\"http://www.politico.com/playbook/2016/03/trump-to-people-im-the-highest-level-of-smart-4-hours-of-sleep-on-working-out-dont-have-to-when-youre-making-america-great-again-you-get-a-lot-of-exercise-213479\">Wednesday,  3/30/16 </a></h3>  \n              </div> \n             </article> </li> \n            <li> \n             <article> \n              <div>  \n               <h3> <a href=\"http://www.politico.com/playbook/2016/03/great-mentioner-trumps-cabinet-jill-abramsons-new-column-david-gregorys-tv-gig-bday-peter-cherukuri-paul-farhi-robert-gibbs-steve-peoples-roger-simon-peter-velz-213455\">Tuesday,  3/29/16 </a></h3>  \n              </div> \n             </article> </li> \n           </ul> \n          </div>  \n          <ul> \n           <li> <a href=\"http://www.politico.com/tipsheets/playbook/archive\">View the Full Playbook Archives &raquo;</a></li> \n          </ul>  \n         </div> \n        </aside> \n        <aside> \n         <div>  \n          <h2> <b></b> <b></b> Politico Magazine </h2>  \n          <div> \n           <ul> \n            <li> \n             <article>  \n              <div> \n               <a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\"><img alt=\"Donald Trump and Martha Stewart &quot;fire&quot; up NBC's promo campaign for both &quot;Apprentice&quot; shows. The two icons came together to shoot multiple spots for NBC. The promos, which began airing on Aug. 
9, show off their lighter sides.\" /></a>\n              </div>  \n              <div>  \n               <h3><a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\">The Next Donald Trumps</a></h3>   \n               <p> By Luke O'Neil</p>  \n              </div> \n             </article> </li> \n            <li> \n             <article>  \n              <div> \n               <a href=\"http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783\"><img alt=\"160331_CPI_defense_fighter_gty.jpg\" /></a>\n              </div>  \n              <div>  \n               <h3><a href=\"http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783\">The Defense Industry’s Surprising 2016 Favorites: Bernie &amp; Hillary</a></h3>   \n               <p> By Alexander Cohen</p>  \n              </div> \n             </article> </li> \n            <li> \n             <article>  \n              <div> \n               <a href=\"http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753\"><img alt=\"160321_sosnik_ap.jpg\" /></a>\n              </div>  \n              <div>  \n               <h3><a href=\"http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753\">Here’s How You Know 2016 Is Already Decided</a></h3>   \n               <p> By Doug Sosnik</p>  \n              </div> \n             </article> </li> \n            <li> \n             <article>  \n              <div> \n               <a href=\"http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785\"><img alt=\"160401_ridge_getty.jpg\" /></a>\n              </div>  \n              <div>  \n               <h3><a 
href=\"http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785\">What Trump and Cruz’s Clueless Muslim Rhetoric Will Cost America</a></h3>   \n               <p> By Tom Ridge</p>  \n              </div> \n             </article> </li> \n           </ul> \n          </div> \n         </div> \n        </aside> \n        <div> \n         <div> \n         </div> \n        </div> \n       </div> \n      </div> \n     </div> \n    </div> \n   </section> \n  </article> \n </div> \n</div>",
  "main_length" : 43943,
  "main_checksum" : "wuwwljNDSjPAjO8RpuUDy5yCmxI",
  "main_format" : "HTML",
  "extract" : "<a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\">BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</a><p>Inside the campaigns and Corey Lewandowski's shrinking role.</p>04/02/16 01:05 PM EDT<p>The camps are quarreling over when the debate should take place.</p>04/02/16 12:57 PM EDT<a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\">The Next Donald Trumps</a><p> By Luke O'Neil</p><p>7 celebrities who’ve got what it takes to follow in the brazen billionaire’s footsteps in 2020.</p>04/02/16 07:56 AM EDT<p>Despite billionaire's staunch defense, embattled campaign manager is losing clout.</p><p> Updated </p><p>The skirmish is the latest in the increasingly fierce battle for delegates to the Republican National Convention in Cleveland.</p>04/01/16 11:30 PM EDT<p>The Republican front-runner's suggestion to blow up decades of non-proliferation policy provokes the president.</p><p> Updated </p><p>Obama also urged Iran not to violate the spirit of the nuclear deal, even if it is technically abiding by it.</p>04/01/16 07:32 PM EDT<p>The real estate mogul also said he agrees with the notion that abortion is murder.</p>04/01/16 07:20 PM EDT<p>He's making an aggressive push for Jewish support, casting himself as a steadfast supporter of Israel.</p><p> Updated </p><p>“I think she probably owes the senator an apology for that because the senator is not lying about her record.”</p>04/01/16 06:48 PM EDT<p>Republican candidates are fighting for 25 delegates doled out during one chaotic weekend, but state party rules put outsiders at a steep disadvantage.</p>04/01/16 05:47 PM EDT<p>&quot;I can’t even imagine what’s in those emails. 
But I’m sure I would probably be mortified.&quot;</p>04/01/16 04:51 PM EDT<a href=\"http://www.politico.com/tipsheets/the-2016-blast/2016/04/cruzs-evangelical-problem-democrats-lying-spat-clinton-aides-all-four-one-kasich-doesnt-like-lyin-213544\">Cruz’s evangelical problem </a><p> By Henry C. Jackson</p><p>Unless Cruz can quickly make inroads with non-evangelical voters who so far have mostly rejected him, he has little chance of stopping Trump.</p>04/01/16 04:22 PM EDT<p>Amid the fanfare over President Barack Obama’s visit to Havana, U.S. officials and executives from major food companies are eyeing the island as a potential...</p>04/01/16 04:17 PM EDT<p>U.S. manufacturers have something to say to Donald Trump, Bernie Sanders and Hillary Clinton: Stop trying to “protect” us with slams on free trade.</p>04/01/16 04:12 PM EDT<p>The business group said a recession would set in during the first year under the Republican front-runner's proposed tariffs because China and Mexico would...</p>04/01/16 02:31 PM EDT<p>The Texan surged to second place as the candidate of Christian conservatives. His challenge now is how few are left to vote.</p>04/01/16 02:00 PM EDT<p>NCEC returns — GM's Bhuta to Bockorny</p>04/01/16 01:33 PM EDT<p>&quot;And no, this is not an April Fools' [joke],&quot; he wrote on Facebook.</p><p> Updated </p><p>&quot;There isn’t anybody except one candidate who has a higher favorable than unfavorable rating.&quot;</p>04/01/16 01:23 PM EDT<p>Obama has commuted the sentences of 248 federal prisoners, mostly low-level drug offenders affected by mandatory minimum drug sentences, including 61 on...</p>04/01/16 12:30 PM EDT<p>The Republican front-runner risks an exodus of delegates if he fails to clinch the nomination outright.</p>04/01/16 11:07 AM EDT<p>Donald Trump's rocky week is likely to force his longtime backers, especially women, to reconsider their support of him, former Texas Gov. 
Rick Perry said...</p>04/01/16 10:59 AM EDT<p>&quot;We haven't talked about any specific positions. What we have talked about is the fact that we both have a strong desire to heal this country.&quot;</p>04/01/16 10:56 AM EDT<p>&quot;Garland was a member of the panel at the time the case was argued but did not participate in this opinion,&quot; the opinion says in a footnote.</p>04/01/16 10:53 AM EDT<p>Obama’s personal appeal will come as Garland continues to meet with senators individually.</p>04/01/16 10:48 AM EDT<p>Hillary Clinton owes Bernie Sanders' campaign an apology, the campaign said Friday.</p>04/01/16 10:00 AM EDT<p>What we learned about opioids yesterday —&nbsp;More bad news for Theranos</p>04/01/16 10:00 AM EDT<p>Bayer seeks ALJ review of EPA pesticide cancellation — Small number of dairy producers to get payments under MPP</p>04/01/16 10:00 AM EDT<p>Tesla’s tax benefits — Trump and Cruz find agreement (on the carbon tax)</p><h4>You're All Caught Up</h4><p>We're working on more stories right now</p><h5>Check out today's hot topics</h5><p>Mike Allen's must-read briefing on what's driving the day in Washington </p>Subscribe<h1>BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</h1>04/02/16 01:32 PM EDT<b>By Mike Allen </b><p>(@mikeallen; <a href=\"mailto:mallen@politico.com\">mallen@politico.com</a>)<b> and Daniel Lippman </b>(@dlippman; <a href=\"mailto:dlippman@politico.com\">dlippman@politico.com</a>)</p><b>Happy Saturday! </b><p>Obama alumnus Brent Colburn asked that we include this in celebration of his birthday: Today “is World Autism Awareness Day, and in celebration of my nephew Cordis and his amazing parents, Brian &amp; Andrea Colburn, I’d like to encourage every Playbooker to take a minute and learn more about ... 
the autism community.” <a href=\"http://www.autismspeaks.org\">www.autismspeaks.org</a></p><p>Story Continued Below</p><b>INSIDE THE CAMPAIGNS – “Trump campaign shrinks Lewandowski’s role: </b><p>Despite the billionaire’s staunch defense, his embattled campaign manager is losing clout,” by Ben Schreckinger and Ken Vogel, with Hadas Gold: “Trump’s just-named convention manager, Paul Manafort, is expected to take a leading role not just in the selection of delegates, but in the remaining primaries themselves. ... [A] person involved in Trump’s campaign [said:] ... ‘Mr. Trump’s listening to other people now. The crew’s expanding.’ ... </p><b>“[T]his winter,</b><p> ... National Political Director Michael Glassner ... was [promoted] to deputy campaign manager ... On March 2, the campaign promoted Stuart Jolly ... to national field director, giving him primary authority over ... hiring ... field staff. ... <b>Manafort has quickly taken charge of his own fiefdom in Washington, and is planning to hire a team of his own,</b> which is likely to include several veterans of the 1976 Republican National Convention – the party’s last convention at which the presidential nomination was contested.” <a href=\"http://politi.co/22YORlU\">http://politi.co/22YORlU</a></p><b>**SUBSCRIBE to Playbook</b><p>: <a href=\"http://politi.co/1M75UbX\">http://politi.co/1M75UbX</a></p><b>CHASER – “Trump touts his loyalty in defending campaign manager</b><p>,” by AP’s Jill Colvin in Appleton, Wis.: “Trump [said in a phoner Thu. evening that] his decision to stand behind his campaign manager ... is a sign of loyalty — a trait that Trump has displayed, for better or worse, through much of his career.” <a href=\"http://apne.ws/1RS7xLX\">http://apne.ws/1RS7xLX</a></p><b>ALEX BURNS, who turns 3-0 tomorrow, </b><p>coins a memorable phrase on N.Y. Times p. A9, “G.O.P. Fears Trump as <b>Zombie Candidate: Damaged but Unstoppable”:</b> “Republicans who once worried that Mr. 
Trump might gain overwhelming momentum ... are now becoming preoccupied with a different grim prospect: that Mr. Trump might become a kind of zombie candidate — damaged beyond the point of repair, but too late for any of his rivals to stop him.” <a href=\"http://nyti.ms/1ZTJn6E\">http://nyti.ms/1ZTJn6E</a></p><b>VICE PRESIDENT BIDEN </b><p>and Dr. Jill Biden will be at tonight’s NCAA Final Four semifinal games in Houston to promote the It’s On Us campaign to end sexual assault on campus. The two will appear for a pregame interview on TBS. They return to D.C. on Sunday.</p><b>--Other Washingtonians at the Final Four:</b><p> Jonathan Martin, who’s a birthday boy tomorrow, and Betsy Fischer Martin; John Feinstein; and Danielle and Jeff Jones.</p><b>AP for SUNDAY PAPERS – “Clinton’s frustration grows, as primary race drags on,”</b><p> by Lisa Lerer in Syracuse and Ken Thomas in N.Y.: “Hillary Clinton snapped at a Greenpeace protester. She linked Bernie Sanders and tea party Republicans. And she bristled with anger when nearly two dozen Sanders supporters marched out of an event near her home outside New York City ... After a year of campaigning, months of debates and 35 primary elections, Sanders is finally getting under Clinton’s skin ... Clinton has spent weeks largely ignoring Sanders and trying to focus on ... Trump. Now, after several primary losses and with a tough fight in New York on the horizon, Clinton is showing flashes of frustration with the Vermont senator ...</p><b>“According to Democrats close to Hillary</b><p> and former President Bill Clinton, both are frustrated by Sanders' ability to cast himself as above politics-as-usual even while firing off what they consider to be misleading attacks. The Clintons are even more annoyed that Sanders' approach seems to be rallying ... young voters by his side. 
While Hillary Clinton's team contends her lock on the nomination as ‘nearly insurmountable,’ the campaign frequently grumbles that Sanders hasn't faced the same level of scrutiny ... Her aides complain about Sanders' rhetoric, claiming he's broken his pledge to avoid character attacks ... </p><b>“Clinton hopes that big victories</b><p> in New York on April 19 and five Northeastern states a week later will allow her to wrap up the nomination by the end of the month. But aides acknowledge that Sanders ... is unlikely to feel significant political or financial pressure to drop out of the race, even if it becomes clear he cannot win ... Sanders must win 67 percent of the remaining delegates and uncommitted superdelegates ... through June to be able to clinch the Democratic nomination. So far he's only winning 37 percent. </p><b>“Joel Benenson,</b><p> Clinton’s chief strategist, said: ‘We’re going to get to a point at the end of April where there just isn’t enough real estate for him to overcome the lead that we’ve built.’ Still, any kind of truce is probably weeks, if not months, away. ... Sanders is costing Clinton significant time, money and political capital [and] is drawing sizable crowds in New York.” <a href=\"http://apne.ws/1pUb1Uo\">http://apne.ws/1pUb1Uo</a></p><b>FRIENDS NOW CALLING SPICER “Mr. Chairman” ... “Backstage maneuvering begins in wide-open GOP chairman’s race,” by The Hill’s Scott Wong:</b><p> “Two ... RNC senior officials also have been mentioned as potential Priebus successors: John Ryder, the RNC’s general counsel, and Sean Spicer, the RNC’s chief strategist and communications director.... 
[and] the RNC’s top communicator since 2011.” <a href=\"http://bit.ly/1UyiRAk\">http://bit.ly/1UyiRAk</a><b>--@seanspicer</b>: “Sunday add @GOP’s @Reince to list of ppl that have done ‘full Ginsburg’ @ThisWeekABC @meetthepress @FoxNewsSunday @FaceTheNation @CNNSotu”</p><b>--TELLY LOVELACE</b><p> named RNC’s national director of African American initiatives and media -- Release: “Telly joins the RNC from IR+ Media ... where he served as Managing Director. Previously, Telly served as a senior member of Maryland Governor Larry Hogan’s communications team.”</p><b>** A message from the Embassy of the United Arab Emirates:</b><p> The UAE stands with the US and President Obama in a shared commitment to stopping the proliferation of nuclear weapons. This is one of many areas where the UAE and US work together to strengthen stability and security in the Mideast and around the world. Learn more: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p><b>PIC DU JOUR: </b><p>Colleagues of Jen Friedman in the White House press office played an April Fool’s joke on her yesterday by putting her birthday in Playbook. That led to dozens of happy birthday emails to her, including from senior staff, even though the deputy press secretary's real birthday is Nov. 7. They also decorated her office with a “Happy Birthday” banner and a balloon. </p><i>Pic of her decorated desk <a href=\"http://bit.ly/21XVFdO\">http://bit.ly/21XVFdO</a></i><b>LIFE ONLINE – “Snapchat’s Ultimate Goal Isn’t Just Chat—It’s Total Media Domination,” by Fortune’s Mathew Ingram:</b><p> “The latest iteration came this week with the addition of new features including video calling, audio and video messaging, GIFs, and stickers. 
Unlike a lot of other messaging apps, all of the new features are blended together—users can seamlessly toggle between video and audio, send short notes, and draw on top of shared photos.” <a href=\"http://for.tn/1VjJ7gB\">http://for.tn/1VjJ7gB</a></p><b>TOMORROW’S TIMES TODAY -- “Navy SEALs Split Over Members’ Benefiting From Hard-Earned Brand,” by Nicholas Kulish, Christopher Drew, and Sean D. Naylor: </b><p>“[F]ormer members ... are increasingly giving paid speeches, sounding off on politics on Fox News and stamping the force’s name on hats, backpacks, vitamins ... [A] half-dozen books are scheduled to roll off the presses in coming months, adding to the 100-plus published by former SEALs since 2001.... Far more SEALs have gone public than their more reticent Army counterparts in Delta Force and the Rangers ...</p><b>“One author, Matt Bissonnette, </b><p>earned millions for ‘No Easy Day,’ a firsthand narrative of the Bin Laden raid, but had to forfeit the profits for failing to submit it for Pentagon review of classified information.” <a href=\"http://nyti.ms/1RS6BqQ\">http://nyti.ms/1RS6BqQ</a></p><b>WHITE HOUSE DEPARTURE LOUNGE:</b><p> Noah Schwartz left the National Security Council on Friday where he was advisor to the deputy national security advisor for int’l econ; he’s headed to the Office of the Secretary of Defense, where he’ll be working on South and Southeast Asia policy. He emails friends: “It has been a great privilege to serve on the NSC staff these past few years, and I will always be grateful for the experience. 
Special thanks to everyone (past and present) on the international economics team.”</p><b>POLITICO MAGAZINE FRIDAY COVER – “9/11: What Would Trump Do?”:</b><p> “Politico Magazine asked foreign policy and counterterrorism experts, historians, Trump biographers, even psychologists to take a serious guess at how he’d handle the days after a terrorist attack in the United States—all based on what they know about Trump the candidate and what he’ll be facing if he gets elected.” <i>With hot takes from Jacob Heilbrunn, Ian Bremmer, Amb. Dennis Ross, Aaron David Miller, Andrew Bacevich and more</i><a href=\"http://politi.co/1UJ0n0k\">http://politi.co/1UJ0n0k</a></p><b>STUFF TRUMP SAYS -- “How Donald Trump sees himself,” by CNN’s Scott Glover and Maeve Reston with a video by Brenna Williams:</b><p> “He considers himself a member of ‘the lucky sperm club.’He trusts no one, and places a premium on revenge. (‘If you do not get even, you are just a schmuck!’)He treats every decision he makes ‘like a lover,’ sometimes thinking with his head, other times with other parts of his body, because it reminds him to ‘keep in touch with my basic impulses.’And to make creative choices, he writes: ‘I try to step back and remember my first shallow reaction. The day I realized it can be smart to be shallow was, for me, a deep experience.’” <a href=\"http://cnn.it/1SGvhCL\">http://cnn.it/1SGvhCL</a> ... <a href=\"http://cnn.it/1SsWF4w\">http://cnn.it/1SsWF4w</a></p><b>2-min. video</b><b>SUBJECT LINE DU JOUR – </b><p>Trump’s menacing campaign email sent Friday:“We’re Coming For You Wisconsin!” <a href=\"http://bit.ly/25D0qOL\">http://bit.ly/25D0qOL</a></p><b>Text of his email</b><b>VIDEO DU JOUR – “Donald Trump’s Love/Hate Relationship With Women”</b><p> – Politico Magazine – <a href=\"http://politi.co/25Bwouu\">http://politi.co/25Bwouu</a></p><b>2-min. 
video </b><b>CLICKERS – “The nation’s cartoonists on the week in politics,” edited by Matt Wuerker – </b><i>11 keepers</i><a href=\"http://politi.co/22WoUUd\">http://politi.co/22WoUUd</a><p> ... <a href=\"http://politi.co/1RRWsxo\">http://politi.co/1RRWsxo</a></p><b>Matt’s thirteen March cartoons</b><b>GREAT WEEKEND READS,</b><p> curated by Daniel Lippman:</p><p>--“<b>Jesus of Nazareth, Whose Messianic Message Captivated Thousands, Dies at About 33,” by Sam Roberts in Vanity Fair: </b>“Roberts, an obituary writer for The New York Times, imagines how, given the facts available then, his predecessors might have reported the aftermath of an execution in the Middle East one Friday two millennia ago.” <a href=\"http://bit.ly/1UZby4C\">http://bit.ly/1UZby4C</a></p><p>--<b>“The Men Who Gave Trump His Brutal Worldview,” by Michael D’Antonio, </b>author of “Never Enough: Donald Trump and the Pursuit of Success,” on Politico Magazine: “Tutored by his fiercely ambitious father and tough-as-nails high school coach, the GOP frontrunner has only one ethical code: life is combat.” <a href=\"http://politi.co/1LYiY5C\">http://politi.co/1LYiY5C</a><i>...<b> $15.16 on Amazon</b></i><a href=\"http://amzn.to/1g8Rlak\">http://amzn.to/1g8Rlak</a></p><b>--“This Professor Knows Why You Hate Ted Cruz’s Face” --Washingtonian Staff</b><p>: “(He read the other candidates’ faces, too.)” <a href=\"http://bit.ly/1M1RDiU\">http://bit.ly/1M1RDiU</a></p><b>--“Emma Smith on The Best Plays of Shakespeare” – interviewed by Beatrice Wilford on FiveBooks.com:</b><p> “In the first of a series marking the 400th year since the playwright’s death, we ask Shakespearean scholar Emma Smith to pick her five favourite plays.” <a href=\"http://bit.ly/1MHUeyx\">http://bit.ly/1MHUeyx</a></p><b>--“Murder in Mayfair,” by Peter Pomerantsev in The London Review of Books,</b><p> reviewing “A Very Expensive Poison: The Definitive Story of the Murder of Litvinenko and Russia’s War with the West,” by Luke Harding: “As he 
lay dying Alexander Litvinenko ... found it increasingly hard to open his mouth to talk, as he became yellow and shrivelled, he cursed himself for letting his guard down: he had assumed he was safe after receiving asylum and citizenship in the UK.” <a href=\"http://bit.ly/1MHUqhg\">http://bit.ly/1MHUqhg</a> ... <a href=\"http://amzn.to/1RS7lfM\">http://amzn.to/1RS7lfM</a> (h/t TheBrowser.com)</p><b>$12.93 on Amazon</b><b>--“The Strange Case of a Nazi Who Became an Israeli Hitman,” by Forward’s Dan Raviv and Yossi Melman in Haaretz:</b><p> “Otto Skorzeny, one of the Mossad’s most valuable assets, was a former lieutenant colonel in Nazi Germany’s Waffen-SS and one of Adolf Hitler’s favorites.” <a href=\"http://bit.ly/1RRUhds\">http://bit.ly/1RRUhds</a></p><b>--“How to Hack an Election,” by Jordan Robertson, Michael Riley, and Andrew Willis on Bloomberg Businessweek’s</b><p> international cover: “Andr&eacute;s Sep&uacute;lveda rigged elections throughout Latin America for almost a decade. He tells his story for the first time.” <a href=\"http://bloom.bg/1RRUqxm\">http://bloom.bg/1RRUqxm</a> ... <a href=\"http://bit.ly/1qkEUOn\">http://bit.ly/1qkEUOn</a></p><b>The cover</b><b>--“How Meryl Streep Battled Dustin Hoffman, Retooled Her Role, and Won Her First Oscar,” by Michael Schulman </b><p>on the cover of April’s Vanity Fair, in an adaptation of his upcoming biography “Her Again: Becoming Meryl Streep” (out April 26):“At 29, Meryl Streep was grieving for a dead lover, falling for her future husband, and starting work on Kramer vs. Kramer, the movie that would make her a star and sweep the 1980 Oscars. ... Schulman recounts the struggles—physical, emotional, and intellectual—that launched Streep’s legend.” <a href=\"http://bit.ly/1RRUw8l\">http://bit.ly/1RRUw8l</a> ... <a href=\"http://bit.ly/1RuAL4x\">http://bit.ly/1RuAL4x</a> ... 
<a href=\"http://amzn.to/1pUgx9P\">http://amzn.to/1pUgx9P</a> (h/t Longform.org)</p><b>The cover </b><b>$20.35 pre-order on Amazon </b><b>--“#Jihad: Why ISIS is winning the social media war,” by Brendan I. Koerner in Wired: </b><p>“The group’s closest peers are not just other terrorist organizations, then, but also the Western brands, marketing firms, and publishing outfits—from PepsiCo to BuzzFeed—who ply the Internet with memes and messages in the hopes of connecting with customers.” <a href=\"http://bit.ly/1Y4E0jF\">http://bit.ly/1Y4E0jF</a> ... <a href=\"http://cbsn.ws/1Y6qImE\">http://cbsn.ws/1Y6qImE</a></p><b>Video of</b><b>Koerner on &quot;CBS This Morning: Saturday” </b><b>--“The Longform Guide to the Dark Side of Hollywood”: </b><p>8 pieces on “Corruption, venality, and tragedy: a collection of picks on what lies beneath the glitter.” <a href=\"http://bit.ly/1M7kPVN\">http://bit.ly/1M7kPVN</a></p><b>--“Crowd Source,” by Davy Rothbart in California Sunday Magazine</b><p>: “Inside the company that provides fake paparazzi, pretend campaign supporters, and counterfeit protesters.” <a href=\"http://bit.ly/1SGsWYC\">http://bit.ly/1SGsWYC</a> (h/t Longreads.com)</p><b>MEDIAWATCH – “Andrew Sullivan joins New York Magazine,” by Hadas Gold:</b><p> “Sullivan is joining New York Magazine [as] a contributing editor, the publication’s editor-in-chief Adam Moss announced on Friday. Sullivan ... will write features throughout the year and cover politics, including the 2016 Democratic and Republican National Conventions. In a note on Facebook, Sullivan said his first piece is on Donald Trump. ... Sullivan’s standalone blog, The Dish, stopped publishing in February 2015, with Sullivan citing financial and personal difficulties. ... Prior to his career as a blogger, Sullivan was editor of The New Republic and a writer for New York Times Magazine.” <a href=\"http://politi.co/1RRWYvm\">http://politi.co/1RRWYvm</a> ... 
<a href=\"http://bit.ly/1Y4FDh8\">http://bit.ly/1Y4FDh8</a></p><b>His letter to “Dishheads” </b><b>FINAL FOUR --“‘One Shining Moment’ gets team-specific twist with Ne-Yo” –AP/Houston</b><p>: “‘One Shining Moment’ is getting a new Grammy-winning voice and some team-specific highlights at the end of the NCAA Final Four. The song has been the backdrop for the highlight piece to wrap up NCAA Tournaments for three decades, and the version by the late Luther Vandross will continue to be used for the national broadcast on TBS. For TNT and truTV, a rendition by three-time Grammy Award winner Ne-Yo will be used after the first-ever team-specific broadcasts of the national championship game. Ne-Yo’s performance will accompany team-centric highlights of the schools being featured in the Team Stream presentations, following their quest leading up to and during the title game.” <a href=\"http://yhoo.it/1qb6FsM\">http://yhoo.it/1qb6FsM</a><i>...<b> Last year’s NCAA highlight piece</b></i><a href=\"http://bit.ly/21ZpB9r\">http://bit.ly/21ZpB9r</a> ... <a href=\"http://bit.ly/1pUoSKC\">http://bit.ly/1pUoSKC</a></p><b>Charles Barkley’s 30-second rendition </b><b>BIRTHWEEK (was yesterday):</b><p> Noah Schwartz … Susan Pisano</p><b>BIRTHDAYS:</b><p> Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) … Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... 
Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ... </p><b>... Joe Hack</b><p>, Sen. Deb. Fischer’s chief of staff and one of the youngest (if not the youngest) chiefs on the Senate side, is 29 (h/t colleague Brianna Puccini) ... Michael David Morgan, former deputy campaign manager for WA State Sen. Michael Baumgartner and current operations analyst for Optimus (h/t Cara Mathis) ... Sean Long, son of two former staffers, the pride of Clarendon, VA and an up and coming star at the DOJ, is 23 (h/t colleague John Sheehy) ... John McCauley, deputy comms. director at the Truman National Security Project ... Jim O’Grady, WNYC reporter and professional storyteller ... Nikhil Joshi, OFA and Obama WH alum now senior manager for biz ops and strategy at Lending Club in San Fran ... Emily Hartmann of Global Strategy Group ...</p><b>... Danny Kanner, </b><p>now of NBA comms and alum of DGA and OFA ... Dan Reilly, NEA’s senior campaigns and elections specialist and a Teamsters alum … Margo McNabb … Christy Agner … Greg Boatright … Rep. Chellie Pingree (D-Me.) (h/ts Teresa Vilmain) … Abe Dyk … Kimberly Woodard … Sarah Fenn ... Kristina Clara Konrad-Williams ... Clare Osdene Schapiro ... Carole Chouinard ... Alex Rosenwald ... singer Emmylou Harris is 69 ... social critic and author Camille Paglia is 69 ... Jesse Carmichael (Maroon 5) is 37 ... Lee Dewyze (“American Idol”) is 30 ... Aaron Kelly (“American Idol”) is 23 (h/ts AP)</p><b>MATT MACKOWIAK,</b><p> who for years has faithfully provided Playbookers with Sunday TV listings each weekend, is a candidate for the Man of the Year Campaign benefitting the Leukemia &amp; Lymphoma Society (LLS), whose mission is to fund cutting-edge research and eradicate blood cancers. <a href=\"http://bit.ly/1SuSlSs\">http://bit.ly/1SuSlSs</a></p><b>Please thank Matt by checking out his fundraising page. 
</b><b>And here are THE SHOWS,</b><p> which @MattMackowiak filed from Austin:</p><p>--<b>NBC’s “Meet the Press”</b>: Hillary Clinton ... Reince Priebus; Ron Johnson; roundtable: WTMJ-TV’s Charles Benson, David Brooks, Helene Cooper and Amy Walter</p><p>--<b>ABC’s “This Week”</b>: John Kasich; Bernie Sanders; Reince Priebus; roundtable: Donna Brazile, Matthew Dowd, Hugh Hewitt and Juan Williams</p><p>--<b>CBS’s “Face the Nation”</b>: Donald Trump; Reince Priebus; roundtable: Peggy Noonan, Ed O’Keefe, Mark Leibovich and Ruth Marcus; new results from the CBS News 2016 Battleground Tracker Poll from Wisconsin, Pennsylvania and New York with CBS News’ Anthony Salvanto</p><p>--<b>“Fox News Sunday”</b>: Donald Trump; Reince Priebus; roundtable: George Will, Julie Pace, Stephen Dinan and Charles Lane; “Power Player of the Week” with John Ficklin, the 10<sup>th</sup> member of his family to work in the White House</p><p>--<b>CNN’s “State of the Union” </b>(9am ET / 12pm ET): Reince Priebus; Bernie Sanders; roundtable: Bakari Sellers, Amanda Carpenter, Nina Turner and Andre Bauer</p><p>-<b>-Univision’s “Al Punto”</b> (SUN 10am ET / 1pm PT) Jorge Casta<b>&ntilde;</b>eda; Venezuelan opposition leader and wife of political prisoner (Leopoldo L<b>&oacute;</b>pez) Lilian Tintori; Republican political analysts Helen Aguirre and Adryana Boyne; FARC leader Rodrigo Londo<b>&ntilde;</b>o Echeverri “Timochenko”; Peruvian presidential candidate (Peruvians for Change) Pedro Pablo Kuczynski; Peruvian presidential candidate (Popular Action) Alfredo Barnechea (substitute anchor: Enrique Acevedo)</p><p>--<b>CNN’s “Inside Politics” with John King </b>(SUN 8am ET): Roundtable: Jackie Kucinich, Ed O’Keefe, Jeff Zeleny and Abby Phillip</p><p>--<b>Fox News’ “Sunday Morning Futures” </b>(10am ET / 9am CT): Peter King; Allianz chief economic adviser Mohamed El-Erian; actor Scott Baio; roundtable: WSJ’s Jon Hilsenrath, Ed Rollins and Scott Brown</p><p>--<b>CNN’s “Reliable Sources”</b>: (SUN 11am 
ET): Author Ariana Huffington (“The Sleep Revolution”); Connie Chung and Maury Povich; Matthew Dowd; former People Magazine managing editor Larry Hackett</p><p>--<b>Fox News’ “MediaBuzz” </b>(SUN 11am ET / 10am CT): Trump campaign spokesperson Katrina Pierson; Andrea Tantaros; Meghan McCain; Kennedy; Sandra Smith; Susan Ferrechio; Amy Holmes; Kirsten Powers; tech analyst Shana Glenzer</p><p>--<b>CNN’s “Fareed Zakaria GPS”</b>: (SUN 11am ET): National security roundtable: RAND’s Seth Jones, NYPD’s John Miller and former CIA and FBI counterterrorism analyst Phil Mudd; roundtable: E.J. Dionne, author and Republican Main Street Partnership’s Geoffrey Kabaservice (“The Downfall of Moderation and the Destruction of the Republican Party”), author and Princeton University’s Nell Irvin Painter (“Southern History Across the Color Line”) and David Remnick</p><b>--C-SPAN</b><p>: <b>“The Communicators”</b> (SAT 6:30pm ET): American Cable Association president &amp; CEO Matthew Polka and American Cable Association board chair Robert Gessner ... <b>“Newsmakers”</b> (SUN 10am ET): Club for Growth president David McIntosh, questioned by Washington Examiner’s Gabby Morrongiello and Politico’s James Hohmann ... <b>“Q&amp;A”</b> (SUN 8pm &amp; 11pm ET): Discussion with high school students attending the U.S. Senate Youth Program</p><p>--<b>MSNBC’s “PoliticsNation with Rev. 
Al Sharpton”</b>: (SUN 8-9am ET): Ben Carson; Clinton campaign national political director Amanda Renteria; Reuters’ Erin McPike; Amy Holmes; author Michael D’Antonio (“Never Enough”); One Wisconsin Now executive director Scot Ross</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 9-10am ET): Capital Times’ Jessie Opoien; The Boston Globe’s James Pindell; Trump campaign senior advisor Barry Bennett; Politico’s Gabe Debenedetti (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 10am-12pm ET): MSNBC’s Irin Carmon; Clinton campaign national spokeswoman Karen Finney; Michael Steele; Public Religion Research Institute’s Robert Jones; Michael Eric Dyson; Nina Turner; John Dean; Wisconsin-based radio host Charlie Sykes; The Nation Magazine’s John Nichols; One Wisconsin Now’s Scot Ross (hosted by MSNBC’s Joy Reid live from New York)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 12-2pm ET): Wisconsin-based radio host Charlie Sykes; Elise Jordan; Howard Dean; Kasich supporter Jack Voight; USA Today’s Paul Singer; RCP’s Rebecca Berg; State Rep. Melissa Sargent (D-WI); former Romney and Rubio advisor Avik Roy; author and Milwaukee Journal-Sentinel’s Jason Stein (“Dropping the Bomb: Scott Walker, Unions and the Fight for a State”) (hosted by MSNBC’s Alex Witt live from New York)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 3-4pm ET): The University of Wisconsin’s Mordecai Lee; former Wisconsin Lt. Gov. 
Barbara Lawton; WaPo’s Robert Costa; Milwaukee Mayor Tom Barrett (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p><p>--<b>PBS’s “To the Contrary” with Bonnie Erb&eacute;</b>: Roundtable: Eleanor Holmes Norton; former Judge and federal prosecutor Debra Carnahan; Washington Examiner columnist Ashe Schow; Republican strategist Rina Shah Bharara and Concerned Women for America’s Penny Nance</p><p>--<b>SiriusXM’s “No Labels Radio” </b>(SAT 10am ET &amp; 6pm ET, SUN 1PM ET): Host Jon Huntsman, with co-hosts No Labels co-founder Stuart Holliday and Politico’s Daniel Lippman, will discuss the Puerto Rico debt crisis with Puerto Rico’s lead restructuring advisor Jim Millstein, the 2016 election cycle with pollster Frank Luntz and the Wisconsin primary with Ron Johnson. <a href=\"http://bit.ly/1qdGxOa\">http://bit.ly/1qdGxOa</a></p><b>Pic of the hosts this weekend</b><p>--<b>Sinclair’s “Full Measure” with Sharyl Attkisson</b> (SUN 9:30am ET on WJLA and airing on Sinclair stations nationwide): the dangerous underground world of smuggling, of both drugs and humans; correspondent Scott Thuman examines how Morocco, one of the most beautiful countries on Earth, is being associated with the most recent terrorist attacks. </p><b>** A message from the Embassy of the United Arab Emirates:</b><p> Together, the UAE and US are working to make the Middle East more secure and prevent the proliferation of nuclear weapons.</p><p>The UAE is the only country in the Arabian Gulf to have a civilian nuclear energy accord (also known as a 123 Agreement) with the US. Called the “gold standard” by nonproliferation experts and government leaders, the UAE has boldly pledged not to enrich uranium or extract plutonium.</p><p>Next year, the UAE’s first nuclear energy station will begin generating safe, clean electricity. 
A leader in developing new technologies for renewable and sustainable energy, the UAE is also home to the International Renewable Energy Agency and the Masdar Institute.</p><p>Learn about how the UAE and US are united for a better future: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p><b>SUBSCRIBE</b><p> to the Playbook family: <b>POLITICO Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331\">http://politi.co/1M75UbX</a> ... <b>New York Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba\">http://politi.co/1ON8bqW</a> ... <b>Florida Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0\">http://politi.co/1JDm23W</a> ... <b>New Jersey Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337\">http://politi.co/1HLKltF</a> ...<b> Massachusetts Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453\">http://politi.co/1Nhtq5v</a> ... <b>Illinois Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc\">http://politi.co/1N7u5sb</a> ... <b>California Playbook</b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543\">http://politi.co/1N8zdJU</a> ... <b>Brussels Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef\">http://politi.co/1FZeLcw</a></p>",
  "extract_length" : 34299,
  "extract_checksum" : "wwIxgMbCy1ndAg4z22wIU0FCl3k",
  "summary_text" : "Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) … Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ...\n\n",
  "title" : "BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster",
  "published" : "2016-04-02T17:41:31Z",
  "publisher" : "POLITICO",
  "description" : "Inside the campaigns and Corey Lewandowski's shrinking role.",
  "links" : [ "http://amzn.to/1RS7lfM", "http://amzn.to/1g8Rlak", "http://amzn.to/1pUgx9P", "http://api.addthis.com/oexchange/0.8/forward/facebook/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com", "http://api.addthis.com/oexchange/0.8/forward/googleplus/offer?pco=tbxnj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&title=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster", "http://api.addthis.com/oexchange/0.8/forward/twitter/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&text=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster", "http://apne.ws/1RS7xLX", "http://apne.ws/1pUb1Uo", "http://bit.ly/1M1RDiU", "http://bit.ly/1M7kPVN", "http://bit.ly/1MHUeyx", "http://bit.ly/1MHUqhg", "http://bit.ly/1RRUhds", "http://bit.ly/1RRUw8l", "http://bit.ly/1RuAL4x", "http://bit.ly/1SGsWYC", "http://bit.ly/1SuSlSs", "http://bit.ly/1UZby4C", "http://bit.ly/1UyiRAk", "http://bit.ly/1WFzIyE", "http://bit.ly/1Y4E0jF", "http://bit.ly/1Y4FDh8", "http://bit.ly/1pUoSKC", "http://bit.ly/1qdGxOa", "http://bit.ly/1qkEUOn", 
"http://bit.ly/21XVFdO", "http://bit.ly/21ZpB9r", "http://bit.ly/25D0qOL", "http://bloom.bg/1RRUqxm", "http://cbsn.ws/1Y6qImE", "http://cnn.it/1SGvhCL", "http://cnn.it/1SsWF4w", "http://for.tn/1VjJ7gB", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0", "http://nyti.ms/1RS6BqQ", "http://nyti.ms/1ZTJn6E", "http://politi.co/1LYiY5C", "http://politi.co/1M75UbX", "http://politi.co/1RRWYvm", "http://politi.co/1RRWsxo", "http://politi.co/1UJ0n0k", "http://politi.co/22WoUUd", "http://politi.co/22YORlU", "http://politi.co/25Bwouu", "http://www.autismspeaks.org", "http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783", "http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784", "http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753", "http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785", "http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786", "http://www.politico.com/playbook", 
"http://www.politico.com/playbook/2016/03/great-mentioner-trumps-cabinet-jill-abramsons-new-column-david-gregorys-tv-gig-bday-peter-cherukuri-paul-farhi-robert-gibbs-steve-peoples-roger-simon-peter-velz-213455", "http://www.politico.com/playbook/2016/03/how-will-bill-take-trump-attacks-on-hillary-playbook-breakfast-on-the-court-tomorrow-white-house-counsel-neil-eggleston-senior-adviser-brian-deese-remembering-jennifer-frey-213503", "http://www.politico.com/playbook/2016/03/trump-to-people-im-the-highest-level-of-smart-4-hours-of-sleep-on-working-out-dont-have-to-when-youre-making-america-great-again-you-get-a-lot-of-exercise-213479", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#superComments", "http://www.politico.com/playbook/2016/04/trump-would-be-most-unpopular-major-party-nominee-in-32-years-playbook-breakfast-with-white-house-counsel-neil-eggleston-and-senior-adviser-brian-deese-livestreams-7-57-am-213524", "http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349", "http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487", "http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480", "http://www.politico.com/story/2016/04/donald-trump-tennessee-gop-delegates-221489", "http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484", 
"http://www.politico.com/story/2016/04/hillary-clinton-fbi-strategy-emails-221435", "http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486", "http://www.politico.com/tipsheets/playbook/archive", "http://www.politico.com/tipsheets/the-2016-blast/2016/04/cruzs-evangelical-problem-democrats-lying-spat-clinton-aides-all-four-one-kasich-doesnt-like-lyin-213544", "http://yhoo.it/1qb6FsM" ],
  "author_name" : "Kristen East",
  "author_link" : "http://www.politico.com/staff/east-kristen",
  "author_gender" : "FEMALE",
  "image_src" : "http://static.politico.com/97/de/e15842c94e928ce8b561e20745a0/whitelogoondots.jpeg",
  "card" : "SUMMARY_LARGE_IMAGE",
  "type" : "POST",
  "sentiment" : "POSITIVE",
  "lang" : "en",
  "categories" : {
    "business" : 3.6933053736507154E-5,
    "entertainment" : 0.003606990311947353,
    "health" : 3.339112750008408E-8,
    "politics" : 0.9963494407896508,
    "science" : 2.1030360920429964E-11,
    "sports" : 6.512794233646671E-6,
    "technology" : 8.963827337022493E-8
  },
  "duplicates" : {
    "1459618662000015199" : 1.0
  },
  "duplicates_count" : 1,
  "metadata_score" : 124
} 

Once authentication headers are provided, you can run normal Elasticsearch queries via our REST API.

You can read more about authentication, but if you have an account and are logged in, all of the following examples already include proper authentication (using your credentials).

You can copy and paste the examples into your tools to get started.

We provide a near-raw Elasticsearch endpoint. All Elasticsearch queries can be specified (except for calls which can be destructive or index new documents).
The best way to get more information about queries is to read the Elasticsearch documentation.

Spinn3r search is available via REST and JSON. The JSON results use the same document structure/schema that our firehose uses which provides easy integration between both APIs.

User Interface vs API

Search results are made available via a web-based user interface as well as a REST-based JSON API.

Spinn3r is primarily an API-oriented service so once in production you will simply be calling our APIs and fetching data.

However, we do ship a user interface which you can use to interact with the data and get a better understanding of the API.

Here’s a screenshot of a search within mainstream news:

Filters

Example query which uses a filter to constrain the results:

{
    "size": 10,
    "query": {
        "query_string" : {
            "query" : "main:Firefox"
        }
    },
    "filter": {
      "query" : {
          "query_string" : {
              "query" : "(lang:en OR lang:es) AND (source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS)"
          }
      }
    }
}

You can also add filters to these results to constrain the records we search over.

Note that we have two nested “query string” queries here.

Technically we can merge them into the first query but a key feature of filters is that they are cached and VERY fast once calculated.

This means you can keep re-using this filter and your subsequent queries will be faster because we can use the existing filter and just search over documents in that filter.

Filters are updated on the fly as new documents are indexed so you don’t have to worry about recalculating them.
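As a rough sketch of the reuse pattern described above, the following Python helper keeps the filter fixed and swaps only the main query. The helper name build_filtered_query and the constant NEWS_AND_BLOGS_FILTER are hypothetical and only illustrate how the request body could be assembled; they are not part of any Spinn3r client.

```python
import json

# Hypothetical constant: the filter is defined once so that every request
# sends the identical filter body, which is what allows it to be reused.
NEWS_AND_BLOGS_FILTER = (
    "(lang:en OR lang:es) AND "
    "(source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS)"
)


def build_filtered_query(query, filter_query=NEWS_AND_BLOGS_FILTER, size=10):
    """Return a search body with a query_string query and a query_string filter."""
    return {
        "size": size,
        "query": {"query_string": {"query": query}},
        "filter": {"query": {"query_string": {"query": filter_query}}},
    }


if __name__ == "__main__":
    # Two different searches that share exactly the same filter.
    for term in ("main:Firefox", "main:Chrome"):
        print(json.dumps(build_filtered_query(term), indent=2))
```

Only the query part changes between the two requests, so the filter part stays byte-for-byte identical.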

Query String Query

{
  "query": {
    "query_string" : {
        "query" : "source_publisher_type:MAINSTREAM_NEWS AND main:Ebola"
    }
  }  
}

Elasticsearch has a “query string query” syntax that is very handy for working with queries. It doesn’t require expressing the whole query in full JSON, which can make things much easier.

You can read more about it here

You can construct a query string query in JSON:

This includes two fields using a boolean AND expression.

Most fields in our schema can be used to express queries using a combination of AND and OR clauses.

Some additional query examples:

query | explanation
source_publisher_type:WEBLOG | only return content from weblogs
source_publisher_type:WEBLOG AND lang:en | only return weblogs in English (using our language classifier)
source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS | weblogs or mainstream news
tags:linux | tagged ‘linux’ (either hashtags or user specified tags)
main:linux | contains the word linux in the ‘main’ field
domain:cnn.com | content from the domain cnn.com
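Query strings like the ones listed above can also be assembled programmatically. This is a minimal sketch; the helper names any_of and all_of are hypothetical, and values containing spaces or special characters would additionally need quoting or escaping, which is not handled here.

```python
def any_of(field, values):
    """Join field:value clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join("%s:%s" % (field, value) for value in values) + ")"


def all_of(*clauses):
    """Join already-built clauses with AND."""
    return " AND ".join(clauses)


query = all_of(
    any_of("source_publisher_type", ["WEBLOG", "MAINSTREAM_NEWS"]),
    "lang:en",
    "main:linux",
)

if __name__ == "__main__":
    print(query)
```

The resulting string can be used as the value of the query field inside a query_string query.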

Time filters

Absolute time filters:

{
  "query": {
    "query_string": {
      "query": "source_publisher_type:MAINSTREAM_NEWS"
    }
  },
  "filter": {
    "range": {
      "date_found": {
        "gte": "2016-10-03T10:25:15Z",
        "lte": "2016-10-03T10:55:15Z",
        "format": "date_time_no_millis"
      }
    }
  }
}

Relative time filters:

{
  "query": {
    "query_string": {
      "query": "source_publisher_type:MAINSTREAM_NEWS"
    }
  },
  "filter": {
    "range": {
      "date_found": {
        "gte": "now-1h"
      }
    }
  }
}

It’s also possible to limit the query by time.

It’s best to use ISO8601 time strings in filters, which avoid problems with timezones and are easier to read.

The absolute time filter will give you all posts from mainstream news found between the two dates.

Relative time

You can also filter by relative time. A lower bound such as now-1h limits the results to posts found within the last hour.
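When the bounds come from application code, the ISO8601 strings can be generated rather than typed by hand. The sketch below assumes timezone-aware datetime values and converts them to UTC first; the helper names iso8601_no_millis and absolute_range_filter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def iso8601_no_millis(moment):
    """Format a timezone-aware datetime as a UTC ISO8601 string without milliseconds."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def absolute_range_filter(start, end, field="date_found"):
    """Build a range filter between two timezone-aware datetimes."""
    return {
        "range": {
            field: {
                "gte": iso8601_no_millis(start),
                "lte": iso8601_no_millis(end),
                "format": "date_time_no_millis",
            }
        }
    }


start = datetime(2016, 10, 3, 10, 25, 15, tzinfo=timezone.utc)
time_filter = absolute_range_filter(start, start + timedelta(minutes=30))

if __name__ == "__main__":
    print(time_filter)
```

Converting to UTC before formatting is what keeps the filter independent of the timezone of the machine that builds the query.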

Sorting by custom fields

{
  "query": {
    "query_string": {
      "query": "source_publisher_type:MAINSTREAM_NEWS"
    }
  },
  "sort" : [ {
    "published" : {
      "order" : "desc"
    }
  } ]
}

You can also sort by custom fields. Here’s an example of sorting by the published field in descending order.

Sorting by custom fields requires more CPU than normal so we would suggest only doing so when necessary.

Aggregations

Spinn3r provides support for Elasticsearch aggregations on top of our core data platform.

For example, one could search for “fishing” and get back the top 100 posts over the last 60 days. However, it would be nice to get a report of the volume of these posts, per hour, over the same time period.

This can be done with aggregations wherein you specify a modified query which returns buckets, one per hour, with the number of posts matching your query within that hour.

Using too many aggregations in one query can consume excessive memory. We support a maximum of 2GB per node per query. However, we have more than 100 servers, so in practice more than 200GB of RAM is available, which is more than enough for most purposes.
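The hourly volume report described above could be requested along the following lines. This is a sketch under assumptions: it uses the standard Elasticsearch date_histogram aggregation and the filtered query form that appears elsewhere in this document, the aggregation name posts_per_hour is arbitrary, and the helper names are hypothetical.

```python
def posts_per_hour_query(query, days=60):
    """Build a search body that returns one bucket per hour for the given query."""
    return {
        # Only the buckets are needed, not the matching posts themselves.
        "size": 0,
        "query": {
            "filtered": {
                "query": {"query_string": {"query": query}},
                "filter": {"range": {"date_found": {"gte": "now-%dd" % days}}},
            }
        },
        "aggs": {
            "posts_per_hour": {
                "date_histogram": {"field": "date_found", "interval": "hour"}
            }
        },
    }


def bucket_counts(response):
    """Extract (hour, count) pairs from a parsed aggregation response."""
    buckets = response["aggregations"]["posts_per_hour"]["buckets"]
    return [(bucket["key_as_string"], bucket["doc_count"]) for bucket in buckets]


if __name__ == "__main__":
    print(posts_per_hour_query("main:fishing"))
```

The time restriction is placed inside the query rather than in a top-level filter so that it also limits the documents counted by the aggregation.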

Location based queries

Specific geo fields:

{
        ... 

        "geo_location" : "Lorain, OH",
        "geo_location_id" : "91d57ea9ae3b0bbd",
        "geo_featurename" : "PPL",
        "geo_point" : "41.45282,-82.18237",
        "geo_name_id" : "5161262",
        "geo_name" : "Lorain, Lorain County, Ohio, United States",
        "geo_country" : "US",
        "geo_state" : "Ohio",
        "geo_city" : "Lorain"

        ...
}

Search for all documents in the United States:

{ 
  "size": 1,
  "query": {
      "term" : { "geo_country" : "US" }
  }
}

Our geocoding system extracts the location-based information into denormalized fields.

Note that all country codes are ISO “alpha-2” country codes.

We have a full list of country codes available here

Searching for specific countries

You can search for specific countries and states by combining fields.

For example: geo_country:US AND geo_state:"New York"

This finds all posts located in New York, in the United States.

We have a full list of states available here

Geo_point queries

{
  "size": 1,
  "query": {
    "geo_bounding_box": {
      "geo_point": {
        "top_left": {
          "lat": 42,
          "lon": -74
        },
        "bottom_right": {
          "lat": 40,
          "lon": -72
        }
      }
    }
  }
}

The geo_point field accepts latitude-longitude pairs and supports geometric operations, which can be used for queries like this:

More information on geo_point based queries is available here
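A bounding box is easy to get wrong because top_left must carry the northern latitude and the western longitude, while bottom_right carries the southern latitude and the eastern longitude. The following sketch builds the query from named edges and rejects inverted boxes. The helper name bounding_box_query is hypothetical, and for simplicity it does not support boxes that cross the antimeridian.

```python
def bounding_box_query(north, west, south, east, size=1):
    """Build a geo_bounding_box query on the geo_point field from four edges."""
    if not -90 <= south < north <= 90:
        raise ValueError("north must be greater than south, both within [-90, 90]")
    if not -180 <= west < east <= 180:
        raise ValueError("east must be greater than west, both within [-180, 180]")
    return {
        "size": size,
        "query": {
            "geo_bounding_box": {
                "geo_point": {
                    "top_left": {"lat": north, "lon": west},
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        },
    }


if __name__ == "__main__":
    print(bounding_box_query(north=42, west=-74, south=40, east=-72))
```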

Categories

We support categories in our search which are backed by our classifier API which puts content into categories like politics, entertainment, health, etc.

You can search for these by running queries for:

query | description
categories.business:>0.85 | Content in the business category with a probability greater than 85%

The values are between 0.0 and 1.0 and all of them sum to 1.0.

One strategy is to pick a main category (say 'politics’) and then search for that.

The secondary categories (when sorted by rank) can also be used to determine what flavor of political content you’re indexing.

For example if the politics score is 0.9 and the technology score is 0.1 then you know this is a political story about technology.
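To illustrate the ranking idea, the sketch below sorts the category scores of a document and picks the primary and secondary categories. The scores are copied from the sample document shown earlier in this documentation; the helper name rank_categories is hypothetical.

```python
def rank_categories(categories, minimum=0.0):
    """Return (name, score) pairs sorted by score, highest first."""
    ranked = sorted(categories.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= minimum]


# Scores taken from the "categories" field of the sample document.
categories = {
    "business": 3.6933053736507154E-5,
    "entertainment": 0.003606990311947353,
    "health": 3.339112750008408E-8,
    "politics": 0.9963494407896508,
    "science": 2.1030360920429964E-11,
    "sports": 6.512794233646671E-6,
    "technology": 8.963827337022493E-8,
}

ranked = rank_categories(categories)
primary, secondary = ranked[0], ranked[1]

if __name__ == "__main__":
    print("primary: %s, secondary: %s" % (primary[0], secondary[0]))
```

For this document the primary category is politics and the secondary category is entertainment, although the secondary score is so small that it carries little information here.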

Hot / warm / cold architecture

We use a hot/warm/cold architecture for storing content long term to maximize both performance and content density.

alias | description
content_* | Access content from the last 30 days on ultra-fast SSD, with about 25% of it cached in memory
warm_content_* | Access an additional 6 months of content on a larger cluster of HDD machines
cold_content_* | Access an additional 3 months of content on higher density HDD machines

Hot content

Alias: content_*

For the first 30 days, content sits on our hot architecture.

This is based on SSDs (solid state drives), and about 25% of the data is cached in memory. Searches within the hot window execute very fast, usually within 1 second per query.

The index alias for this is just content_*, so all searches using that alias will be very fast.

This should probably be your production alias.

Warm content

Alias: warm_content_*

Content under warm content aliases is stored on a separate hardware profile using hard disks. These are designed for bulk storage which means that query times for this content will be higher.

The cluster storing warm content has a large number of machines, so it is much faster than the cold content cluster.

Cold content

Alias: cold_content_*

This content is also stored on HDD but we use higher density drives and more drives per server. This means the content is a bit slower than warm content but allows us to index a LOT more data.

Using multiple aliases to index more data.

You can search over ALL indexes if you want to search the entire content range at once. This is accomplished by adding a comma between the indexes.

Alias: cold_content_*,warm_content_*,content_*
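As a small illustration, the comma-separated alias list simply becomes the index part of the search URL. The URL pattern below is the one used by the Python example in this documentation; the helper name search_url and the vendor value example-vendor are hypothetical.

```python
HOT = "content_*"
WARM = "warm_content_*"
COLD = "cold_content_*"


def search_url(vendor, aliases):
    """Build a search URL covering one or more index aliases."""
    return "http://%s.elasticsearch.spinn3r.com/%s/_search" % (vendor, ",".join(aliases))


if __name__ == "__main__":
    # Fast production searches over the hot window only.
    print(search_url("example-vendor", [HOT]))
    # Slower searches over the entire content range.
    print(search_url("example-vendor", [COLD, WARM, HOT]))
```

Keeping the hot-only URL as the default and opting in to the warm and cold aliases only when older content is needed avoids paying the slower query times on every request.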

Document clustering

{
  "size" : 10,
  "query" : {
    "filtered" : {
      "query" : {
        "mlt" : {
          "fields" : [ "main" ],
          "like_text" : "US military conducted airstrike against Ansar Al-Sharia killing 8 commanders ... ",
          "min_term_freq" : 1,
          "max_query_terms" : 250
        }
      },
      "filter" : {
        "and" : {
          "filters" : [ {
            "range" : {
              "date_found" : {
                "gte": "2015-07-21T10:25:15Z",
                "lte": "2016-07-21T10:55:15Z",
                "format": "date_time_no_millis",
                "include_lower" : true,
                "include_upper" : true
              }
            }
          }, {
            "terms" : {
              "source_publisher_type" : [ "MICROBLOG" ]
            }
          } ]
        }
      }
    }
  },
  "fields" : [ "main", "date_found" ]
}

Elasticsearch supports finding documents similar to other documents.

This can be used for basic document clustering. If you need more advanced document clustering you should consider using something like k-means or support vector machines. However, this is a very quick and easy way to get basic clustering and related documents.

This is an advanced “more like this” (MLT) query showing how to identify similar documents within a time range and for a specific publisher type.

Fetching specific URLs

{
     "query" : {
        "match" : {
            "permalink" : "http://www.voiceofsandiego.org/events/member-coffee-4/"
        }
     }
}

You can also search for the content for a specific URL. For example:

Cacheable filters for faster searches

Uncacheable filter (bad)

{
    "range": {
        "date_found": {
            "gte": "2015-12-13T20:43:01Z",
            "lte": "now"
        }
    }
}

By default Elasticsearch will try to cache filters. However, a filter like the one above won’t be cached, because the lower bound keeps changing if you calculate an exact timestamp per query.

Cacheable filter (good)

{
    "range": {
        "date_found": {
            "gte": "2015-12-13T20:00:00Z",
            "lte": "now"
        }
    }
}

The revised filter rounds the lower bound down to the hour, so it stays the same across queries. Elasticsearch will cache the filter (and update it as new documents arrive), and your searches will be much faster since the filter doesn’t have to be rebuilt each time.
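One way to produce such a stable lower bound in client code is to truncate the timestamp to the hour before building the filter. This is a minimal sketch; the helper name cacheable_range_filter is hypothetical, and rounding to the hour is just one possible granularity.

```python
from datetime import datetime, timezone


def cacheable_range_filter(start, field="date_found"):
    """Build a range filter whose lower bound is rounded down to the hour."""
    rounded = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return {
        "range": {
            field: {
                "gte": rounded.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "lte": "now",
            }
        }
    }


if __name__ == "__main__":
    exact = datetime(2015, 12, 13, 20, 43, 1, tzinfo=timezone.utc)
    print(cacheable_range_filter(exact))
```

Every query issued within the same hour then sends an identical filter body. The trade-off is that the filter may match up to one extra hour of older documents.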

Avoid Expect headers with curl

Bypass curl expect header:

-H Expect:

If you’re using the command line app curl to work with search, you may have problems with large queries.

curl will add an Expect header, which doesn’t work with our API auth layer.

You can bypass it by adding -H Expect: to the command line. Otherwise your large queries won’t work when debugging via curl; they will just time out.

There’s no downside to adding this option, and it doesn’t modify the query or the HTTP data sent.

Search on exact fields.

It’s best to search on exact fields and avoid using the main field if possible.

For example, you could search for hashtags by searching for:

Search for hashtags: main:#linux

or you could search over the tags field:

Search for tags field: tags:linux

Additionally, this will work for tag specifications other than hashtags.

You can do the same thing with author mentions. For example a query for:

Mention search: main:@barackobama

can be rewritten to:

Mention field search: mentions:barackobama

Searching from Python

#!/usr/bin/env python3

import requests

####
# Your vendor code and authentication token.
VENDOR = "{{vendor}}"
VENDOR_AUTH = "{{vendor_auth}}"

####
# Define the query you want to run.

QUERY = """
{
    "size": 1500,
    "query": {
        "query_string" : {
            "query" : "main:Obama"
        }
    }
}
"""

# Maximum number of pages to fetch.
NR_PAGES = 100

###
# generic function to just write data to disk
def handle_data(page, response):

    file_name = "%04d.json" % page

    print("Writing JSON data to: %s" % file_name)

    # response.content is bytes, so the file is opened in binary mode.
    with open(file_name, "wb") as f:
        f.write(response.content)

###
# Perform the first request.  The URL needs to be slightly different because
# we have to specify the index name here.

url = 'http://%s.elasticsearch.spinn3r.com/content*/_search?scroll=5m&pretty=true' % VENDOR

print("Fetching from %s" % url)
print("Running query: ")
print(QUERY)

## we have to add our vendor code information to the request now.
headers = {'X-vendor': VENDOR,
           'X-vendor-auth': VENDOR_AUTH}

response = requests.post(url, headers=headers, data=QUERY)
response.raise_for_status()

####
# now that we have the first result we have to parse out the scroll ID. The first
# page is LITERALLY just the scroll ID.

data = response.json()

scroll_id = data["_scroll_id"]

handle_data(0, response)

print("Query took: %sms" % data["took"])
print("Total hits: %s" % data["hits"]["total"])

for page in range(1, NR_PAGES):

    url = 'http://%s.elasticsearch.spinn3r.com/_search/scroll?scroll=5m&pretty=true' % VENDOR

    response = requests.post(url, headers=headers, data=scroll_id)
    response.raise_for_status()

    data = response.json()

    # an empty page means every matching document has been returned.
    if not data["hits"]["hits"]:
        break

    handle_data(page, response)

    # always use the most recent scroll ID for the next request.
    scroll_id = data["_scroll_id"]

Both curl and the search UI are great for getting up and running quickly.

For more advanced usage you might want to use a language like Python.

Here’s a quick Python script for running queries, paging through the results, and writing them to disk.

Your vendor code has been updated below so you can just use the script as-is.

This example uses the scroll API to fetch the documents page by page and writes them to individual files with 1500 records each.

Searching from Java

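import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

JestClientFactory factory = new JestClientFactory();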

factory.setHttpClientConfig(new HttpClientConfig
                              .Builder("http://{{vendor}}.elasticsearch.spinn3r.com")
                              .multiThreaded(true)
                              .build());

JestClient client = factory.getObject();

String query = "{\n" +
                 "    \"size\": 1,\n" +
                 "    \"query\": {\n" +
                 "        \"query_string\" : {\n" +
                 "            \"query\" : \"main:Firefox\"\n" +
                 "        }\n" +
                 "    }  \n" +
                 "}";

Search search = new Search.Builder(query)
                  // multiple index or types can be added.
                  .addIndex("content_*")
                  .addType("content")
                  .setHeader( "X-vendor", "{{vendor}}" )
                  .setHeader( "X-vendor-auth", "{{vendor_auth}}" )
                  .build();

SearchResult result = client.execute(search);

System.out.printf( "%s\n", result.getTotal() );

When working within Java we recommend using Jest for performing REST / JSON calls against our search API.

Here’s an example using the Jest API for making calls into Spinn3r. Make sure to include the authentication headers below so that your requests are accepted.

Classifier

The Classifier API allows developers to analyze text and news articles for semantic meaning and to classify text into hashtags, stock ticker symbols, or any other classifications we’ve created.

The classifier works by taking a large set of training examples with labels generated per classification, then using a linear classifier to build a model which enables us to mathematically label future input text.

We provide pre-trained classifiers for public use based on public data sources which are classified by social media users.

Selecting Classifiers

Classifiers are identified by selectors driven by name/value tag pairs.

Right now we have two standard tags we use to identify classifiers:

name | description
textType | Type of text we’re classifying. NEWS is the only supported type at the moment. This allows us to specify which form of text a classifier was trained with.
labelType | Specifies the type of labels to output. HASHTAG is the only supported type at the moment. This allows us to specify the type of the labels. For example, labels could be wikipedia pages, hashtags, stock ticker symbols, etc.

Currently Supported Classifiers

lang | textType | labelType | description
en | NEWS | HASHTAG | Classifies news articles into hashtag labels

Analyze Text

The analyze text API allows a developer to analyze raw text to compute labels based on the underlying text structure.

For example, you could give us a news article about politics and we would give you back labels like BarackObama, MittRomney, etc.

Endpoint

Use URL endpoint: http://{{vendor}}.classifier.spinn3r.com/v1/classifier/analyze-text

Requests are JSON and must have the Content-Type: application/json HTTP header.

Requests

Example request:

{ 
    "classifiers": [ {

        "selector": {
            "textType": "NEWS",
            "labelType": "HASHTAG"
        }

    } ],
    "lang": "en",
    "text": "(CNN)President Barack Obama's approval rating stands at 55% in a new CNN/ORC poll, the highest mark of his second term ... "

}

Requests are simply JSON messages specifying the language, text, and classifier we should use.

name | optional | description
classifiers | no | A JSON array describing selectors for the classifiers we wish to use to classify the given text
lang | yes | Language of the content. We recommend including this, especially for short content. If you don’t specify it we will just use our language classifier to detect the language
text | no | The text you would like to classify

Classifier

A JSON object explaining which classifier you would like to use to classify your text.

Right now we support a selector which contains two tags textType and labelType specifying which type of classifier to use to classify the text.

name | optional | description
selector | no | A JSON object specifying which classifier to use for classification

Multiple classifiers can be specified if necessary.

Selector

Contains a map of tag to value pairs used to select the classifier. These must specify at least textType and labelType to select the right classifier.
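The request structure described above can be assembled in a few lines. This sketch only builds and serializes the JSON body; the helper name analyze_text_request is hypothetical, and the defaults reflect the only textType and labelType values listed as supported.

```python
import json


def analyze_text_request(text, lang=None, text_type="NEWS", label_type="HASHTAG"):
    """Build the JSON body for an analyze-text request."""
    body = {
        "classifiers": [
            {"selector": {"textType": text_type, "labelType": label_type}}
        ],
        "text": text,
    }
    # lang is optional; when omitted the language is detected automatically.
    if lang is not None:
        body["lang"] = lang
    return body


payload = json.dumps(
    analyze_text_request(
        "President Barack Obama's approval rating stands at 55% in a new poll",
        lang="en",
    )
)

if __name__ == "__main__":
    print(payload)
```

The serialized payload can then be sent as the body of a POST to the analyze-text endpoint with the Content-Type: application/json header.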

Responses

{
  "lang" : "en",
  "classifications" : [
    {
      "classifier" : {
        "selector" : {
          "textType" : "NEWS",
          "labelType" : "HASHTAG"
        }
      },
      "labels" : [
        {
          "label" : "DNCleak",
          "probability" : 0.18919783089447492
        },
        {
          "label" : "politics",
          "probability" : 0.12080862285793965
        },
        ...
      ]
    }
  ]
}

We return the lang and the list of classifications of the input text as well as the list of labels per classifier.

name | optional | description
lang | no | The language code that was used to classify the text
classifications | no | The list of classifications, with metadata about each label

Labels

An individual label includes the following fields:

name | optional | description
label | no | The label assigned to the text
probability | no | The probability that the document belongs to the given label

Note that the returned labels MAY NOT sum to 1.0, because we only include the primary labels. You can assume that the labels we omit account for the remainder, so that all labels together sum to 1.0.

For example, if the classifier has 4 total labels, and we return two labels of foo=0.6 and bar=0.1 this will sum to 0.7 and we can assume the remaining labels sum to 0.3.
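The arithmetic in this example can be written as a short helper that works on the labels list of a response. The function name remaining_probability is hypothetical, and the labels below are the foo and bar values from the example rather than real classifier output.

```python
def remaining_probability(labels):
    """Return the probability mass of the labels that were not returned."""
    returned = sum(label["probability"] for label in labels)
    # Guard against tiny negative values caused by floating point rounding.
    return max(0.0, 1.0 - returned)


labels = [
    {"label": "foo", "probability": 0.6},
    {"label": "bar", "probability": 0.1},
]

if __name__ == "__main__":
    print(round(remaining_probability(labels), 6))
```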

Analyze Link

The analyze-link API functions exactly like the analyze-text API, but we allow the developer to specify a link / URL instead of text.

We then fetch the URL, perform content extraction (chrome removal) on the page so that only the article text is included, and then perform classification.

Endpoint

Use URL endpoint: http://{{vendor}}.classifier.spinn3r.com/v1/classifier/analyze-link

Requests are JSON and must have the Content-Type: application/json HTTP header.

Request

Example request:

{ 
     "classifiers": [ {

         "selector": {
             "textType": "NEWS",
             "labelType": "HASHTAG"
         }

     } ],
     "lang": "en",
     "link": "http://www.cnn.com/2016/10/06/politics/obama-approval-rating-new-high/",
     "textStrategy": "EXTRACT"

 }

The request is almost identical to analyze-text; however, we use the following fields.

name | optional | description
classifiers | no | A JSON array describing the classifiers we should use
lang | yes | Language of the content. We recommend including this, especially for short content. If you don’t specify it we will just use our language classifier to detect the language
link | no | A link to a URL you would like to classify
textStrategy | yes | The strategy used to compute the text to perform the classification. We support FULL to have the full text of the page used as well as EXTRACT which removes sidebar content and other page noise

Hive

SELECT tmp2.tag, COUNT(tmp2.tag) AS ranking FROM (
    SELECT tmp1.author_handle, EXPLODE(tmp1.tags) AS tag FROM (
        SELECT tmp.author_handle, COLLECT_SET(tmp.tag) AS tags FROM (
            SELECT author_handle, EXPLODE(tags) AS tag FROM content WHERE
                 lang='en' AND
                 source_publisher_type='MICROBLOG' AND
                 source_followers >= 10000 AND
                 date_found > DATE_SUB(CURRENT_TIMESTAMP(), 30)
        ) AS tmp GROUP BY tmp.author_handle
    ) AS tmp1
) AS tmp2 GROUP BY tmp2.tag ORDER BY ranking DESC LIMIT 200000

Spinn3r has support for Apache Hive/Spark queries on top of our massive datastore.

We have over 50TB of content in our search index spread over 8 months.

This is a massive amount of data which can be used to build extremely powerful applications.

Unfortunately, access to this kind of data is not cheap. However, we can now host large batch jobs and provide static data exports, making it much more affordable for our customers.

All of this is powered via Apache Spark and Hive meaning that you can represent powerful exports in simple SQL format in an Open Source platform you’re already familiar with.

This is a recent SQL export we ran for a customer which exported top users and resulted in a 5TB static export.

At the moment we haven’t exposed a direct API for this functionality.

Right now Spark/Hive require a lot of per-job specific settings to run and we want something that is easier for our customers.

What we’re currently doing is allowing our customers to execute the SQL and then we provide a static tar.gz download for them.

Please contact support if you would like to execute a Hive query.

Parser

Basic request:

 curl -XPOST 'http://{{vendor}}.rest.spinn3r.com/v1/parser/parse' --header "X-vendor: {{vendor}}" --header "X-vendor-auth: {{vendor_auth}}" -d '{
     "link": "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
     "publisherType": "WEBLOG"
 }
 '

Example output:

{
  "parser" : "com.spinn3r.artemis.robot.metadata.instrumented.InstrumentedParser",
  "request" : {
    "link" : "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
    "publisherType" : "WEBLOG",
    "parseStrategy" : "PERMALINK"
  },
  "content" : [
{
  "shortlink" : "http://wp.me/p1FaB8-56hH",
  "canonical" : "http://social.techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
  "main" : "<article> \n <div>  \n  <div> \n   <div>     \n    <div>  \n     <div> \n      <div> \n       <div>  \n        <img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/2042117809_4bf153cf11_b.jpg?w=738\" /> \n        <p>Today, Google’s CEO Sundar Pichai <a href=\"http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html\">shared details on a new plan</a> to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that aren’t connected.</p> \n        <p>The key? India’s train system. And a plan to bring Wi-Fi to its 10 million rail passengers a day. And it’s free (to start). Pichai shared Google’s plans, while sharing his own story about his days using Chennai Central station to get to <a href=\"http://www.iitkgp.ac.in/\">school</a>.</p> \n        <blockquote>\n         <p>We’d like to help get these next billion Indians online—so they can access the entire web, and all of its information and opportunity. And not just with any old connection—with fast broadband so they can experience the best of the web. That’s why, today, on the occasion of Indian Prime Minister Narendra Modi’s visit to our U.S. headquarters, and in line with his Digital India initiative, we announced a new project to provide high-speed public Wi-Fi in 400 train stations across India.</p>\n        </blockquote> \n        <p>All of the big tech companies have been getting a visit from Indian Prime Minister Narendra Modi, with Facebook being <a href=\"https://techcrunch.com/2015/09/27/modiberg/\">one of them</a>. 
Each company seems to have its own ideas on how to expand Internet availability and Google’s is definitely unique.</p> \n        <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/modi-sundar-alt-twitter.jpg?w=640&amp;h=474\" alt=\"modi-sundar alt twitter\" width=\"640\" height=\"474\" /></p> \n        <p>Here’s a map of the first 100 train stations that will get Wi-Fi by the end of 2016:</p>\n        <div>\n         <div></div>\n        </div> \n        <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/indiawifi_zoomed-our.jpg?w=640&amp;h=399\" alt=\"IndiaWifi_zoomed our\" width=\"640\" height=\"399\" /></p> \n        <p>Google will be working with <a href=\"http://www.indianrailways.gov.in/\">Indian Railways</a> and RailTel on the initiative.</p> \n        <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/sundar_pichai_cropped.jpg?w=221&amp;h=300\" alt=\"Sundar_Pichai_(cropped)\" width=\"221\" height=\"300\" />Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:</p> \n        <blockquote>\n         <p>Even with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. 
Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.</p>\n        </blockquote> \n        <p>This is the first big initiative for Sundar Pichai as <a href=\"https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/\">the CEO of Google</a>, which will also be holding a <a href=\"https://techcrunch.com/2015/09/18/you-only-have-one-shot/\">major hardware event this week</a>. He noted “It’s my hope that this Wi-Fi project will make all these things a little easier.” This initiative, along with others like the <a href=\"https://techcrunch.com/2014/06/25/android-one/\">Android One</a> project should help the next generations of the residents of India get — and stay — online.</p>  \n        <small>Featured Image: <a href=\"https://www.flickr.com/photos/superk550i/2042117809/in/photolist-47sotK-7spnaC-podmnG-gzu6y7-gzu4cR-kQHiMB-8ZUVna-6dMB1r-47snyZ-47wtxQ-47wtCG-47sjFM-ekdEGQ-7adwcX-osi4Np-a3y2iZ-7skxR6-67Lz1x-a3AUXy-9gsZjb-bHdyuP-nWnzdd-gztXDY-oPYzci-gztFed-r2VZYK-gztjRj-s1BxN9-prqnaT-ag9oJ4-8NcLVe-8MRTHi-7Biiee-ag9h1K-6dMBPn-prkeHA-dUFjzm-fydF5p-b3BnjZ-bHdy6K-6dRJZJ-dsjdpp-buiLFy-oLMegv-apTuag-nRccft-nueVin-ecovbQ-6oonig-bVMAE7\">superk550i</a>/<a href=\"https://www.flickr.com/\">Flickr</a> UNDER A <a href=\"http://creativecommons.org/licenses/by/2.0/\">CC BY 2.0</a> LICENSE</small> \n       </div> \n       <div> \n        <ul> \n         <li> <h5>0</h5><br /> <small>SHARES</small> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a 
href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n         <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n        </ul> \n       </div> \n       <div></div> \n       <div></div> \n       <div> \n        <small> <a href=\"https://techcrunch.com/advertise/\" title=\"Advertise on TechCrunch\"> Advertisement </a> </small> \n        <div></div> \n        <div></div> \n       </div> \n      </div> \n     </div>   \n     <div> \n      <div> \n       <small> <a href=\"https://techcrunch.com/advertise/\" title=\"Advertise on TechCrunch\"> Advertisement </a> </small>  \n       <div></div>   \n      </div>  \n      <div> \n       <h3>CrunchBase</h3> \n       <div> \n        <ul>  \n         <li> <h4> <a href=\"https://crunchbase.com/organization/google/\"> Google </a> </h4> \n          <div> \n           <ul> \n            <li> <strong>Founded</strong> 1998 </li> \n            <li> <strong>Overview</strong> Google is a multinational corporation that is specialized in internet-related services and products. 
The company’s product portfolio includes Google Search, which provides&nbsp;users with access to information online; Knowledge Graph that allows users to search for things, people, or places as well as builds systems recognizing speech and understanding natural language; Google Now, which provides information … </li> \n            <li> <strong>Location</strong>  <a href=\"http://crunchbase.com/location/mountain-view/37bfe551197e02269a805d7bec8a50dd\">Mountain View, CA</a>  </li> \n            <li> <strong>Categories</strong>  <a href=\"http://crunchbase.com/category/search/0d39287e5b1377a6970274294daded43\">Search</a>, <a href=\"http://crunchbase.com/category/email/687e34ef30e2af08a178436d6b53d633\">Email</a>, <a href=\"http://crunchbase.com/category/blogging-platforms/1ded70549e6d2ede57adc95e83be0c06\">Blogging Platforms</a>, <a href=\"http://crunchbase.com/category/information-technology/dbca89faf0835438b4add3fdeceb78e7\">Information Technology</a>, <a href=\"http://crunchbase.com/category/video-streaming/b1b3b2d785ed2cb1fc603e2b6a3b5ddd\">Video Streaming</a>, <a href=\"http://crunchbase.com/category/software/c08b5441a05b9777b7a6012728caddd9\">Software</a>  </li> \n            <li> <strong>Website</strong>  <a href=\"http://www.google.com/\">http://www.google.com/</a>  </li> \n            <li> <a href=\"https://crunchbase.com/organization/google\">Full profile for Google</a> </li> \n           </ul> \n          </div> </li>   \n         <li> <h4> <a href=\"https://crunchbase.com/person/sundar-pichai/\"> Sundar Pichai </a> </h4> \n          <div> \n           <ul> \n            <li> <strong>Bio</strong> Sundar Pichai joined Google in 2004 and became CEO in August of 2015. Prior to that, he led the product management and innovation efforts for a suite of Google's consumer products, including Google Toolbar, Chrome and Chrome OS. He was also responsible for the HTML5 and open web platform efforts at Google. 
Before joining Google, he held various engineering and product management positions at Applied … </li> \n            <li> <a href=\"https://crunchbase.com/person/sundar-pichai\">Full profile for Sundar Pichai</a> </li> \n           </ul> \n          </div> </li>  \n        </ul> \n       </div> \n      </div>  \n      <div> \n       <h2> Newsletter Subscriptions </h2> \n       <div> \n        <div>  \n         <div>     \n          <strong>The Daily Crunch</strong> Get the top tech stories of the day delivered to your inbox       \n          <strong>TC Weekly Roundup</strong> Get a weekly recap of the biggest tech stories       \n          <strong>CrunchBase Daily</strong> The latest startup funding announcements    \n         </div>  Enter Address  Subscribe   \n        </div> \n       </div> \n      </div>  \n      <div> \n       <div></div> \n      </div> \n     </div>  \n    </div>  \n   </div> \n  </div>   \n  <div>  \n   <div> \n    <ul> \n     <li> \n      <div> \n       <a href=\"https://techcrunch.com/tag/india/\"> india </a> \n      </div> </li> \n     <li> \n      <div> \n       <a href=\"https://techcrunch.com/tag/sundar-pichai/\"> Sundar Pichai </a> \n      </div> </li> \n     <li> \n      <div> \n       <a href=\"https://techcrunch.com/topic/company/google/\"> Google </a> \n      </div> </li> \n     <li> \n      <div> \n       <a href=\"https://techcrunch.com/transportation/\"> Transportation </a> \n      </div> </li> \n     <li> \n      <div>\n       <a>Popular Posts</a>\n      </div> \n      <div> \n       <ul> \n        <div></div>  \n       </ul> \n      </div> </li> \n    </ul> \n   </div>  \n  </div>  \n </div> \n</article>",
  "main_length" : 10445,
  "main_checksum" : "YjV73cp7OlxVhcej6HxKmT8-6CQ",
  "main_format" : "HTML",
  "extract" : "<h1>Google Announces Plan To Put Wi-Fi In 400 Train Stations Across India</h1><div>\n  Posted \n</div><h4>Medvedev To Hold A Google Hangout On Russia’s Tech&nbsp;Future</h4><p>Today, Google’s CEO Sundar Pichai <a href=\"http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html\">shared details on a new plan</a> to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that aren’t connected.</p><p>The key? India’s train system. And a plan to bring Wi-Fi to its 10 million rail passengers a day. And it’s free (to start). Pichai shared Google’s plans, while sharing his own story about his days using Chennai Central station to get to <a href=\"http://www.iitkgp.ac.in/\">school</a>.</p><p>We’d like to help get these next billion Indians online—so they can access the entire web, and all of its information and opportunity. And not just with any old connection—with fast broadband so they can experience the best of the web. That’s why, today, on the occasion of Indian Prime Minister Narendra Modi’s visit to our U.S. headquarters, and in line with his Digital India initiative, we announced a new project to provide high-speed public Wi-Fi in 400 train stations across India.</p><p>All of the big tech companies have been getting a visit from Indian Prime Minister Narendra Modi, with Facebook being <a href=\"https://techcrunch.com/2015/09/27/modiberg/\">one of them</a>. 
Each company seems to have its own ideas on how to expand Internet availability and Google’s is definitely unique.</p><p>Here’s a map of the first 100 train stations that will get Wi-Fi by the end of 2016:</p><p>Google will be working with <a href=\"http://www.indianrailways.gov.in/\">Indian Railways</a> and RailTel on the initiative.</p><p>Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:</p><p>Even with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.</p><p>This is the first big initiative for Sundar Pichai as <a href=\"https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/\">the CEO of Google</a>, which will also be holding a <a href=\"https://techcrunch.com/2015/09/18/you-only-have-one-shot/\">major hardware event this week</a>. He noted “It’s my hope that this Wi-Fi project will make all these things a little easier.” This initiative, along with others like the <a href=\"https://techcrunch.com/2014/06/25/android-one/\">Android One</a> project should help the next generations of the residents of India get — and stay — online.</p>",
  "extract_length" : 3317,
  "extract_checksum" : "JnZAbJDNEyf_5JEHTmPgJnN39qc",
  "summary_text" : "Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:\n\nEven with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.\n\n",
  "title" : "Google Announces Plan To Put Wi-Fi In 400 Train Stations Across India",
  "publisher" : "TechCrunch",
  "description" : "Today, Google’s CEO Sundar Pichai shared details on a new plan to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that…",
  "links" : [ "http://creativecommons.org/licenses/by/2.0/", "http://crunchbase.com/category/blogging-platforms/1ded70549e6d2ede57adc95e83be0c06", "http://crunchbase.com/category/email/687e34ef30e2af08a178436d6b53d633", "http://crunchbase.com/category/information-technology/dbca89faf0835438b4add3fdeceb78e7", "http://crunchbase.com/category/search/0d39287e5b1377a6970274294daded43", "http://crunchbase.com/category/software/c08b5441a05b9777b7a6012728caddd9", "http://crunchbase.com/category/video-streaming/b1b3b2d785ed2cb1fc603e2b6a3b5ddd", "http://crunchbase.com/location/mountain-view/37bfe551197e02269a805d7bec8a50dd", "http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html", "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#", "http://www.google.com/", "http://www.iitkgp.ac.in/", "http://www.indianrailways.gov.in/", "https://crunchbase.com/organization/google", "https://crunchbase.com/organization/google/", "https://crunchbase.com/person/sundar-pichai", "https://crunchbase.com/person/sundar-pichai/", "https://techcrunch.com/2014/06/25/android-one/", "https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/", "https://techcrunch.com/2015/09/18/you-only-have-one-shot/", "https://techcrunch.com/2015/09/27/modiberg/", "https://techcrunch.com/advertise/", "https://techcrunch.com/tag/india/", "https://techcrunch.com/tag/sundar-pichai/", "https://techcrunch.com/topic/company/google/", "https://techcrunch.com/transportation/", "https://www.flickr.com/", 
"https://www.flickr.com/photos/superk550i/2042117809/in/photolist-47sotK-7spnaC-podmnG-gzu6y7-gzu4cR-kQHiMB-8ZUVna-6dMB1r-47snyZ-47wtxQ-47wtCG-47sjFM-ekdEGQ-7adwcX-osi4Np-a3y2iZ-7skxR6-67Lz1x-a3AUXy-9gsZjb-bHdyuP-nWnzdd-gztXDY-oPYzci-gztFed-r2VZYK-gztjRj-s1BxN9-prqnaT-ag9oJ4-8NcLVe-8MRTHi-7Biiee-ag9h1K-6dMBPn-prkeHA-dUFjzm-fydF5p-b3BnjZ-bHdy6K-6dRJZJ-dsjdpp-buiLFy-oLMegv-apTuag-nRccft-nueVin-ecovbQ-6oonig-bVMAE7" ],
  "published" : "2015-09-27T19:32:27Z",
  "author_name" : "Drew Olanoff",
  "author_link" : "http://techcrunch.com/author/drew-olanoff/",
  "author_gender" : "MALE",
  "image_src" : "https://tctechcrunch2011.files.wordpress.com/2015/09/2042117809_4bf153cf11_b.jpg?w=764&h=400&crop=1",
  "card" : "SUMMARY_LARGE_IMAGE",
  "type" : "POST",
  "sentiment" : "POSITIVE",
  "lang" : "en",
  "categories" : {
    "business" : 6.697868530957982E-6,
    "entertainment" : 1.0460869636330763E-7,
    "health" : 4.692317650333774E-8,
    "politics" : 6.811966674982687E-10,
    "science" : 4.314704857471789E-9,
    "sports" : 3.8005285377463125E-9,
    "technology" : 0.9999931418031668
  },
  "metadata_score" : 334
}
 ]
}

The Spinn3r Parser API provides a granular way to request documents and get back parsed metadata around a specific permalink, news article, or blog post.

This provides API access to content on a more granular basis. If a URL is not indexed yet, is older, or is something we might not index, this allows you to still fetch the content and work with it using our content schema and machine learning infrastructure.

When we index the content we perform the following operations:

Requests

Request POST example:

{
  "link" : "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
  "publisherType" : "WEBLOG",
  "parseStrategy" : "PERMALINK"
}

Use URL endpoint: http://{{vendor}}.rest.spinn3r.com/v1/parser/parse

Requests are JSON and must have the Content-Type: application/json HTTP header.

The POST body is a simple JSON structure with two basic fields:

name description
link The URL to fetch and perform metadata extraction
publisherType A Spinn3r publisher type. Should be either WEBLOG or MAINSTREAM_NEWS

Responses

Responses contain a list of objects in the Spinn3r content schema.

name description
request Contains metadata about the request that generated this response
content A list of objects conforming to the Spinn3r content schema
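
As a rough illustration of the request format described above, the following Python sketch builds (but does not send) a Parser API request. The function name `build_parse_request` and the vendor value are hypothetical, and authentication is not covered in this section, so it is left out here.

```python
import json
import urllib.request

ALLOWED_PUBLISHER_TYPES = ("WEBLOG", "MAINSTREAM_NEWS")


def build_parse_request(vendor, link, publisher_type="WEBLOG"):
    """Build a POST request for the Parser API without sending it.

    `vendor` fills the {{vendor}} placeholder in the endpoint URL.
    """
    if publisher_type not in ALLOWED_PUBLISHER_TYPES:
        raise ValueError("publisherType should be WEBLOG or MAINSTREAM_NEWS")
    url = f"http://{vendor}.rest.spinn3r.com/v1/parser/parse"
    body = json.dumps({"link": link, "publisherType": publisher_type})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # The API requires this header on every request.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_parse_request(
        "your-vendor",  # hypothetical vendor name
        "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send the request you could pass `req` to `urllib.request.urlopen` and read the `request` and `content` fields from the decoded JSON response.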

Firehose

To get started, first download the latest version of the client

The Firehose client does 99% of the heavy lifting of connecting to Spinn3r and fetching content in real time.

Quick start

We detail the required steps below, but for a quick overview:

Requirements

Download

To download the latest release please visit http://public.maven.spinn3r.com/dist/

We provide both Debian packages and tar.gz distributions.

Debian / Ubuntu

We have .debs which can be easily installed on Debian and Ubuntu. You will need a Java 8 release in the PATH but there are no other significant requirements.

Installing the .deb package sets the client up as a regular Unix daemon, which you can start once you’ve provisioned your client. For example:

/etc/init.d/spinn3r-artemis-client-fetcher start      

tar.gz binaries

If you’re on Red Hat, Solaris, or another OS, we provide binary packages in tar.gz format.

For tar.gz, once uncompressed you will see a filesystem layout like the following:

drwxr-xr-x    5 burton          170 Sep  6 16:13 .
drwxr-xr-x    3 burton          102 Sep  6 16:13 ./etc
drwxr-xr-x    3 burton          102 Sep  6 16:13 ./etc/init.d
-rwxr-xr-x    1 burton         2824 Sep  6 16:10 ./etc/init.d/spinn3r-artemis-client-fetcher
-rwxr-xr-x    1 burton         2824 Sep  6 16:10 ./install.sh
-rw-r--r--    1 burton          259 Aug 22 16:58 ./README.md
drwxr-xr-x    3 burton          102 Sep  6 16:13 ./usr
drwxr-xr-x    3 burton          102 Sep  6 16:13 ./usr/share
drwxr-xr-x    3 burton          102 Sep  6 16:13 ./usr/share/spinn3r-artemis-client-fetcher
drwxr-xr-x   15 burton          510 Sep  6 16:13 ./usr/share/spinn3r-artemis-client-fetcher/lib
-rw-r--r--    1 burton         6260 Sep  6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-http-lib-5.1.21.jar
-rw-r--r--    1 burton         8833 Sep  6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-schema-core-5.1.21.jar
-rw-r--r--    1 burton        34065 Sep  6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-util-5.1.21.jar
-rw-r--r--    1 burton       185140 Jun 18 20:50 ./usr/share/spinn3r-artemis-client-fetcher/lib/commons-io-2.4.jar
-rw-r--r--    1 burton      2172168 Jun 30 20:15 ./usr/share/spinn3r-artemis-client-fetcher/lib/guava-15.0.jar
-rw-r--r--    1 burton        35058 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-annotations-2.3.0.jar
-rw-r--r--    1 burton       197981 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-core-2.3.0.jar
-rw-r--r--    1 burton       914028 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-databind-2.3.0.jar
-rw-r--r--    1 burton       481535 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/log4j-1.2.16.jar
-rw-r--r--    1 burton        26768 Sep  6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-api-5.1.21.jar
-rw-r--r--    1 burton         3457 Sep  6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-core-5.1.21.jar
-rw-r--r--    1 burton        26821 Sep  6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-fetcher-5.1.21.jar
-rw-r--r--    1 burton        14740 Sep  6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-lib-5.1.21.jar

At this point you can just run install.sh which will place the files into the right directories in your OS.
You can then move on to provisioning to create a new firehose connection.

Usage

The client first needs to be provisioned. The provisioning step defines various parameters needed for indexing content and then creates the directories on the filesystem needed to run the client.

There are then two main daemons that need to be started. One fetches content from the server and spools it to disk (the fetcher). The other watches for new content on disk, parses it, and imports the content into your database (the watcher).

Provision

Provision command:

java -cp "/usr/share/spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Provisioner \
     --dir=/var/spool/spinn3r-artemis-client/default \
     --vendor={{vendor}} \
     --after=-1hour \
     --processPolicy=DELETE \
     --fetchListenerClassName=com.spinn3r.artemis.client.watcher.LoggingFetcherListener

You first need to provision a new client in a given directory. The client will write a resume.checkpoint file so that you can start/stop the client and it will automatically resume.

NOTE: Make sure the quotes are included in the Java classpath or the command won’t run due to bash file name expansion.

Arguments

The following arguments must be specified:

dir

The directory which will contain your spool and where the client will save your files.

Use the --dir value from the provision command above as your main/default spool directory.

You can create additional directories for custom filters or for backfilling data if you fall behind.

after

The 'after' parameter accepts both absolute time (if specified in ISO 8601) and relative time.

Relative time is in the format:

-1hour -2hours -1day +1hour +1day

The - (negative) prefix is used to denote time in the past.

The + (positive) prefix is used to denote time in the future.

If you would like to start from the current moment in time you could specify: +0minutes

value meaning
+0minutes Start indexing from now
-1hours Start indexing one hour in the past
-2hours Start indexing two hours in the past
-30minutes Start indexing 30 minutes in the past
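
To make the semantics of relative times concrete, here is a small Python sketch that resolves a relative value against a reference time. It is only an illustration of the behaviour described above, not the client's own parser. The function name `resolve_after` is hypothetical, and it assumes the only units are the minute, hour, and day forms shown in this section.

```python
import re
from datetime import datetime, timedelta, timezone

# Only the units that appear in the examples above are handled.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}
_RELATIVE = re.compile(r"^([+-])(\d+)(minute|hour|day)s?$")


def resolve_after(value, now=None):
    """Turn a relative time such as '-1hour' into an absolute UTC datetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match is None:
        raise ValueError(f"not a relative time: {value!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(**{_UNITS[unit]: int(amount)})
    # '-' points into the past, '+' into the future.
    return now - delta if sign == "-" else now + delta


if __name__ == "__main__":
    for example in ("+0minutes", "-1hour", "-2hours", "-30minutes"):
        print(example, "->", resolve_after(example).isoformat())
```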

processPolicy

Specify what the Watcher daemon should do with files once they have been processed.

This only applies when you’re using Java.

name description
DELETE Delete the files when they are processed. The advantage of deleting files is that it’s easy to free up disk space. The disadvantage is that you can’t re-process them if something goes wrong.
MOVE_TO_PROCESSED Move the files to a ‘processed’ directory. You then need to have a cron script or some other process manage purging files once you think they’re ok to delete. This has the advantage of being able to re-import files but the disadvantage of potentially running out of disk space.

filter

If you’d like to run with a client-side filter you can add a --filter option when you provision the client and the filter will be applied locally.

We support both client-side and server-side filters. When both are configured, the server-side filter is applied together with the client-side filter on all requests.

The server-side filter is deployed with your Spinn3r account based on what data you’re purchasing.

See the filtering documentation for how to set up a filter and for some examples of using the language.

If you have trouble writing one please just contact support and we’ll write one for you.

Spool directory contents

Afterwards, if you 'ls' the directory you will see:

root@my-host:/usr/share/spinn3r-artemis-client# ls -al /var/spool/spinn3r-artemis-client/default
drwxr-xr-x 8 root root 4096 Sep  6 21:04 .
drwxr-xr-x 3 root root 4096 Sep  6 21:04 ..
drwxr-xr-x 2 root root 4096 Sep  6 21:04 data
drwxr-xr-x 2 root root 4096 Sep  6 21:04 dead
drwxr-xr-x 2 root root 4096 Sep  6 21:04 lib
drwxr-xr-x 2 root root 4096 Sep  6 21:04 logs
drwxr-xr-x 2 root root 4096 Sep  6 21:04 processed
-rw-r--r-- 1 root root  298 Sep  6 21:04 resume.checkpoint
drwxr-xr-x 2 root root 4096 Sep  6 21:04 tmp

The data directory stores all the json files that the client fetches.

The logs directory contains the logs of the client running in the background.

Run the Fetcher

Now just run the client.

/etc/init.d/spinn3r-artemis-client-fetcher start

This will start up the fetcher, which will use the spool directory you just created.

Indexing

After fetching content from Spinn3r, files are spooled to disk which allows you to parse them and import the data into your database.

If you’re using Python or another language you can parse the files directly.

File format

Example header:

// START-META
// request-URL: http://api.artemis.spinn3r.com/api/v1/content.stream...
// END-META

All files are formatted as JSON with UTF-8 encoding.

The only exception to standard JSON is that files have a comment header, which some libraries may not handle since comments are not part of the JSON spec.

This is easy to strip if your JSON parser doesn’t support comments. Just strip everything up to and including // END-META.

The header is provided so that you can re-download the URL should you find that a file is corrupt.
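
If you are parsing the files yourself, stripping the header could look like the following Python sketch. The function names are hypothetical, and the sketch assumes the header, when present, is at the very start of the file and ends with the // END-META line shown above.

```python
import json

END_MARKER = "// END-META"


def parse_spool_text(text):
    """Split a spooled file into (header_lines, parsed_json)."""
    if not text.lstrip().startswith("//"):
        # No comment header at all: treat the whole text as JSON.
        return [], json.loads(text)
    head, found, tail = text.partition(END_MARKER)
    if not found:
        raise ValueError("header started but // END-META was not found")
    header = [
        line[2:].strip()
        for line in head.splitlines()
        if line.startswith("//") and "START-META" not in line
    ]
    return header, json.loads(tail)


def parse_spool_file(path):
    """Read a spooled file as UTF-8 and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_spool_text(handle.read())
```

The returned header lines keep the request-URL entry, so a caller can log it or use it to re-download a file that fails to parse.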

Designing your parser

Here are some rules to design a parser that handles changes to our API moving forward.

Potential Issues and Warnings

Best Practices

In order for us to provide production-level and high quality service to our customers, and for Spinn3r to meet our SLA, we require our customers to meet our best practices for using our APIs.

These ensure that service levels are maintained and that Spinn3r can provide high quality service.

Additionally, we’ve found common anti-patterns in customer configuration and setups that we strongly want to avoid.

Required version upgrades

VERY rarely we may notify customers that a version upgrade is required.

In nearly ten years of business we’ve done this exactly once.

Once a customer has been notified that a mandatory version upgrade is required they MUST upgrade to this version in order to meet our SLA.

Minimum required bandwidth

A Spinn3r bandwidth test must be done periodically (at least every 60 days) to ensure that at least 2x the required throughput is available to our servers.

In practice this means you need about 30Mbit to fetch data.

The additional bandwidth is required for resume. If your client goes offline and you don’t have more capacity than the live stream requires, it will never be able to catch up.

Spool directory on NTFS or HDFS

Due to added latencies on IO, we require customers to write directly to a regular HDD or SSD drive connected to a local computer.

We understand that many of our customers want to work with their own database, filesystem, or queueing technology, but these have consistently added extra latencies and proven to slow down our firehose client.

These can be used in production easily by moving files into your database, filesystem, etc as soon as they are written to disk.

HTTP proxies

All HTTP proxies must be standard RFC compliant proxies.

They MAY cache documents but MUST only cache documents with proper HTTP caching headers. They MUST NOT cache documents based on their own policies.

They MUST NOT block requests which are legitimate.

Examples of broken proxy configurations include malware-filtering or content-censorship proxies that detect patterns of content and block them.

DNS

All DNS servers must be standard RFC DNS servers.

They MAY cache DNS requests but only according to proper DNS caching TTLs.

They MUST NOT create arbitrary caching policies.

Specifically, we use low DNS cache TTLs and your DNS recursors must respect this or it may lead to an outage.

Intrusion Detection Systems

We strongly recommend that intrusion detection systems (IDS) be disabled when using Spinn3r, or that they at least only warn on issues rather than block traffic.

Because Spinn3r uses a large number of HTTP requests, especially in the Firehose API, an IDS can block Spinn3r and break production applications.

This is especially true if it’s performing any type of content analysis as some Internet content may be controversial and flagged as being inappropriate by the IDS thus breaking Spinn3r.

Architecture

The client works by reading new data via HTTP, encoded as JSON, and writes the data to a local directory you specify via the command line.

Your application just needs to watch the directory for new files, parse them, and import them into your database.
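
A single pass over the spool could be sketched in Python as follows. This is only an illustration: the function name `process_spool_once` is hypothetical, the move into a processed directory mirrors the MOVE_TO_PROCESSED idea rather than reproducing the Watcher daemon, and it assumes that files only appear in the data directory once they are completely written.

```python
import os
import shutil


def process_spool_once(data_dir, processed_dir, handle):
    """Call handle(path) for each file in data_dir, oldest first.

    Each file is moved to processed_dir after handle() returns, so a
    failure in handle() leaves the file in place for the next pass.
    Returns the list of file names that were processed.
    """
    os.makedirs(processed_dir, exist_ok=True)
    with os.scandir(data_dir) as entries:
        files = [entry for entry in entries if entry.is_file()]
    # Oldest first; the name breaks ties between identical timestamps.
    files.sort(key=lambda entry: (entry.stat().st_mtime, entry.name))
    done = []
    for entry in files:
        handle(entry.path)
        shutil.move(entry.path, os.path.join(processed_dir, entry.name))
        done.append(entry.name)
    return done
```

In a long-running importer you would call this in a loop with a short sleep between passes, with `handle` parsing the file and writing to your database.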

Custom clients not supported

Custom clients that connect to Spinn3r are NOT supported and we consider it an anti-pattern that will eventually break. Unfortunately, there are far too many issues with implementing HTTP correctly that will eventually cause custom clients to fail: HTTP connect and read timeouts, gzip encoding, DNS load balancing and caching, resume, HTTP headers and encoding all amount to a very complicated implementation which we don’t want to break for our customers. Further, requiring our client allows us to push out clients with new features without asking our customers to implement them.

Protocol Overview

The raw protocol works by fetching content from a given date range.

Firehose architecture

However, we hide all of this from you and provide you with a simple daemon you just run on your servers.

Message Integrity

All JSON messages, delivered as HTTP responses, are given a SHA256 checksum in the response, which the client then verifies.

With high volume throughput it’s possible (but rare) that messages could become corrupt during transfer and this way we verify that nothing is corrupted during transfer.
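
The general idea of such a check can be sketched in Python as below. How the checksum is actually transmitted and encoded is not described here, so this sketch simply assumes a hex-encoded digest. The function names are hypothetical, and the supplied client already performs this verification for you.

```python
import hashlib
import hmac


def sha256_hex(payload: bytes) -> str:
    """Return the SHA256 digest of the raw response body as hex."""
    return hashlib.sha256(payload).hexdigest()


def verify_payload(payload: bytes, expected_hex: str) -> bool:
    """Check a body against an expected hex digest (assumed encoding)."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(payload), expected)
```

The digest is computed over the raw bytes before any JSON decoding, so a single corrupted byte in transfer changes the result.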

Filtering

We support an easy to use filtering language that can be used to mine data from the stream of events that we publish.

The language is designed to filter messages as they pass through the stream.

Language

First, see the content schema definition for documentation on all fields.

The language is very similar to Java/Python conditional statements and is modeled after XPath.

Boolean logic and grouping

Arbitrary boolean logic and grouping is supported:

foo = 'foo' and ( cat = 'cat' or dog = 'dog' )

Contains

The contains() function can be used to see if a field contains a substring. It can also be used for set membership.

contains( title , 'Linux' )

Filter for a specific language code

This will filter out all objects except those with this exact language code.

lang = 'en'

Filtering a specific source

Filter out all sources except TechCrunch:

source_resource = 'http://techcrunch.com'

Filtering for one publisher type

You can filter for one specific publisher type:

source_publisher_type = 'MAINSTREAM_NEWS'

Filtering for more than one publisher type

If you would like more than one publisher type in the stream, you can specify it with a logical 'or':

source_publisher_type='MAINSTREAM_NEWS' or source_publisher_type='WEBLOG'
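
If you generate filters from configuration, a few small string helpers can keep the expressions consistent with the examples above. The Python sketch below is hypothetical (the names `equals`, `any_of`, and `all_of` are not part of any Spinn3r library) and only produces the forms shown in this section. Because quoting rules for values are not documented here, it refuses values that contain a single quote instead of guessing at an escape syntax.

```python
def equals(field, value):
    """Build a clause such as: lang = 'en'"""
    if "'" in value:
        raise ValueError("values containing a single quote are not handled")
    return f"{field} = '{value}'"


def any_of(field, values):
    """Join equality clauses for one field with 'or', grouped in parentheses."""
    clauses = [equals(field, value) for value in values]
    if not clauses:
        raise ValueError("at least one value is required")
    if len(clauses) == 1:
        return clauses[0]
    return "( " + " or ".join(clauses) + " )"


def all_of(*clauses):
    """Join already-built clauses with 'and'."""
    return " and ".join(clauses)


if __name__ == "__main__":
    types = any_of("source_publisher_type", ["MAINSTREAM_NEWS", "WEBLOG"])
    print(all_of(equals("lang", "en"), types))
```

The resulting string is what you would pass with the --filter option when provisioning the client.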

Logging and Monitoring

Both the Fetcher and Watcher have support for logging messages via log4j.

Standard input / output

The standard input and output of the daemon is written to /var/log/{{daemon_name}}

The fetcher writes to:

/var/log/spinn3r-artemis-client-fetcher

These logs may contain critical exceptions if the daemons fail to start up properly.

Runtime logs

Once running, all logs are written to the logs directory under your spool dir.

We log both the progress of the client as well as any HTTP requests being performed, their URLs, as well as any exceptions being thrown.

Monitoring

Both daemons open a port using HTTP which can be used to monitor the state of the daemon.

For security reasons, the daemon only binds to localhost (127.0.0.1) so that these ports are not accessible to the world.

Ports

The fetcher runs on port 20400

URLs to monitor

Each daemon has the following endpoints:

/threads

Dumps all threads and their associated stack traces. Can be used to detect a dead fetcher, and is probably only useful when requested by Spinn3r support.

/ping

Returns with the phrase 'pong'. Useful for monitoring that the process is live.
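
A liveness check against the /ping endpoint could be sketched in Python like this. It must run on the same host as the daemon because the port is bound to localhost. The function name `fetcher_is_alive` is hypothetical, and the sketch only assumes that the response body contains the word pong.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the monitoring port is local only.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetcher_is_alive(host="127.0.0.1", port=20400, timeout=5.0):
    """Return True if the daemon answers /ping with a body containing 'pong'."""
    url = f"http://{host}:{port}/ping"
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a broken reply.
        return False
    return "pong" in body


if __name__ == "__main__":
    print("fetcher alive:", fetcher_is_alive())
```

A monitoring system can call this on a schedule and alert when it returns False several times in a row.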

Checkpoints

Here’s an example resume.checkpoint file:

{
  "after" : 1410109500000000000,
  "before" : 9223372036854775807,
  "vendor" : "{{vendor}}",
  "comment" : "Spinn3r checkpoint file.  You MUST shut down the fetcher and watcher before editing (and create a backup first).",
  "processPolicy" : "DELETE",
  "fetchListenerClassName" : "com.spinn3r.artemis.client.watcher.LoggingFetcherListener"
}

Periodically the fetcher will write a new checkpoint file to disk keeping track of its progress. This way when it’s restarted it doesn’t have to download all identical content again.

The file can be edited but you MUST stop the fetcher first.

This will allow you to change various config directives including the process policy, etc.
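
For a quick look at how far the fetcher has progressed, the checkpoint can be read with a few lines of Python. Two things in this sketch are assumptions inferred from the example values rather than documented behaviour: that "after" and "before" are nanoseconds since the Unix epoch, and that the largest 64-bit value means there is no upper bound. The function name `describe_checkpoint` is hypothetical. The sketch only reads the file and never writes it.

```python
import json
from datetime import datetime, timezone

LONG_MAX = 2**63 - 1  # assumed to mean "no bound"


def _to_utc(nanoseconds):
    """Convert assumed epoch nanoseconds to a UTC datetime (second precision)."""
    if nanoseconds == LONG_MAX:
        return None
    return datetime.fromtimestamp(nanoseconds // 1_000_000_000, tz=timezone.utc)


def describe_checkpoint(text):
    """Summarise the contents of a resume.checkpoint file."""
    checkpoint = json.loads(text)
    return {
        "after": _to_utc(checkpoint["after"]),
        "before": _to_utc(checkpoint["before"]),
        "processPolicy": checkpoint.get("processPolicy"),
    }
```

Under those assumptions, the "after" value in the example above corresponds to 2014-09-07 17:05:00 UTC.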

Proxy server

First, make sure you absolutely need to run an HTTP proxy. It’s an additional piece of middleware and could cause problems interacting with Spinn3r.

HTTP proxies are a somewhat stable piece of technology, so for most uses they should work just fine.

If you would like to use the fetcher with an HTTP proxy you need to perform the following steps.

Update init.d 'default' file

Edit the file:

/etc/default/spinn3r-artemis-client-fetcher

This file is loaded from the daemon startup script and allows you to add extra command line arguments to your daemon.

You will need to add the line:

export JAVA_OPTS="-Dhttp.proxyHost=localhost -Dhttp.proxyPort=8080"

The host and port will need to be changed to the host and port in your environment.

Then just restart the daemon. The daemon will then be using the HTTP proxy.

Verify that it’s working

Then restart the daemon by running:

/etc/init.d/spinn3r-artemis-client-fetcher restart

On startup you will see the following message if it correctly loaded the default file:

Loading init.d default file: /etc/default/spinn3r-artemis-client-fetcher

Then you should run ps aux and look for the daemon and verify that the above options were added to the daemon command line.

At that point you should be using a proxy server with the fetcher and all requests will go through the proxy.

Data Directory

Some users require the data directory to be placed outside of /var. This can be useful for various reasons, such as keeping the spool on a different volume.

This can be done by editing the daemon defaults file.

Changing the defaults file

Create a file named

/etc/default/spinn3r-artemis-client-fetcher

Then add:

export EXTRA_OPTS="--basedir=/path/to/new/basedir"

This will tell the daemon to read spool directories from this base directory.

You should have a default spool directory provisioned here which is your main connection to the Spinn3r data.

Firehose archives

By default all firehose customers have access to the last 5 days worth of content.

It’s your responsibility to keep a client up and running and listening to data.

Do not run a client via crontab. Instead, keep it running in the background as a daemon.

If you require access to data older than 5 days you can contact us to purchase access to that data via full-text search and fetch all the content in the time window you require.
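
A client that has been offline may want to check whether the point it stopped at is still inside the default window. The sketch below is only an illustration: the function name is hypothetical, and it assumes you have recorded a timestamp in the same format as the date_found field (for example 2014-06-22T01:08:52Z).

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_WINDOW = timedelta(days=5)  # default retention stated above

def within_archive_window(timestamp, now=None):
    """True if a UTC timestamp such as 2014-06-22T01:08:52Z is inside the window."""
    now = now or datetime.now(timezone.utc)
    seen = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - seen <= ARCHIVE_WINDOW
```

If this returns False for the last content you processed, the gap can no longer be filled from the firehose alone.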

Standalone mode

The Spinn3r client can be installed in a standalone mode where it doesn’t have to be run as a daemon and instead can be run by command line.

This is not the preferred method to install the client and is only really a good idea if you’re running in a non-standard or custom environment or on an OS that supports Java but might not necessarily be Linux/Unix (Windows or Mac OSX).

Provisioning

java -cp "./spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Provisioner \
     --dir=./spool/spinn3r-artemis-client/default \
     --vendor={{vendor}} \
     --after=-1hour \
     --processPolicy=DELETE \
     --fetchListenerClassName=com.spinn3r.artemis.client.watcher.LoggingFetcherListener

The provision commands are all the same. The main difference is that you’re going to want to specify a directory other than the default.

(please see the provisioning documentation for more information on how to provision the client)

This assumes that the spinn3r-artemis-client-fetcher directory is in your current directory and that your spool should be in your current directory as well.

These can be placed anywhere you like. We would advise AGAINST placing this on a network mounted share. If it fails at ANY future point it means your Spinn3r client will also fail. The best strategy here is to write local and then have a background find command copy the files to your network share (probably via crontab).
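
The "write local, then copy" strategy could be sketched in Python as follows, as an alternative to a find command. This is a minimal sketch under assumptions: the function name, the directory arguments, and the 60 second minimum age are all hypothetical, and the layout of files inside the spool directory is not described here, so the sketch simply mirrors whatever files it finds.

```python
import shutil
import time
from pathlib import Path

def copy_settled_files(spool_dir, share_dir, min_age_seconds=60):
    """Copy files not modified for min_age_seconds from spool_dir to share_dir."""
    spool, share = Path(spool_dir), Path(share_dir)
    copied = []
    cutoff = time.time() - min_age_seconds
    for path in sorted(spool.rglob("*")):
        if not path.is_file() or path.stat().st_mtime > cutoff:
            continue  # skip directories and files that may still be written
        target = share / path.relative_to(spool)
        if target.exists():
            continue  # already copied on an earlier run
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Because the copy runs separately from the fetcher, a failure of the network share only affects the copy step and not the client itself.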

Starting the fetcher

java -cp "./spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Fetcher --basedir=./spool/spinn3r-artemis-client

This will output the following:

Loading log4j config from: jar:file:/Users/burton/test/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-fetcher-5.1.292.jar!/config/log4j.xml
Starting HTTP server on port 20400 (useLocalhost=true)...
For more information about Spinn3r please see:

  http://spinn3r.com

And for instructions on how to use the client please see:
  https://github.com/burtonator/spinn3r-artemis-client-example

Now all we need to do is run a fetcher which points to our newly provisioned directory.

The daemon will start in the foreground and block. It won’t return control to the terminal. You will either have to place it in screen or some sort of job control system to keep it running.

Common Problems

There are many common problems that our customers run into that we want to highlight:

Speed test.

If you’re having any problems with the client locking up or lagging, make sure to run a speed test.

Nine times out of ten the problem is bandwidth. Even if you think you have enough bandwidth, make sure to run a speed test.

WAN links, traffic shaping, proxies, and firewalls can sometimes catch our client as we push a lot of bandwidth through the API.

MICROBLOG and text vs HTML.

Microblog content comes through the API as text. It may look just like HTML, but each document has a main_format field set to either TEXT or HTML, which you can use to differentiate between the two.
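
One way to handle the two formats is sketched below in Python. The function name is hypothetical, and wrapping text in a single P element is just one possible rendering choice; the main and main_format field names come from the content schema.

```python
import html

def main_as_html(doc):
    """Return the main content of a document as HTML, whatever its main_format."""
    main = doc.get("main", "")
    if doc.get("main_format") == "TEXT":
        # Plain text (e.g. microblog posts): escape it so it is safe to embed in a page.
        return "<p>" + html.escape(main) + "</p>"
    # HTML is the default format for most content.
    return main
```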

POSTs vs COMMENTs

Each piece of content has a type associated with it. Make sure to process this correctly as you may wish to handle comments and posts in different manners.
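
A small Python sketch of separating the two kinds of content is shown below. The function name is hypothetical, and the UNKNOWN bucket for documents without a type field is an assumption of this sketch rather than a value defined by the API.

```python
from collections import defaultdict

def group_by_type(docs):
    """Group documents by their type field (POST or COMMENT)."""
    groups = defaultdict(list)
    for doc in docs:
        # "UNKNOWN" is a hypothetical bucket for documents that carry no type.
        groups[doc.get("type", "UNKNOWN")].append(doc)
    return dict(groups)
```

Posts and comments can then be passed to different handlers, for example groups.get("POST", []) and groups.get("COMMENT", []).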

Migration

If you’re a Spinn3r 4.0 customer, and are migrating an existing client, you may need to migrate on a field by field basis to the new Spinn3r 5.0 JSON format.

The full JSON mapping is kept in our content schema.

The following table will help you map from existing fields to new fields:

Mapping

Protostream

Protostream files are generated by Spinn3r 4.0. They were our preferred file format but we may have some older customers using XML files. The mapping from XML to protostream is located in the Spinn3r 4.0 wiki:

Spinn3r 5.0 JSON

The Spinn3r 5.0 JSON format is designed to be concise and human readable as well as have reasonable parsing performance.

Fields may have a prefix. For example, source_title is the title of the source object.

For the most part, there’s a 1:1 mapping from protostream to Spinn3r 5.0 JSON.

In some cases there are additional fields in 5.0. For example, we now support partial dates (dates without timezones or seconds).

protostream JSON
permalink.title title
permalink.hashcode hashcode
permalink.author.name author_name
permalink.author.email n/a. This was almost never included due to people protecting their identity and email from spammers.
permalink.author.link.href author_link
permalink.spam_probability n/a. will be added in the future. Seldom used.
permalink.last_published last_published
permalink.date_found date_found
permalink.link.href link
permalink.link.resource resource
permalink.lang.code lang
permalink.content.data html
permalink.content_extract.data main
source.link.href source_link
source.link.resource source_resource
source.title source_title
source.description source_description
source.hashcode source_hashcode
source.lang.code n/a for now. This field wasn’t extensively used in 4.0. However, each piece of content has its own language code. We may add this in the future as a histogram of the languages posted to the source, computed from historical content.
source.lang.probability n/a for now.
source.date_found source_date_found
source.resource_status n/a
source.last_posted source_last_posted
source.tier n/a for now. This wasn’t a widely used field.
source.publisher_type source_publisher_type
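
The mapping above can be applied mechanically. The following Python sketch assumes a 4.0 record has already been decoded into nested dictionaries keyed by the protostream field names; that representation, and the names FIELD_MAP, lookup, and to_json_5, are assumptions made for illustration. Fields marked n/a in the table are left out.

```python
# protostream path -> Spinn3r 5.0 JSON field, taken from the table above.
FIELD_MAP = {
    "permalink.title": "title",
    "permalink.hashcode": "hashcode",
    "permalink.author.name": "author_name",
    "permalink.author.link.href": "author_link",
    "permalink.last_published": "last_published",
    "permalink.date_found": "date_found",
    "permalink.link.href": "link",
    "permalink.link.resource": "resource",
    "permalink.lang.code": "lang",
    "permalink.content.data": "html",
    "permalink.content_extract.data": "main",
    "source.link.href": "source_link",
    "source.link.resource": "source_resource",
    "source.title": "source_title",
    "source.description": "source_description",
    "source.hashcode": "source_hashcode",
    "source.date_found": "source_date_found",
    "source.last_posted": "source_last_posted",
    "source.publisher_type": "source_publisher_type",
}

def lookup(record, dotted_path):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    node = record
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def to_json_5(record):
    """Build a flat 5.0-style dict from a nested 4.0-style record."""
    out = {}
    for old_path, new_name in FIELD_MAP.items():
        value = lookup(record, old_path)
        if value is not None:
            out[new_name] = value
    return out
```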

Schema

content schema

A basic example of data in JSON format.

{
  "bucket" : 0,
  "resource" : "http://cnn.com/2014/10/15/health/texas-ebola-outbreak",
  "date_found" : "2014-06-22T01:08:52Z",
  "index_method" : "PERMALINK_TASK",
  "html" : "<html><body><p>Full HTML of the content</p></body></html>",
  "html_length" : 57,
  "source_hashcode" : "COH0cFU4G1sMlRHd9gEvS-n3FFI",
  "source_resource" : "http://cnn.com",
  "source_link" : "http://cnn.com",
  "source_publisher_type" : "MAINSTREAM_NEWS",
  "source_publisher_subtype" : "MAINSTREAM_NEWS",
  "source_date_found" : "2014-06-22T01:08:52Z",
  "source_update_interval" : 900000,
  "source_setting_update_strategy" : "CYCLICAL",
  "source_setting_index_strategy" : "DEFAULT",
  "source_title" : "CNN",
  "permalink" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
  "canonical" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
  "main" : "<p>Full HTML of the content</p>",
  "main_length" : 31,
  "main_checksum" : "I7QyvW_g9AjGg3vWjmcxwo7wjXs",
  "main_format" : "HTML",
  "summary_text" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
  "title" : "CDC: Nurse with Ebola should not have traveled",
  "publisher" : "CNN",
  "section" : "Technology",
  "description" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
  "tags" : [ "nurse", "outbreak", "ebola" ],
  "published" : "2014-06-22T01:08:52Z",
  "author_name" : "Holly Yan",
  "author_link" : "https://twitter.com/HollyYanCNN",
  "lang" : "en"
}

Stores content in our index, including the full HTML as well as the metadata we were able to extract. Some of these fields are HTML, which is cleaned of any unsafe elements that might cause cross-site scripting attacks or other vulnerabilities. Additionally, all URLs are fully expanded. Encoding is UTF-8. We may reference external objects, such as the Source metadata, which is then fully denormalized with a source_ prefix.
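
Because the source metadata is denormalized into the same JSON object, a client that wants a separate source object can split on the source_ prefix. This Python sketch shows one way to do that; the function name is hypothetical and the sample document is a shortened version of the example above.

```python
def split_source(doc):
    """Separate a denormalized document into content fields and source fields."""
    prefix = "source_"
    content, source = {}, {}
    for key, value in doc.items():
        if key.startswith(prefix):
            source[key[len(prefix):]] = value  # strip the source_ prefix
        else:
            content[key] = value
    return content, source

doc = {
    "title": "CDC: Nurse with Ebola should not have traveled",
    "lang": "en",
    "source_title": "CNN",
    "source_link": "http://cnn.com",
}
content, source = split_source(doc)
print(source)
```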

Member Name Type Description
bucket bigint The bucket to write this content (timestamp prefix and suffix valued from 0-99). This allows us to use the random partitioner and still get decent parallel client read performance.
sequence bigint The time our robot saw the post and wrote it to the database. This is a sequence timestamp and supports distributed write. This can be used as an external primary key as it’s guaranteed to always be unique. The value is opaque and not designed to be readable by humans and the format can change at any time.
sequence_range bigint The sequence as a range of values between 0 and 999,999 (sequence % 100000). This allows you to filter values by range to accept just a sample of values.
hashcode ascii base64filesafe(sha1(resource)) … Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url
resource text Tokenized form of the permalink.
date_found timestamp The time we fetched and added this content to our index.
index_method enum The method that we used to discover and index the content. We have various algorithms to discover content and this lets the algorithm tag the content.
detection_method enum The method we used to detect this URL was new and recently published.
html text The HTML content of this permalink as fetched by our robot. Note that this is RAW content. No cleanup is done. Javascript is present, etc. If you want to work with this content you must make sure to clean/sanitize it yourself. See the ‘body’ field for a clean version of the document. In some situations it’s possible to not have any html. An example is when we’re using an API or firehose where the original full HTML isn’t present or including it would just be wasteful.
html_length varint The length of the HTML
html_checksum text The SHA1 checksum of the HTML.
html_blob blob zlib compressed HTML content from our crawler. Used for legacy customers who need full HTML.
html_blob_length varint The length of the HTML
html_blob_checksum text The SHA1 checksum of the HTML.
extract_blob blob ${member.description}
version ascii The version of Spinn3r used to write this content.
last_updated timestamp The last time we updated the metadata on this content. On initial record creation last_updated and date_found will be identical but last_updated will change over time as we update metadata.
source_hashcode ascii base64filesafe(sha1(resource)) of the source. Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url of the source.
source_resource text The tokenized URL for this source.
source_link text The non-tokenized URL for this source. Use this URL if you would like to fetch this source via HTTP.
source_publisher_type enum The publisher type (mainstream news, weblog, forum, etc) of this source encoded as an int.
source_publisher_subtype text A string representing the publisher sub type which is more specific than the publisher type. The publisher subtype is usually the name of the social network hosting the content.
source_date_found timestamp The time we added this source to our index. This is the time we found the source not when it was created.
source_last_updated timestamp The last time our crawler visited the source and processed it with a task. This is always incremented even if the site isn’t updated or even if the site is HTTP 500 or other network/transient errors. This may not be updated if we aren’t fetching the source via HTTP.
source_last_published timestamp The last time this source published a new HTML file (as measured by content_sha1). This may not be updated if we aren’t fetching the source via HTTP.
source_last_posted timestamp The last time this source posted a new piece of content
source_update_interval varint The number of milliseconds between updates to re-fetch this source. This is used for cyclical updates of sources and usually depends on how often the source posts updates.
source_http_status varint The HTTP status code of the last request to this source.
source_spam_probability float The probability, between 0 and 1, that this source is a spam source. -1.0 if we have not yet classified it.
source_content_length varint The length, in bytes, of this HTML from the last time we fetched the page.
source_content_checksum text The SHA1 checksum of the content.
source_assigned_tags set<text> The set of tags assigned to this source by either customers or Spinn3r (globally). This is used so that your client can filter by assigned tags or search by them as well. This is not to be confused with the tags field, which holds tags assigned by the site. These tags are opaque strings and not human readable to avoid giving away any customer information in the API. Any sources you manually register are assigned tags with your vendor auth code. This will allow you to register sources, and then filter / search over them.
source_setting_update_strategy enum The update strategy for computing the update interval.
source_setting_index_strategy enum The index strategy used for this source.
source_setting_author_policy enum Policy on handling author metadata.
source_pshb_hub text The PSHB hub this source is using.
source_pshb_topic text The PSHB topic this source is using.
source_pshb_last_posted timestamp The last time this source posted and sent a message to the PSHB endpoint.
source_pshb_lease_expires timestamp The time this PSHB lease expires.
source_user_interactions bigint The number of user interactions from other sources on this social network computed from the graph as we index content. This is periodically computed and loaded into our source index. This could be the number of at mentions, comment replies, etc.
source_setting_minimum_content_metadata_score int The minimum metadata score before we can persist content
source_next_update timestamp The next time we’ve scheduled the source to update
source_title text The title of the source.
source_description text A short description of the source.
source_handle text Unique handle for this source across the entire social media property.
source_favorites int The number of favorites this source has according to the website or social network.
source_followers int The number of followers this source has according to the website or social network.
source_following int The number of users / friends this source is following.
source_verified boolean True when this user account is verified to be authentic.
source_profiles set<text> Set of URLs on other social networking sites and weblogs for this user. These are essentially alternate profiles for the user. Their twitter site, facebook site, etc.
source_location text The human readable location of the source. Example: 'Washington, DC’
source_image_src text The URL to the img which represents this source.
source_image_width int The width of the image.
source_image_height int The height of the image.
source_telephone text The telephone number for this source. Only present in limited situations. Specifically around REVIEW sites.
source_tags set<text> Tags for the source provided by the user’s profile.
source_rating_value text The rating for this item provided by the user.
source_favicon_src text The URL to the favicon which represents this source.
source_favicon_width int The width of the favicon.
source_favicon_height int The height of the favicon.
source_created timestamp The time this account was created and is provided from the source.
source_likes int The number of Facebook likes for this source.
source_related_tags set<text> A set of tags, optionally assigned by a site, which relate to this specific source. Supported for medium.com only (for now)
source_parsed_posts int The number of posts parsed/found when we last indexed this source.
source_parsed_posts_max int The maximum number of parsed posts we’ve ever seen. If parsed_posts_max is greater than zero and parsed_posts is 0 then we are probably hitting a throttle or failing to parse the content.
source_feed_href text The URL of the RSS feed.
source_feed_title text The title of the feed.
source_feed_format enum The format of the feed as a token. RSS or ATOM, etc.
permalink text The unique URL to the content.
identifier text A platform specific unique identifier for this post. Note that this is NOT always present as some platforms lack the concept of unique identifiers. Additionally, this may conflict with another identifier from another platform.
permalink_redirect text Same as permalink but if the site performs a 301 or 302 redirect this is the URL we were redirected to.
permalink_redirect_domain text The domain for the permalink_redirect. Identical in semantics to the domain field.
permalink_redirect_site text The site for the permalink_redirect. The full hostname. For example, www.cnn.com, alice.blogspot.com, etc.
link text The primary link to the content. The vast majority of the time, this is identical to permalink. However, some publisher types (MEMETRACKER) have a different link to the content which is external to the site. If the link is NOT the same as the permalink, then we include it in the links field for search and accuracy purposes.
link_domain text The domain for the link. Identical in semantics to the domain field.
link_site text The site for the link. The full hostname. For example, www.cnn.com, alice.blogspot.com, etc.
shortlink text The shortlink URL, if known. This is the preferred ‘short’ URL discovered from either the content itself or through metadata.
canonical text The canonical URL to the content (as specified by the publisher) in rel=canonical (and other specs such as og:url).
domain text The domain name of the permalink. blogspot.com, example.com, etc.
site text The site of the permalink including the full host name. www.cnn.com would be a site and cnn.com would be a domain.
main text The actual main content of the article: the authoritative ‘main’ of the post, derived by removing sidebar content (HTML). This content is sanitized so that JavaScript, event handlers, etc. are removed. This is analogous to the HTML5 main element, i.e. the main content of the page, with no header, footer, or sidebar content.
main_length int The length of the main field, in bytes.
main_checksum text The checksum of the main field.
main_authoritative boolean True when the main content is 100% accurate and the extract is not needed.
main_format enum The format of the main element (either HTML or text)
extract text The extract of the content with chrome/boilerpipe removal algorithms applied.
extract_length int The length of the extract field, in bytes.
extract_checksum text The checksum of the extract field.
summary_text text A summary of the document computed by our document summarizer. This summary is in plain text. If multiple paragraphs are present they are separated by a newline. If you would like to separate the paragraphs in your UI and you’re rendering HTML you can split the summary text by newline and wrap each paragraph in a P element.
title text The title of the post.
publisher text The publisher name. (CNN, MSNBC, Techcrunch, etc)
section text Articles may belong to one or more 'sections’ in a magazine or newspaper, such as Sports, Lifestyle, etc.
description text A short description of the item (HTML)
tags set<text> Tags for the item.
mentions set<text> Username mentions for users within the content of this post.
links set<text> All outbound links in the main element. Since main is the authoritative content, without chrome or sidebar content, this can be used for ranking purposes.
published timestamp Date of first broadcast/publication.
modified timestamp The date on which the content was most recently modified.
published_partial timestamp This is identical to published except it’s a partial value. If an exact date is found, both fields are populated, but if we only have a partial date then we only specify this field. The value is ISO8601. For example, 2014-01-01.
modified_partial timestamp This is identical to modified except it’s a partial value. If an exact date is found, both fields are populated, but if we only have a partial date then we only specify this field. The value is ISO8601. For example, 2014-01-01.
author_name text The name of the author. This is the human readable name like 'Barack Obama’ or 'Michael Jordan’
author_link text The link for the author.
author_handle text The handle of the author. This is a unique token/handle for the author across the whole site. For example 'barackobama’ and would never conflict with another account.
author_followers int The number of followers for this author.
author_location text The location for this author.
author_avatar_img text The URL to the img which is an avatar for the user who posted this content.
author_avatar_width int The width of the avatar img.
author_avatar_height int The height of the avatar img.
author_user_id text User ID in the target platform (when available)
author_gender enum When present, the gender of the author.
geo_location text The human readable location of the source. Example: 'Washington, DC’
geo_location_id text The location identifier (if available) for this location. This is platform specific.
geo_featurename text Name of the feature we’re representing.
geo_point geo_point A point contains a single latitude-longitude pair, separated by whitespace.
geo_box text A bounding box is a rectangular region, often used to define the extents of a map or a rough area of interest. A box contains two space-separated latitude-longitude pairs, with each pair separated by whitespace. The first pair is the lower corner, the second is the upper corner.
geo_name_id text Id in geonames database.
geo_name text The human readable location including its parent locations
geo_country text The human readable country derived from geo_location. These are represented as ISO 3166-1 alpha-2: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
geo_state text The human readable state derived from geo_location.
geo_city text The human readable city derived from geo_location.
geo_method enum Contains the name of the field used to parse the geo data
rating_value text The rating for this item provided by the user.
favicon_src text The URL to the favicon which represents this source.
favicon_width int The width of the favicon.
favicon_height int The height of the favicon.
image_src text The URL to the img which represents this content.
image_width int The width of the image.
image_height int The height of the image.
image1_src text The URL of one of the images representing this content.
image1_width int The width of the image.
image1_height int The height of the image.
image2_src text The URL of one of the images representing this content.
image2_width int The width of the image.
image2_height int The height of the image.
image3_src text The URL of one of the images representing this content.
image3_width int The width of the image.
image3_height int The height of the image.
image4_src text The URL of one of the images representing this content.
image4_width int The width of the image.
image4_height int The height of the image.
image5_src text The URL of one of the images representing this content.
image5_width int The width of the image.
image5_height int The height of the image.
image6_src text The URL of one of the images representing this content.
image6_width int The width of the image.
image6_height int The height of the image.
shared boolean True when this source was not published by the original user but actually shared from someone the source follows. On microblogging platforms this is a retweet. On others it’s a shared post.
shared_type enum The type of shared content.
shared_profile_link text Deprecated: See shared_author_link
shared_profile_title text Deprecated: See shared_author_name
shared_author_link text The link to the profile of the person who originally posted this story.
shared_author_name text The title of the profile of the person who originally posted this story.
shared_author_user_id text User ID in the target platform (when available)
shared_identifier text A platform specific unique identifier for this post.
shared_permalink text The unique URL to the content.
shared_author_handle text The handle of the author. This is a unique token/handle for the author across the whole site. For example 'barackobama’ and would never conflict with another account.
shared_author_avatar_img text The URL to the img which is an avatar for the user who originally posted this content.
replied boolean True when this source was a reply, false otherwise.
replied_profile_link text The link to the profile of the person being replied to.
replied_profile_title text The title of the profile of the person being replied to.
card enum When present, the type of card that can be used to display this content within web applications
video_player text The URL to an iframe which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video_player_width int The width of the player iframe.
video_player_height int The height of the player iframe.
video1_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video1_player_width int The width of one of the player iframes.
video1_player_height int The height of one of the player iframes.
video2_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video2_player_width int The width of one of the player iframes.
video2_player_height int The height of one of the player iframes.
video3_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video3_player_width int The width of one of the player iframes.
video3_player_height int The height of one of the player iframes.
video4_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video4_player_width int The width of one of the player iframes.
video4_player_height int The height of one of the player iframes.
video5_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video5_player_width int The width of one of the player iframes.
video5_player_height int The height of one of the player iframes.
video6_player text The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL which does not generate active mixed content warnings in a web browser.
video6_player_width int The width of one of the player iframes.
video6_player_height int The height of one of the player iframes.
type enum The type of this content as either a POST or a COMMENT. This allows us to index posts and comments through the same API.
sentiment enum The overall sentiment for this content
lang ascii ISO language code for this source. All our language codes are ISO 639 two letter lang codes. We use the special lang code of U when we are unable to determine the language from the underlying text - usually because we don’t have enough data.
categories map<ascii,double> Provides a map between algorithmically determined categories (entertainment, politics, technology, science, sports, business, health) and their probabilities. The probabilities are between 0.0 and 1.0 and if you sum them all they will equal 1.0.
duplicates map<bigint,double> Provides data on previously posted documents which are duplicates of this document. Keys are sequence values for the documents and the value is a double between 0.0 and 1.0 where 0.0 is no duplication and 1.0 is full duplication.
duplicates_count int The total number of duplicates.
classifications map<ascii,double> Provides a map between algorithmically determined classifications driven by customers. The keys are keys given to customers to identify their classification and the value is the probability of that classification. The values DO NOT sum to 1.0 as there may be multiple classifications here.
parent_hashcode ascii See content.hashcode
parent_permalink text See content.permalink
parent_title text See content.title
parent_lang ascii See content.lang
parent_resource text See content.resource
likes int The number of likes for this post (when we first find it).
dislikes int The number of dislikes for this post (when we first find it).
comments int The number of comments for this post (when we first find it).
views int The number of views for this post (when we first find it).
watch_time text This only applies for video platforms. A description of how much time this video has been watched. Note that this field DOES NOT update dynamically.
subscriptions_driven int This only applies for video platforms. How many subscriptions to the channel were driven by this video. Note that this field DOES NOT update dynamically.
metadata_score int The quality of the metadata on this post. Used internally to audit the quality of Spinn3r data. Not very applicable to customer use.
shares int The number of shares for this post. For some microblogging platforms this could be a retweet but for others it’s a share. Most platforms have this concept.
metadata_updates int The number of updates to metadata we have
pinned boolean True when the user has pinned this content to their profile, effectively locking the post in place.
image_user_tags map<text,text> Users tagged in the image and their coordinates within the image. Coordinates are expressed as a factor of the width and height using 0,0 as the top left corner; that is, in a 100x100 px image the position 0.23, 0.55 is 23 px from the left and 55 px from the top. This field is only valid for image social data such as Instagram or Facebook.
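
Two of the field descriptions above translate fairly directly into code. The Python sketch below is illustrative only. For hashcode it assumes that "base64filesafe" means the URL-safe base64 alphabet with padding removed, which is consistent with the 27 character values in the example document; the tokenization applied to the resource before hashing is not described here, so the result is not guaranteed to match values returned by the API. The function names are hypothetical.

```python
import base64
import hashlib
import html

def hashcode(resource):
    """URL-safe base64 of the SHA-1 digest of a (tokenized) resource, padding stripped."""
    digest = hashlib.sha1(resource.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

def summary_to_html(summary_text):
    """Wrap each newline-separated paragraph of summary_text in a P element."""
    paragraphs = [p for p in summary_text.split("\n") if p.strip()]
    # summary_text is plain text, so escape it before embedding it in HTML.
    return "".join("<p>" + html.escape(p) + "</p>" for p in paragraphs)
```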

Enum index_method

This is an enum type for index_method. The following values are accepted:

Enum Name Description
PERMALINK_TASK Content indexed by the permalink task.
SOURCE_TASK Content indexed by the source task.
PSHB Content indexed by pubsubhubbub hub push.
SOURCE_TASK_COMPOSITE Content indexed by the source task for a composite post.
FEED_TASK Content indexed by the feed task.
TWITTER_TASK Content indexed by the twitter task.

Enum detection_method

This is an enum type for detection_method. The following values are accepted:

Enum Name Description
SOURCE Found via the source.
FEED Found via the feed.

Enum source_publisher_type

This is an enum type for source_publisher_type. The following values are accepted:

Enum Name Description
UNKNOWN Unknown publisher type.
WEBLOG Weblog. Defined as a smaller site, usually owned by an individual.
MAINSTREAM_NEWS Mainstream news source. Generally owned by a corporation with multiple paid writers.
CLASSIFIED Classified site. Craigslist, Backpage, etc.
FORUM Forum sites like phpBB, phorum, vbulletin, etc
REVIEW Review site. Like epinions, amazon reviews, etc.
MEMETRACKER Memetracker like reddit, digg, techmeme, google news, etc
MICROBLOG Microblog content such as Twitter, identi.ca, etc.
SOCIAL_MEDIA Social media sites (facebook, instagram, etc).
VIDEO Video hosting site like Youtube, Vimeo, etc.
PHOTO Photo sharing site like instagram, flickr, etc.
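
On the client side these values can be modelled as an enum. The Python sketch below is a suggestion rather than part of the API: the class and function names are hypothetical, and falling back to UNKNOWN for values the client does not recognize is a design choice that keeps older clients working if new publisher types are added.

```python
from enum import Enum

class PublisherType(Enum):
    UNKNOWN = "UNKNOWN"
    WEBLOG = "WEBLOG"
    MAINSTREAM_NEWS = "MAINSTREAM_NEWS"
    CLASSIFIED = "CLASSIFIED"
    FORUM = "FORUM"
    REVIEW = "REVIEW"
    MEMETRACKER = "MEMETRACKER"
    MICROBLOG = "MICROBLOG"
    SOCIAL_MEDIA = "SOCIAL_MEDIA"
    VIDEO = "VIDEO"
    PHOTO = "PHOTO"

def parse_publisher_type(value):
    """Map a source_publisher_type string to the enum, defaulting to UNKNOWN."""
    try:
        return PublisherType(value)
    except ValueError:
        # Unrecognized or missing value: treat it as UNKNOWN instead of failing.
        return PublisherType.UNKNOWN
```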

Enum source_setting_update_strategy

This is an enum type for source_setting_update_strategy. The following values are accepted:

Enum Name Description
CYCLICAL Default update strategy. Essentially just update the source at a regular rate.
ADAPTIVE Adapt the update interval based on the posting frequency of the source. This way we update sources less frequently if they post once per month compared to sources that update once per hour.
PUSH The source content is pushed to us directly via push (pshb)
SEARCH The source is updated only from a search feed. It’s not updated directly but a parent source updates it.
PING We receive a ping via an external update mechanism. This is a notice that the blog has been updated. Then we launch a task to fetch the content.
INDIRECT This source is NOT updated directly but rather is updated indirectly via another source. Usually a search feed.
FEED The source is updated via an RSS / Atom feed. We don’t index it directly.
NONE The source is not updated.
SCHEDULED This source is using the scheduled setting based on cron.

Enum source_setting_index_strategy

This is an enum type for source_setting_index_strategy. The following values are accepted:

Enum Name Description
DEFAULT Default index strategy. Just a normal source. No special or fancy strategy.
SEARCH_KEYWORD This source is a search source driven by keywords.
SEARCH_USERNAME This source is a search source driven by ranked usernames.
SEARCH_KEYWORD_AND_CITY This source is a search source driven by keywords but we include the city as well.
COVERING_SET This source is indexed via a covering set of subscriptions.

Enum source_setting_author_policy

This is an enum type for source_setting_author_policy. The following values are accepted:

Enum Name Description
DEFAULT Default author policy. Which is essentially take no special action.
COMPOSITE This is a composite source. At this point we create a new source or update an existing source for each author.

Enum source_feed_format

This is an enum type for source_feed_format. The following values are accepted:

Enum Name Description
UNKNOWN Unknown feed format
ATOM Atom feed format
RSS RSS feed format

Enum main_format

This is an enum type for main_format. The following values are accepted:

Enum Name Description
HTML HTML form of the content. This is the default for the vast majority of the content we index.
TEXT The content is formatted as plain text. Which is primarily used for Twitter and other microblogging services.

Enum author_gender

This is an enum type for author_gender. The following values are accepted:

Enum Name Description
MALE Male
FEMALE Female
UNKNOWN Unknown

Enum geo_method

This is an enum type for geo_method. The following values are accepted:

Enum Name Description
DEFAULT The default strategy was used which is the location specified in the content.
SOURCE_LOCATION The location of the source was used to compute the geo data.

Enum shared_type

This is an enum type for shared_type. The following values are accepted:

Enum Name Description
NONE This is not shared
RAW This is shared content but no additional text has been given, i.e. it is raw.
REPLY Shared content but the user has added additional content/text.

Enum card

This is an enum type for card. The following values are accepted:

Enum Name Description
SUMMARY Basic summary of the content.
SUMMARY_LARGE_IMAGE Basic summary of the content using a large image
PHOTO The content is a photo
GALLERY The content is a photo gallery with multiple images
PLAYER The content is an embedded video player

Enum type

This is an enum type for type. The following values are accepted:

Enum Name Description
POST A blog post, mainstream news article, tweet, etc.
COMMENT A reply to a post inline, usually by members of the community.

Enum sentiment

This is an enum type for sentiment. The following values are accepted:

Enum Name Description
POSITIVE Positive sentiment
NEGATIVE Negative sentiment
NEUTRAL Neutral sentiment (neither positive nor negative)

source schema

Stores metadata representing a source in our index: a weblog, Twitter account, mainstream news site, etc.

Member Name Type Description
hashcode ascii base64filesafe(sha1(resource)) of the source. Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url of the source.
resource text The tokenized URL for this source.
link text The non-tokenized URL for this source. Use this URL if you would like to fetch this source via HTTP.
publisher_type enum The publisher type (mainstream news, weblog, forum, etc) of this source encoded as an int.
publisher_subtype text A string representing the publisher sub type which is more specific than the publisher type. The publisher subtype is usually the name of the social network hosting the content.
date_found timestamp The time we added this source to our index. This is the time we found the source not when it was created.
last_updated timestamp The last time our crawler visited the source and processed it with a task. This is always updated even if the site hasn’t changed or the request failed with HTTP 500 or another network/transient error. This may not be updated if we aren’t fetching the source via HTTP.
last_published timestamp The last time this source published a new HTML file (as measured by content_sha1). This may not be updated if we aren’t fetching the source via HTTP.
last_posted timestamp The last time this source posted a new piece of content
update_interval varint The number of milliseconds between updates to re-fetch this source. This is used for cyclical updates of sources and usually depends on how often the source posts updates.
http_status varint The HTTP status code of the last request to this source.
spam_probability float The probability, between 0 and 1, that this source is a spam source. -1.0 if we have not yet classified it.
content_length varint The length, in bytes, of this HTML from the last time we fetched the page.
content_checksum text The SHA1 checksum of the content.
assigned_tags set<text> The set of tags assigned to this source by either customers or Spinn3r (globally). This allows your client to filter or search by assigned tags. This is not to be confused with the tags field, which holds tags assigned by the site. These tags are opaque strings and not human readable, to avoid giving away any customer information in the API. Any sources you manually register are assigned tags with your vendor auth code. This allows you to register sources and then filter / search over them.
robot_link_filter set<blob> Set of hashcodes for URLs that were present on the page during the last fetch. Used to prevent duplicate indexing.
setting_priority varint The priority of a source for queueing purposes. Acceptable values are from 0-9 with 0 being the lowest priority and 9 being the highest. This allows us to efficiently rebuild the queue based on priorities. It will also (in the future) allow the scheduler to schedule higher priority items first.
setting_trace boolean When true, tracing is enabled on this source to write log messages to cassandra for debug purposes.
setting_crawl_template text JSON parser definition (valhalla) for our robot parsing rules for extracting content from the raw HTML. This is either generated by hand by Spinn3r internally or trained via an external process like mturk.
setting_post_persist_policy enum How to handle posts when we find them on the front page of content. For microblogs we can probably just write them. For other types of sources we should probably not write them and follow instead.
setting_follow_post_pattern text Regular expression for matching posts (extracted from metadata) to crawl and index. Any post URL on the page that matches the pattern will be crawled. To disable just use NoFollowPolicy which generates a regex that always fails.
setting_crawl_permalink_pattern text Regular expression for matching URLs to crawl and index. Any permalink URL on the page that matches the pattern will be crawled. To disable just use NoFollowPolicy which generates a regex that always fails.
setting_user_agent_id boolean Override the default user agent. These are present in a global repository of user agents which robots cache locally.
setting_proxy boolean Override the default proxy setting (which is probably true).
setting_proxy_host ascii Override the default proxy host.
setting_disabled enum When greater than zero, this source is marked as disabled.
setting_update_strategy enum The update strategy for computing the update interval.
setting_index_strategy enum The index strategy for this source.
setting_author_policy enum Policy on handling author metadata.
setting_indirect boolean True if this is an indirect source which we don’t monitor directly. Direct sources approach 100% accuracy as we directly index the sources content. Indirect sources are only the result of indexing some type of secondary system which doesn’t have all the posts from the source.
setting_fall_through_crawling_enabled boolean When true, we enable crawling via setting_crawl_permalink_pattern and a same site policy.
pshb_hub text The PSHB hub this source is using.
pshb_topic text The PSHB topic this source is using.
pshb_last_posted timestamp The last time this source posted and sent a message to the PSHB endpoint.
pshb_lease_expires timestamp The time this PSHB lease expires.
setting_message_owner bigint The ID that tasks are assigned which match this source. This way we can avoid duplicate message enqueue since we verify that the task has the right message_owner.
setting_loader_tags set<ascii> The set of tags assigned to this source by the loader. The loader can assign tags to both a source and discovery object and if we have an error we can back out a specific set of loaded sources.
robot_throttle_key text The last throttle key used to execute this source. This can be used to quickly re-execute a source without computing a new throttle key, or to audit a specific source and potentially how far behind it is in the queue.
setting_cookies map<text,text> Cookies used when requesting content from this source.
setting_headers map<text,text> HTTP request headers sent when requesting content from this source.
user_interactions bigint The number of user interactions from other sources on this social network computed from the graph as we index content. This is periodically computed and loaded into our source index. This could be the number of at mentions, comment replies, etc.
setting_template_identifier_source text The template identifier used for this source (when present).
setting_template_identifier_permalink text The template identifier used to index permalinks published from this source (when present).
setting_permalink_update_strategy enum The update strategy for fetching/indexing permalinks on this source.
setting_permalink_update_interval bigint The number of milliseconds between updates to re-fetch this permalink. This is used for cyclical updates.
setting_permalink_update_max_age_seconds bigint The maximum amount of time (in seconds) where we should index permalinks.
setting_content_init_strategy enum How should we handle new content. By default we index all content on a source the first time we see it.
setting_content_filter_strategy enum Hard coded strategy for avoiding duplicate URLs
setting_minimum_content_metadata_score int The minimum metadata score before we can persist content
setting_max_pages int The maximum number of pages (overriding the default) to fetch for this source.
setting_max_retries int The maximum number of HTTP retries (overriding the default) to fetch within one task without rescheduling / retrying it for future execution.
setting_schedule ascii The schedule (cron syntax) for re-executing this message in the future.
next_update timestamp The next time we’ve scheduled the source to update
setting_required_scan_window int Certain sources need to be re-indexed for a given time window as posts update to collect likes, shares, etc. This is the amount of time, in seconds, to go through and scan sources making sure we’ve indexed everything.
title text The title of the source.
description text A short description of the source.
handle text Unique handle for this source across the entire social media property.
favorites int The number of favorites this source has according to the website or social network.
followers int The number of followers this source has according to the website or social network.
following int The number of users / friends this source is following.
verified boolean True when this user account is verified to be authentic.
profiles set<text> Set of URLs on other social networking sites and weblogs for this user. These are essentially alternate profiles for the user. Their twitter site, facebook site, etc.
location text The human readable location of the source. Example: ‘Washington, DC’
image_src text The URL to the img which represents this source.
image_width int The width of the image.
image_height int The height of the image.
telephone text The telephone number for this source. Only present in limited situations. Specifically around REVIEW sites.
tags set<text> Tags for the source provided by the user’s profile.
rating_value text The rating for this item provided by the user.
favicon_src text The URL to the favicon which represents this source.
favicon_width int The width of the favicon.
favicon_height int The height of the favicon.
created timestamp The time this account was created and is provided from the source.
likes int The number of Facebook likes for this source.
related_tags set<text> A set of tags, optionally assigned by a site, which relate to this specific source. Supported for medium.com only (for now)
parsed_posts int The number of posts parsed/found when we last indexed this source.
parsed_posts_max int The maximum number of parsed posts we’ve ever seen. If parsed_posts_max is greater than zero and parsed_posts is 0 then we are probably hitting a throttle or failing to parse the content.
feed_href text The URL of the RSS feed.
feed_title text The title of the feed.
feed_format enum The format of the feed as a token. RSS or ATOM, etc.
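Two of the fields above describe rules that are easy to express in code: hashcode is defined as base64filesafe(sha1(resource)), and the parsed_posts / parsed_posts_max pair signals a probable throttle or parse failure. The sketch below is illustrative only. The function names are hypothetical, the UTF-8 encoding of the resource string is an assumption, and this document does not say whether the trailing base64 padding is kept, so that is left as a parameter.

```python
import base64
import hashlib


def source_hashcode(resource, keep_padding=True):
    """Return the URL- and filename-safe base64 encoding of sha1(resource)."""
    # Assumption: the tokenized URL is hashed as UTF-8 bytes.
    digest = hashlib.sha1(resource.encode("utf-8")).digest()  # 20 raw bytes
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    # Assumption: whether the trailing '=' padding is kept is not documented here.
    return encoded if keep_padding else encoded.rstrip("=")


def probably_throttled(source):
    """True when posts were parsed in the past but none were parsed on the last index."""
    return source.get("parsed_posts_max", 0) > 0 and source.get("parsed_posts", 0) == 0


code = source_hashcode("example.com/blog")  # hypothetical tokenized URL
print(code, len(code))
print(probably_throttled({"parsed_posts": 0, "parsed_posts_max": 25}))  # True
```

The input to source_hashcode should be the value of the resource field (the tokenized URL), not link, since the tokenization rules themselves are not described here.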

Enum publisher_type

This is an enum type for publisher_type. The following values are accepted:

Enum Name Description
UNKNOWN Unknown publisher type.
WEBLOG Weblog. Defined as a smaller site, usually owned by an individual.
MAINSTREAM_NEWS Mainstream news source. Generally owned by a corporation with multiple paid writers.
CLASSIFIED Classified site. Craigslist, Backpage, etc.
FORUM Forum sites like phpBB, Phorum, vBulletin, etc.
REVIEW Review site. Like Epinions, Amazon reviews, etc.
MEMETRACKER Memetracker like Reddit, Digg, Techmeme, Google News, etc.
MICROBLOG Microblog content such as Twitter, identi.ca, etc.
SOCIAL_MEDIA Social media sites (Facebook, Instagram, etc).
VIDEO Video hosting site like YouTube, Vimeo, etc.
PHOTO Photo sharing site like Instagram, Flickr, etc.

Enum setting_post_persist_policy

This is an enum type for setting_post_persist_policy. The following values are accepted:

Enum Name Description
WRITE Write posts when found on a source using the metadata extraction.
NOWRITE DO NOT write posts.

Enum setting_disabled

This is an enum type for setting_disabled. The following values are accepted:

Enum Name Description
ENABLED Default state. The source is enabled.
DISABLED The source is disabled but without a specific reason.
SPAM The source has been marked as spam.
DUPLICATE Duplicate of another source.
INVALID Invalid source. Not anything we are interested in indexing.

Enum setting_update_strategy

This is an enum type for setting_update_strategy. The following values are accepted:

Enum Name Description
CYCLICAL Default update strategy. Essentially just update the source at a regular rate.
ADAPTIVE Adapt the update interval based on the posting frequency of the source. This way we update sources less frequently if they post once per month compared to sources that update once per hour.
PUSH The source content is pushed to us directly via push (pshb)
SEARCH The source is updated only from a search feed. It’s not updated directly but a parent source updates it.
PING We receive a ping via an external update mechanism. This is a notice that the blog has been updated. Then we launch a task to fetch the content.
INDIRECT This source is NOT updated directly but rather is updated indirectly via another source. Usually a search feed.
FEED The source is updated via an RSS / Atom feed. We don’t index it directly.
NONE The source is not updated.
SCHEDULED This source is using the scheduled setting based on cron.

Enum setting_index_strategy

This is an enum type for setting_index_strategy. The following values are accepted:

Enum Name Description
DEFAULT Default index strategy. Just a normal source. No special or fancy strategy.
SEARCH_KEYWORD This source is a search source driven by keywords.
SEARCH_USERNAME This source is a search source driven by ranked usernames.
SEARCH_KEYWORD_AND_CITY This source is a search source driven by keywords but we include the city as well.
COVERING_SET This source is indexed via a covering set of subscriptions.

Enum setting_author_policy

This is an enum type for setting_author_policy. The following values are accepted:

Enum Name Description
DEFAULT Default author policy. Which is essentially take no special action.
COMPOSITE This is a composite source. At this point we create a new source or update an existing source for each author.

Enum setting_permalink_update_strategy

This is an enum type for setting_permalink_update_strategy. The following values are accepted:

Enum Name Description
NONE Never update this permalink
CYCLICAL Default update strategy. Essentially just update the permalink at a regular rate.

Enum setting_content_init_strategy

This is an enum type for setting_content_init_strategy. The following values are accepted:

Enum Name Description
FUTURE_ONLY

Enum setting_content_filter_strategy

This is an enum type for setting_content_filter_strategy. The following values are accepted:

Enum Name Description
NONE No explicit strategy
ROBOT_FLAT_LINK_FILTER Use the flat link filter.

Enum feed_format

This is an enum type for feed_format. The following values are accepted:

Enum Name Description
UNKNOWN Unknown feed format
ATOM Atom feed format
RSS RSS feed format

Near Duplicates

Near duplicates are common on the web. Some of the most common examples are the Associated Press and Reuters which both syndicate their content to other websites.

This can lead to the same content being present on hundreds of websites.

Additionally, the same media property could publish the same content under different URLs, for example on a different sub-domain such as cnn.com vs money.cnn.com.

Detecting Near Duplicates

Search example collapsing near duplicates:

{
  "query": {
    "query_string": {
      "query": "source_publisher_type:MAINSTREAM_NEWS AND duplicates_count:>10"
    }
  },
  "aggs": {
    "by_duplicate_identifier": {
      "terms": {
        "field": "duplicate_identifier",
        "size": 1
      },
      "aggs": {
        "primary_documents": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

Fortunately, Spinn3r has built in near duplicate detection.

We detect when content is identical and let you suppress it, either by showing which document IDs are identical or by letting you group by a duplicate_identifier.

There are two main facilities for suppressing duplicates: collapsing them at query time with an aggregation, which is what most users want, and a more generic approach based on the duplicates field.

You can run a general query for content but include an Elasticsearch aggregation to suppress/collapse the duplicates and only return the first.

Each document has a duplicate identifier, which identifies a cluster of duplicates. The aggregation collapses each cluster and returns only its first document.

This is probably the easiest way to get started.
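Reading the collapsed documents back out of the aggregation can be sketched as follows. This is a minimal illustration that assumes the response uses the standard Elasticsearch aggregation layout and the aggregation names from the example query (by_duplicate_identifier and primary_documents). The function name and the sample response are hypothetical, and the permalink key inside _source is only a placeholder.

```python
def primary_documents(response):
    """Return the first hit from each duplicate cluster in an aggregation response."""
    buckets = response["aggregations"]["by_duplicate_identifier"]["buckets"]
    documents = []
    for bucket in buckets:
        hits = bucket["primary_documents"]["hits"]["hits"]
        if hits:
            documents.append(hits[0]["_source"])
    return documents


# Hand-written response in the standard Elasticsearch shape, not real Spinn3r output.
sample = {
    "aggregations": {
        "by_duplicate_identifier": {
            "buckets": [
                {
                    "key": "cluster-a",
                    "doc_count": 12,
                    "primary_documents": {
                        "hits": {
                            "hits": [
                                {"_id": "doc-1", "_source": {"permalink": "http://example.com/1"}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(primary_documents(sample))
```

The size value of the terms aggregation controls how many duplicate clusters come back; the example query uses 1, so raise it to retrieve more clusters per request.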

Note that the query above includes the clause

duplicates_count:>10

This restricts the results to documents with more than 10 duplicates, which makes the collapsing easy to see. It is not required for the aggregation to work.

Additionally, you can run a full query and/or store results on your end. Each document has a ‘duplicates’ field, which is a map of document ID to Jaccard coefficient (a similarity coefficient). Values range from 0.0 to 1.0 inclusive, i.e. [0.0, 1.0], and higher values mean the documents are more similar.
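For readers unfamiliar with the Jaccard coefficient, the sketch below shows the general definition on two small word sets. How Spinn3r tokenizes documents before comparing them is not described here, so the whitespace split is purely an assumption for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


left = set("the quick brown fox jumps".split())
right = set("the quick brown dog jumps".split())
print(jaccard(left, right))  # 4 shared words out of 6 distinct words
```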

You can also collapse stored documents on your end using this duplicates field.
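Collapsing stored documents with the duplicates field might look like the sketch below. It is one possible approach, not a Spinn3r client API: the function name, the "id" key used for the document ID, and the 0.9 threshold are all assumptions chosen for the example.

```python
def collapse_duplicates(documents, threshold=0.9):
    """Keep the first document of each group of near duplicates.

    Each document is a dict with an "id" key (hypothetical name) and a
    "duplicates" map of document ID to similarity coefficient.
    """
    kept = []
    kept_ids = set()
    for doc in documents:
        duplicates = doc.get("duplicates") or {}
        # Skip this document if it is similar enough to one we already kept.
        is_duplicate = any(
            other_id in kept_ids and float(score) >= threshold
            for other_id, score in duplicates.items()
        )
        if not is_duplicate:
            kept.append(doc)
            kept_ids.add(doc["id"])
    return kept


stored = [
    {"id": "a", "duplicates": {"b": 0.97}},
    {"id": "b", "duplicates": {"a": 0.97}},
    {"id": "c", "duplicates": {"a": 0.40}},
]
print([doc["id"] for doc in collapse_duplicates(stored)])  # ['a', 'c']
```

Documents are processed in input order, so sort them first (for example by publication time) if you want a particular document to represent each cluster.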

Geo countries

Spinn3r supports the following GEO country codes:

Country Code
Afghanistan AF
Aland Islands AX
Albania AL
Algeria DZ
American Samoa AS
Andorra AD
Angola AO
Anguilla AI
Antarctica AQ
Antigua and Barbuda AG
Argentina AR
Armenia AM
Aruba AW
Australia AU
Austria AT
Azerbaijan AZ
Bahamas BS
Bahrain BH
Bangladesh BD
Barbados BB
Belarus BY
Belgium BE
Belize BZ
Benin BJ
Bermuda BM
Bhutan BT
Bolivia BO
Bosnia and Herzegovina BA
Botswana BW
Bouvet Island BV
Brazil BR
British Indian Ocean Territory IO
British Virgin Islands VG
Brunei Darussalam BN
Bulgaria BG
Burkina Faso BF
Burundi BI
Cambodia KH
Cameroon CM
Canada CA
Cape Verde CV
Cayman Islands KY
Central African Republic CF
Chad TD
Chile CL
China CN
Christmas Island CX
Cocos (Keeling) Islands CC
Colombia CO
Comoros KM
Congo (Brazzaville) CG
Congo, Democratic Republic of the CD
Cook Islands CK
Costa Rica CR
Croatia HR
Cuba CU
Cyprus CY
Czech Republic CZ
Côte d'Ivoire CI
Denmark DK
Djibouti DJ
Dominica DM
Dominican Republic DO
Ecuador EC
Egypt EG
El Salvador SV
Equatorial Guinea GQ
Eritrea ER
Estonia EE
Ethiopia ET
Falkland Islands (Malvinas) FK
Faroe Islands FO
Fiji FJ
Finland FI
France FR
French Guiana GF
French Polynesia PF
French Southern Territories TF
Gabon GA
Gambia GM
Georgia GE
Germany DE
Ghana GH
Gibraltar GI
Greece GR
Greenland GL
Grenada GD
Guadeloupe GP
Guam GU
Guatemala GT
Guernsey GG
Guinea GN
Guinea-Bissau GW
Guyana GY
Haiti HT
Heard Island and Mcdonald Islands HM
Holy See (Vatican City State) VA
Honduras HN
Hong Kong, Special Administrative Region of China HK
Hungary HU
Iceland IS
India IN
Indonesia ID
Iran, Islamic Republic of IR
Iraq IQ
Ireland IE
Isle of Man IM
Israel IL
Italy IT
Jamaica JM
Japan JP
Jersey JE
Jordan JO
Kazakhstan KZ
Kenya KE
Kiribati KI
Korea, Democratic People’s Republic of KP
Korea, Republic of KR
Kuwait KW
Kyrgyzstan KG
Lao PDR LA
Latvia LV
Lebanon LB
Lesotho LS
Liberia LR
Libya LY
Liechtenstein LI
Lithuania LT
Luxembourg LU
Macao, Special Administrative Region of China MO
Macedonia, Republic of MK
Madagascar MG
Malawi MW
Malaysia MY
Maldives MV
Mali ML
Malta MT
Marshall Islands MH
Martinique MQ
Mauritania MR
Mauritius MU
Mayotte YT
Mexico MX
Micronesia, Federated States of FM
Moldova MD
Monaco MC
Mongolia MN
Montenegro ME
Montserrat MS
Morocco MA
Mozambique MZ
Myanmar MM
Namibia NA
Nauru NR
Nepal NP
Netherlands NL
Netherlands Antilles AN
New Caledonia NC
New Zealand NZ
Nicaragua NI
Niger NE
Nigeria NG
Niue NU
Norfolk Island NF
Northern Mariana Islands MP
Norway NO
Oman OM
Pakistan PK
Palau PW
Palestinian Territory, Occupied PS
Panama PA
Papua New Guinea PG
Paraguay PY
Peru PE
Philippines PH
Pitcairn PN
Poland PL
Portugal PT
Puerto Rico PR
Qatar QA
Romania RO
Russian Federation RU
Rwanda RW
Réunion RE
Saint Helena SH
Saint Kitts and Nevis KN
Saint Lucia LC
Saint Pierre and Miquelon PM
Saint Vincent and Grenadines VC
Saint-Barthélemy BL
Saint-Martin (French part) MF
Samoa WS
San Marino SM
Sao Tome and Principe ST
Saudi Arabia SA
Senegal SN
Serbia RS
Seychelles SC
Sierra Leone SL
Singapore SG
Slovakia SK
Slovenia SI
Solomon Islands SB
Somalia SO
South Africa ZA
South Georgia and the South Sandwich Islands GS
South Sudan SS
Spain ES
Sri Lanka LK
Sudan SD
Suriname SR
Svalbard and Jan Mayen Islands SJ
Swaziland SZ
Sweden SE
Switzerland CH
Syrian Arab Republic (Syria) SY
Taiwan, Republic of China TW
Tajikistan TJ
Tanzania, United Republic of TZ
Thailand TH
Timor-Leste TL
Togo TG
Tokelau TK
Tonga TO
Trinidad and Tobago TT
Tunisia TN
Turkey TR
Turkmenistan TM
Turks and Caicos Islands TC
Tuvalu TV
Uganda UG
Ukraine UA
United Arab Emirates AE
United Kingdom GB
United States Minor Outlying Islands UM
United States of America US
Uruguay UY
Uzbekistan UZ
Vanuatu VU
Venezuela (Bolivarian Republic of) VE
Viet Nam VN
Virgin Islands, US VI
Wallis and Futuna Islands WF
Western Sahara EH
Yemen YE
Zambia ZM
Zimbabwe ZW

Geo States

state
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
Washington, D.C.
West Virginia
Wisconsin
Wyoming