Python and the NYTimes API

After installing Python v2.7 and registering a key for the New York Times Article Search API, I was ready to start writing code.

The first thing we have to do is find out how to request data from the NYT API. This means reading the documentation at http://developer.nytimes.com/docs and looking for tutorials on the net:

http://data-gov.tw.rpi.edu/wiki/How_to_use_New_York_Times_Article_Search_API

There is also a tool for generating url-request strings: http://prototype.nytimes.com/gst/apitool/index.html

These URL strings can simply be typed into the address bar of your browser, which will then show you the results. The results are just one long line of text and symbols and are not very human-readable. The format I used was JSON; XML is also available in some of the APIs.

Our weapon of choice for sorting through these data formats will be Python. Python 2.7 ships with modules for fetching web pages and decoding JSON and XML.

First off, we want to request the data from the API. We use the urllib2 module for this:


import urllib2

offset = 2
request_string = 'http://api.nytimes.com/svc/search/v1/article?format=json&query=germany+finances&offset='+str(offset)+'&api-key=####'
response = urllib2.urlopen(request_string)
content = response.read()

Here we build our request URL. In this case I’m searching for German finances with an offset of 2, which will deliver the 21st to the 30th article, since each offset step covers 10 results. The urllib2.urlopen function sends an HTTP request to the URL and stores the server's response in the “response” variable; response.read() then returns its body as a string.
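As a side note, the request URL can also be assembled from a list of parameters instead of by string concatenation. The following is only a sketch: the helper names build_request_url and result_range are made up for this post, YOUR_KEY is a placeholder, and the import fallback is only there so the snippet runs on Python 2.7 as well as Python 3. It does not send any request.

```python
try:
    from urllib import urlencode          # Python 2.7
except ImportError:
    from urllib.parse import urlencode    # Python 3

BASE_URL = 'http://api.nytimes.com/svc/search/v1/article'

def build_request_url(query, offset, api_key):
    # Hypothetical helper: a list of pairs keeps the parameter order stable.
    params = [('format', 'json'),
              ('query', query),
              ('offset', offset),
              ('api-key', api_key)]
    # urlencode escapes the values, e.g. the space becomes '+'.
    return BASE_URL + '?' + urlencode(params)

def result_range(offset, page_size=10):
    # Zero-based positions of the first and last result for an offset,
    # assuming 10 results per offset step as described above.
    first = offset * page_size
    return first, first + page_size - 1

url = build_request_url('germany finances', 2, 'YOUR_KEY')
first, last = result_range(2)   # positions 20 to 29, i.e. the 21st to 30th article
```

The advantage over concatenation is that search terms with spaces or special characters are escaped for you.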
We know that the result is a string in JSON format. This means that we can use the json module to decode and traverse the data sent to us:

import urllib2
import json
....
decoded = json.loads(content)
date_of_first_article = decoded['results'][0]['date']

This will decode the JSON string into dictionaries and lists in Python, which are much easier to work with. You can visualise this data as a tree that starts with the ‘results’ node at the top and has the articles as its children.
[Image: tree of json]
To understand the structure of a JSON string, you can simply look at the string itself and read the API documentation. In this case, the results node contains a list of articles (accessed by [0] to [9]), and every article node has data such as its text, date, title and the url of the original story.
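To make the tree a bit more concrete, here is a small sketch that decodes a hand-written JSON string. The sample is not a real API response: it only mimics the shape described above (a 'results' list whose entries have 'date', 'title' and 'url'), and a real response contains more fields.

```python
import json

# Hand-written sample in the shape described above, NOT a real API response.
sample = '''
{"results": [
  {"date": "20130508",
   "title": "Backing Grows for European Bank Plan",
   "url": "http://www.nytimes.com/2013/05/08/business/global/08iht-euro08.html"},
  {"date": "20130504",
   "title": "Euro Area Recession Is Expected to Deepen",
   "url": "http://www.nytimes.com/2013/05/04/business/global/04iht-euro04.html"}
]}
'''

decoded = json.loads(sample)

articles = decoded['results']   # the 'results' node: a Python list
first = articles[0]             # one article node: a Python dict
titles = [article['title'] for article in articles]
```

Each level of the tree is reached with one pair of square brackets: a key for a dictionary, a position for a list.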
In this example, we will make multiple queries via a for-loop and extract the date, title and url.
Here is what such a for-loop could look like:
for offset in xrange(10):
    request_string = 'http://api.nytimes.com/svc/search/v1/article?format=json&query=germany+finances&offset='+str(offset)+'&api-key=####'
    response = urllib2.urlopen(request_string)
    content = response.read()

This for-loop will assign the values 0 to 9 to the “offset” variable and run the indented code each time. For now, it only makes queries with different offsets; we do not do anything with the data yet. We can now use the JSON objects to extract data and write it to a file:

import urllib2
import json

def year(d):
    return d[0:4]

def month(d):
    return d[4:6]

def day(d):
    return d[6:8]

f = open('newfile', 'w')
for offset in xrange(2):
    print "-"
    request_string = 'http://api.nytimes.com/svc/search/v1/article?format=json&query=germany+finances&offset='+str(offset)+'&api-key=####'
    response = urllib2.urlopen(request_string)
    content = response.read()
    decoded = json.loads(content)
    # Write one tab-separated line per article on this page
    for x in decoded['results']:
        string = x['date']
        f.write(year(string) + " " + month(string) + " " + day(string) + "\t")
        # Replace typographic apostrophes with plain ones, then encode to UTF-8
        title = x['title'].replace(u"\u2019", "'").replace(u"\u2018", "'")
        f.write(title.encode('utf-8') + "\t")
        f.write(x['url'].encode('utf-8') + "\n")
f.close()
raw_input("Press enter to quit")

I’ll walk you through this code bit by bit.

The first two lines import modules, so we can use them in our program. We need urllib2 to make HTTP requests and json to decode the JSON string we get from the NYT API.

Then there are three function definitions, which make it a little easier to work with dates. In the JSON objects, dates are just strings like 20130523, which is the 23rd day of the 5th month of 2013. The syntax d[0:4] takes the string stored in the variable d and returns its first four characters as a new string, in this example 2013.
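The slicing can be tried out on its own. The sketch below uses the example date from above; the variable names are only for illustration. It also shows datetime.strptime from the standard library as an alternative to slicing by hand, which is not what the program in this post uses.

```python
from datetime import datetime

d = '20130523'

# Slices are start-inclusive and end-exclusive: d[0:4] covers positions 0, 1, 2, 3.
year_part = d[0:4]
month_part = d[4:6]
day_part = d[6:8]

# Alternative: let the standard library parse the string into a date object.
parsed = datetime.strptime(d, '%Y%m%d').date()
```

A date object is handy if you later want to sort articles or compute differences between dates; for simply writing the parts to a file, the slices are enough.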

The next line creates a new file, which we can ‘w’rite to.

The for-loop then assigns offset the values 0 and 1.

We request data for every offset and decode the answer. The next for-loop is a little more complicated. It assigns each article in the current result to the variable x, one after the other. It reads: for each value in the list of results, assign the value to x.

We then read the article's date, title and url and write them to the file we created earlier. We have to encode the title string to UTF-8 and replace certain special characters, because the json module returns unicode strings and Python 2 cannot write non-ASCII characters such as the typographic apostrophe ( ’ ) to a file without an explicit encoding.
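The effect of the replacement and the encoding can be seen in isolation. This is a minimal sketch using one of the titles from the results, with the typographic apostrophe written as the escape \u2019; it runs on Python 2.7 and Python 3.

```python
# A title as it might come out of json.loads: a unicode string with a
# typographic apostrophe (U+2019) instead of a plain one.
title = u'Italy\u2019s New Premier Puts Stimulus First'

# Without the replacement, the apostrophe becomes three bytes in UTF-8.
raw_bytes = title.encode('utf-8')

# Replace both typographic single quotes with the plain ASCII apostrophe,
# then encode the result to bytes for writing to a file.
plain = title.replace(u'\u2019', u"'").replace(u'\u2018', u"'")
clean_bytes = plain.encode('utf-8')
```

Doing the replacement on the unicode string before encoding avoids having to search for multi-byte sequences in the encoded text.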

Notice that we use “\t” to separate date, title and url. This is so we can later use the resulting text file as a table and, for example, load it into Google Fusion Tables.
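To see why the tab is a convenient separator, here is a small sketch that splits one line of the file back into its three columns. The line is written out by hand in the format the program produces; the titles contain spaces, commas and semicolons, but no tabs, so splitting on “\t” is unambiguous.

```python
# One line in the format written by the program: date, title and url
# separated by tabs and terminated by a newline.
line = ("2013 04 07\t"
        "FUNDAMENTALLY; Europe's Markets, No Longer in Lock Step\t"
        "http://www.nytimes.com/2013/04/07/your-money/europes-markets-no-longer-in-lock-step.html\n")

# Strip the trailing newline, then split on the tab character.
columns = line.rstrip('\n').split('\t')
date, title, url = columns
```

Splitting on spaces or commas instead would have cut the title into pieces.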

Also notice that the last line is NOT inside the for-loops anymore. It simply waits for the user to press enter when the program runs. This way, we can print text on the screen without the window disappearing instantly (on Windows).

 

After saving this code as a .py file, I ran it and got the following output:

2013 05 08 Backing Grows for European Bank Plan http://www.nytimes.com/2013/05/08/business/global/08iht-euro08.html
2013 05 04 Euro Area Recession Is Expected to Deepen http://www.nytimes.com/2013/05/04/business/global/04iht-euro04.html
2013 05 01 In Continuing Sign of Weakness, Unemployment Hits New High in the Euro Zone http://www.nytimes.com/2013/05/01/business/global/european-unemployment-sets-another-record.html
2013 04 30 Italy's New Premier Puts Stimulus First http://www.nytimes.com/2013/04/30/world/europe/enrico-letta-italys-new-premier-puts-stimulus-first.html
2013 04 30 Italy's New Premier Puts Stimulus First http://query.nytimes.com/gst/fullpage.html?res=9E07E3DB1439F933A05757C0A9659D8B63
2013 04 28 Germans' Dominance Is Peak of a Long Climb http://www.nytimes.com/2013/04/28/sports/soccer/germanys-champions-league-dominance-is-peak-of-a-long-climb-back.html
2013 04 23 German Soccer Hero Faces Prison Amid Record Run and Scandal http://www.nytimes.com/2013/04/23/world/europe/bayern-president-faces-prison-in-tax-evasion-case.html
2013 04 23 MEMO FROM EUROPE; Shrinking Europe Military Spending Stirs Concern http://www.nytimes.com/2013/04/23/world/europe/europes-shrinking-military-spending-under-scrutiny.html
2013 04 19 Plan for Cyprus Bailout Wins Easy Approval in Germany http://www.nytimes.com/2013/04/19/business/global/german-lawmakers-back-cyprus-bailout.html
2013 04 16 Path to Growth Splits Europe, With Austerity A Central Issue http://www.nytimes.com/2013/04/16/business/global/europe-split-over-austerity-as-a-path-to-growth.html
2013 04 13 Bailout Terms Are Eased for Ireland and Portugal http://www.nytimes.com/2013/04/13/business/global/euro-zone-finance-ministers-gather-in-ireland.html
2013 04 12 OP-ED CONTRIBUTOR; Cybersecurity: A View From the Front http://www.nytimes.com/2013/04/12/opinion/global/cybersecurity-a-view-from-the-front.html
2013 04 11 Economies of France and Italy Are Risks to the Euro Zone, a Report Says http://www.nytimes.com/2013/04/11/business/global/italy-and-france-are-risks-to-euro-zone-report-says.html
2013 04 11 Hollande Creates a Prosecutor for Fraud and Vows to End Tax Havens http://query.nytimes.com/gst/fullpage.html?res=9C06EFDF1F3FF932A25757C0A9659D8B63
2013 04 11 Hollande Creates a Prosecutor for Fraud and Vows to End Tax Havens http://www.nytimes.com/2013/04/11/business/global/european-countries-move-to-toughen-stance-on-tax-evasion.html
2013 04 10 Europe's Data on Wealth Needs a Grain of Salt http://www.nytimes.com/2013/04/10/business/global/germans-are-poor-and-italians-are-frugal-huh.html
2013 04 07 FUNDAMENTALLY; Europe's Markets, No Longer in Lock Step http://query.nytimes.com/gst/fullpage.html?res=9D05E3DF1F3CF934A35757C0A9659D8B63
2013 04 07 FUNDAMENTALLY; Europe's Markets, No Longer in Lock Step http://www.nytimes.com/2013/04/07/your-money/europes-markets-no-longer-in-lock-step.html
2013 04 06 WEALTH MATTERS; Overseas Finances Can Trip Up Americans Abroad http://www.nytimes.com/2013/04/06/your-money/rules-aimed-at-tax-evasion-abroad-trip-up-average-americans.html
2013 04 02 INSIDE EUROPE; Germany Appoints Itself Parent to Restive Euro Children http://www.nytimes.com/2013/04/02/business/global/germany-appoints-itself-parent-to-restive-euro-children.html