Revision : July 25, 2016
Instagram has become one of the preferred and most used applications for users and organizations to instantly share media content with the world. Many Instagram users also want to go a step further and share their Instagram content on their websites, web applications or blogs. In other cases, digital agencies may also need to feature Instagram content from authors other than themselves.
In this case-study, we are going to learn how to build our own API (Application Programming Interface) to get embedded data from Instagram pages, and return it to a requesting (web) application in JSON format. Also, we will discuss the motivations and advantages of creating this self-hosted API versus Instagram's own API.
Instagram API
Instagram's own API allows users and developers alike to have open access to their data in order to share Instagram's content in the websites they create.
Before you can use it, you must first register an application and obtain a client ID and a client secret. Then your application can make requests to the API endpoints with the proper credentials.
The API gives access to different types of data that can be used to develop specific web applications. For instance, there are many third-party tools, widget generators and plugins that allow you to share Instagram content on your site. Some examples of these applications are snapwidget.com, websta.me or instafeed.
Another application example is the jelled lookup, which allows you to get any Instagram user ID by providing its Instagram user name.
Most of these tools and plugins (if not all) use Instagram's (RESTful) API and they all have their own advantages and limitations.
Why another API?
Despite the flexibility of Instagram's API or other third-party resources available, there may be some reasons you still don't want (or need) to use them :
- Your requirements don't need the complexity of Instagram's API
- You don't want to sign up and register an application
- You want to have access to Instagram data without authentication
- You only need to get basic statistical information about an Instagram user, e.g. : ID, full name, biographic information, number of followers, number of following accounts, etc.
- You don't want to have the constraints of Instagram's API rate limits
- You don't want to rely on third-party (widget) services availability to share your content
- You don't even have an Instagram account but still you want or need to share media from other users *
Sharing data without registering an application.
Before we start with our own API, let's explore some of the resources available to share Instagram data without authentication or without registering an application.
The following Instagram API methods may be suitable for some scenarios :
1). /p/{media shortcode}/media/
This method returns the actual media location of a specific image by adding /media/ to any media's URL like http://instagram.com/p/{media shortcode}/media/?size=m
Supported values are t (thumbnail), m (medium), l (large). The default is m.
NOTE : As of today, Instagram image sizes are as follows : t = 150 x 150 px, m = 306 x 306 px, l = 640 x 640 px
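As a minimal sketch of the URL format described above, the following helper builds the /media/ URL for a shortcode and falls back to the default size when an unsupported value is passed. The names mediaUrl and MEDIA_SIZES are hypothetical and only used for illustration.

```javascript
// Sizes listed above: t (thumbnail), m (medium), l (large).
var MEDIA_SIZES = ["t", "m", "l"];

// Hypothetical helper (not part of Instagram's API): builds the /media/ URL
// for one shortcode, using "m" when the size is missing or unsupported.
function mediaUrl(shortcode, size) {
  var safeSize = MEDIA_SIZES.indexOf(size) !== -1 ? size : "m";
  return "http://instagram.com/p/" + encodeURIComponent(shortcode) +
    "/media/?size=" + safeSize;
}
```

For example, mediaUrl("w1x9gVhQdR", "t") produces the thumbnail URL used in the HTML snippet of this section.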
The advantage of this method is that we can use the returned location(s) within regular HTML tags. There is no need to add extra in-line code or scripts, e.g. :
<a href="http://instagram.com/p/w1x9gVhQdR/media/?size=m"> <img src="http://instagram.com/p/w1x9gVhQdR/media/?size=t" alt="thumbnail" /> </a>
The disadvantage is that it only works on a per-media basis, so we may need to manually repeat the process for each media item we want to share.
To overcome that limitation, we could automate the process with a little bit of javascript. We could place our collection of media shortcodes in a javascript array like :
var shortcodes = ["xhiFXqhQe8", "xag9OfOHOT", "xkBygQK20u", "xcNT9yk3BH"];
Then we could use a for loop to iterate through all the items in the array and render the proper HTML like
for (var i = 0; i < shortcodes.length; i++) {
    var item = '<a href="http://instagram.com/p/' + shortcodes[i] + '/media/?size=m" >' +
        '<img src="http://instagram.com/p/' + shortcodes[i] + '/media/?size=t" alt="thumbnail" /></a>';
    document.getElementById("container").innerHTML += item;
}
See a DEMO of this implementation.
Other limitations to consider are :
- No additional data can be retrieved, e.g. author's name, user ID, etc.
- It only works for images but not for videos (we can only get the image(s) associated with a video, though.)
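Returning to the loop shown earlier : appending to innerHTML on every iteration makes the browser serialize and re-parse the container's markup each time. A possible variant, sketched here with a hypothetical buildThumbnailLinks helper, builds the whole string first and assigns it once.

```javascript
// Hypothetical helper: returns the markup for a list of media shortcodes.
function buildThumbnailLinks(shortcodes) {
  return shortcodes.map(function (code) {
    var base = "http://instagram.com/p/" + encodeURIComponent(code) + "/media/";
    return '<a href="' + base + '?size=m">' +
      '<img src="' + base + '?size=t" alt="thumbnail" /></a>';
  }).join("");
}

// In a browser, assign the result a single time:
// document.getElementById("container").innerHTML = buildThumbnailLinks(shortcodes);
```

Keeping the string building separate from the DOM update also makes the markup easy to check outside the browser.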
2). oembed
The oembed method is another good alternative to use the Instagram API without requiring authentication. It returns JSON data for a specific media item using the following URL format :
http://api.instagram.com/oembed?url=http://instagram.com/p/{media shortcode}/
Unlike the /p/{media shortcode}/media/ method, we can get more detailed information like author's name, ID, media location, media ID, caption, etc. It also gives us more flexibility while formatting the response within our HTML page.
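The examples in this section pass the media URL unencoded. Since that URL is itself a query-string value, percent-encoding it is the more defensive choice. The sketch below assumes the endpoint accepts an encoded url parameter (standard query-string behavior, but not verified here) and uses a hypothetical oembedUrl helper.

```javascript
// Hypothetical helper: builds the oembed request URL for one shortcode.
// Assumption: the endpoint decodes a percent-encoded "url" parameter.
function oembedUrl(shortcode) {
  var mediaPage = "http://instagram.com/p/" + shortcode + "/";
  return "http://api.instagram.com/oembed?url=" + encodeURIComponent(mediaPage);
}
```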
We could easily get and manipulate each piece of data from the API's JSON response using jQuery.ajax() like :
jQuery(document).ready(function ($) {
    var URL = "http://api.instagram.com/oembed?url=http://instagram.com/p/xcNT9yk3BH";
    $.ajax({
        url: URL,
        dataType: "jsonp", // this is important to circumvent cross-domain issues
        cache: false,
        success: function (response) {
            var html = '<div class="container">' +
                '<a href="' + response.thumbnail_url + '" >' +
                '<img src="' + response.thumbnail_url + '" alt="thumbnail" /></a>' +
                '<p>Author : ' + response.author_name + '<br />' +
                'Author\'s ID: ' + response.author_id + '<br />' +
                'Title : ' + response.title + '</p></div>';
            $("#container").html(html);
        },
        error: function () {
            $("#container").html("<p>There was an error in the ajax call</p>");
        }
    });
}); // ready
See DEMO.
Since this also works on a per-media basis like the /p/{media shortcode}/media/ method, we could also use a for loop to iterate through a collection of shortcodes inside an array.
See tweaked DEMO.
Bear in mind the oembed method has these main limitations :
- Starting on November 3rd, 2014, the JSON response doesn't indicate the type of media. It will always return type : "rich" instead of photo or video.
- Although we can get the actual URL of an image, we cannot get the actual URL of a video (the absolute path of an MP4 file.) This may be inconvenient if we want to play a video using our preferred (HTML5) video player.
These 2 previous methods are more suitable if you are only sharing a few Instagram media items in your page.
3). /{user}/media/
This method allows us to get the latest (20) media posts of an Instagram user by adding /media/ to the user's URL like : http://instagram.com/{user name}/media/
Like the oembed method, the link above will return a JSON response. The problem with this method is that the URL cannot be requested from an AJAX call without triggering a cross-domain error. Since it returns plain JSON (not wrapped in a callback function), it cannot be processed as JSONP as we did with the oembed method.
As a workaround, we may need to use a third-party proxy service like whateverorigin.org. This issue was previously addressed in this post.
You can see the workaround implementation using the whateverorigin proxy service.
NOTES :
The main advantage of this method is, unlike the methods previously covered, we only need a single AJAX request to get the latest (20) media posts. The main disadvantage is that it relies on a third-party application/service. If that service becomes unavailable, our implementation will fail.
Building our own API
There are 2 possible Instagram web pages from where our API can get relevant data and return it as JSON response to a requesting application :
- User page : http://instagram.com/{username}
- Media page : http://instagram.com/p/{shortcode}
Both types of pages have embedded data in their source code that is stored in a javascript variable. The value of that variable is a JSON-formatted javascript object. For instance, if you explore the source code of the Coca Cola Instagram page (or any other user's), at the bottom of the page you will find a line like this :
<script type="text/javascript">window._sharedData = {"static_root":"\/\/instagramstatic-a.akamaihd.net\/bluebar\/ab9cf6a" .....etc.};</script>
What our API will do is :
- read the contents of the user or the media web page
- extract the value of the window._sharedData variable
- return the extracted data as JSON to the requesting application with the proper header information
Yes, our API can be considered as a proxy service between an Instagram web page and a requesting application, with the following advantages :
- There is no limit on the number of requests you can make to the API since they will only count as page visits
- You can restrict what domains the API can serve to
- You can extend the response and return it as JSONP for cross-site availability
- You don't have to rely on any third-party service but your own server availability
- You can serve the API from your own or your clients' server(s)
Requesting data from the API
Since we will be reading data from a user or a media (Instagram) web page, we need to tell the API the type of request we are doing. We can do this by adding a query string or trailing parameter to the request.
For instance, if we named our API file api.php, the query string should look like :
api.php?user={username}
or if we are requesting data from a media page :
api.php?media={media shortcode}
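On the client side, the request string could be assembled with a small helper so the value is always encoded and only the two supported request types are produced. This is only a sketch : apiRequestUrl is a hypothetical name, and the api.php path is the one assumed in this section.

```javascript
// Hypothetical helper: builds the query string for our API file.
// Only the two request types described above ("user" or "media") are accepted.
function apiRequestUrl(type, value) {
  if (type !== "user" && type !== "media") {
    throw new Error("type must be 'user' or 'media'");
  }
  return "api.php?" + type + "=" + encodeURIComponent(value);
}
```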
Processing the request
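Within our API file we will be processing two types of input requests. Since only one of the two parameters is normally present, we check each with isset() to avoid an undefined index notice :
$user = sanitize_input( isset($_GET['user']) ? $_GET['user'] : '' ); // expects something like "instagram" (username)
$media = sanitize_input( isset($_GET['media']) ? $_GET['media'] : '' ); // expects something like "mOFsFhAp4f" (shortcode)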
Since we will be accepting one type of request only, either user or media, we first need to check whether the request type is valid or not :
if( !empty($user) && empty($media) ){
    // valid: requested user information, including last 20 media posts
} elseif( empty($user) && !empty($media) ){
    // valid: requested media information
} elseif( !empty($user) && !empty($media) ){
    // invalid: two or more parameters were passed
} elseif( empty($user) && empty($media) ){
    // invalid: no parameters or incorrect parameters were passed
}
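The same decision table can be sketched in javascript, for example to validate a request on the client before sending it. The two patterns below are assumptions made for illustration (a conservative whitelist of characters), not rules taken from Instagram's documentation, and classifyRequest is a hypothetical name.

```javascript
// Assumed whitelists (illustrative only, not from Instagram's documentation).
var USERNAME_PATTERN = /^[A-Za-z0-9._]{1,30}$/;
var SHORTCODE_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper: mirrors the four branches of the PHP check above.
function classifyRequest(query) {
  var user = (query.user || "").trim();
  var media = (query.media || "").trim();
  if (user && media) return { ok: false, error: "only one parameter allowed" };
  if (!user && !media) return { ok: false, error: "invalid parameters" };
  if (user) {
    return USERNAME_PATTERN.test(user)
      ? { ok: true, type: "user", value: user }
      : { ok: false, error: "invalid user" };
  }
  return SHORTCODE_PATTERN.test(media)
    ? { ok: true, type: "media", value: media }
    : { ok: false, error: "invalid media" };
}
```

Checking the value against a whitelist is stricter than stripping tags, since anything that is not a plausible username or shortcode is rejected before it is appended to a URL.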
In order to read the contents of a user or media (web) page, we will use PHP's file_get_contents() function. This function will place the entire content of the (web) file into a string, including text and HTML tags, just as we would see it in the file's source code.
Optionally, we can set a timeout for the request through a stream context, so the API doesn't wait indefinitely if the page is slow to respond :
// set a timeout context
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10 // in seconds
    )
));
For instance, if we request to read the contents of the user page, we could do :
$dataFile = @ file_get_contents("http://instagram.com/".$user);
or if we have set a context variable :
$dataFile = @ file_get_contents("http://instagram.com/".$user, NULL, $context);
Extracting the right data
After reading the file contents, we can echo the data returned by the process_data() function if (and only if) the request was valid :
echo process_data($dataFile, $requestType);

// process data
function process_data($dataFile, $requestType){
    $data_length = strlen($dataFile);
    // validate if $dataFile didn't come empty
    if( $data_length > 0 ){
        // $start_position = strpos( $dataFile ,'{"static_root"' ); // start position
        $start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position
        $start_positionlength = strlen('window._sharedData = '); // string length to trim before
        // $trimmed_before = trim( substr($dataFile, $start_position) ); // trim preceding content
        $trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content
        $end_position = strpos( $trimmed_before, '</script>'); // end position
        $trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
        $jsondata = substr( $trimmed, 0, -1); // remove extra trailing ";"
        header("HTTP/1.0 200 OK");
        header('Content-Type: application/json; charset=utf-8');
        return $jsondata;
    } else {
        // $dataFile returned 0
        header("HTTP/1.0 400 BAD REQUEST");
        header('Content-Type: text/html; charset=utf-8');
        die("invalid $requestType");
    }
}
The process_data() function first checks that the $dataFile parameter is not empty. It would be empty if either the Instagram user or the media shortcode doesn't exist.
If the $dataFile string's length is bigger than 0, in other words it's not empty, it contains the file contents of the Instagram web page.
From here, we need to find the starting position of the sub-string we need to extract (the value of the window._sharedData variable) using PHP's strpos() function :
$start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position
Once we know the starting position of the sub-string, we need to get the length of the marker itself so we know how much preceding content to trim :
$start_positionlength = strlen('window._sharedData = '); // string length to trim before
Then we can trim any preceding content using PHP's substr() and trim() functions like :
$trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content
From here we can proceed to find the ending position of the string we want to trim. Since this string was set in a javascript variable, the ending position will be when we find the first occurrence of the script closing tag </script> :
$end_position = strpos( $trimmed_before, '</script>'); // end position
and then, we just need to trim the rest of unused content, starting from the beginning of the previously trimmed content up to the ending position like :
$trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
We will use the substr() function again to trim the semicolon at the end of the sub-string :
$jsondata = substr( $trimmed, 0, -1); // remove trailing semicolon ";"
Now we can return the $jsondata (PHP) variable along with the proper headers :
header("HTTP/1.0 200 OK"); header('Content-Type: application/json; charset=utf-8'); return $jsondata;
If the $dataFile string's length is equal to 0, it means the request was invalid, so we can return an error message along with the corresponding bad request header :
// strlen($dataFile) returned "0"
header("HTTP/1.0 400 BAD REQUEST");
header('Content-Type: text/html; charset=utf-8');
die("invalid $requestType");
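One case the walk-through above doesn't cover is a page that loads correctly but doesn't contain the window._sharedData marker : PHP's strpos() returns false when the needle isn't found, so the trimming would run on the wrong offsets. The following javascript sketch follows the same extraction steps with that guard added and also confirms the result parses as JSON. The function name extractSharedData is hypothetical.

```javascript
// Hypothetical helper: same steps as process_data(), with guards for a
// missing marker, a missing closing tag and malformed JSON.
function extractSharedData(html) {
  var marker = "window._sharedData = ";
  var start = html.indexOf(marker);
  if (start === -1) return null;                 // marker not found
  var rest = html.slice(start + marker.length);  // trim preceding content
  var end = rest.indexOf("</script>");
  if (end === -1) return null;                   // closing tag not found
  var json = rest.slice(0, end).trim();          // trim following content
  if (json.charAt(json.length - 1) === ";") {
    json = json.slice(0, -1);                    // remove trailing semicolon
  }
  try {
    return JSON.parse(json);
  } catch (e) {
    return null;                                 // not valid JSON
  }
}
```

Returning null for every failure keeps the caller's logic simple : anything other than an object can be mapped to the same bad request response.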
Allowing cross-domain requests to the API
All AJAX calls are subject to the Same Origin Policy, which means that both the requesting and the serving application must reside in the same domain. If the requesting application resides in another domain, it will receive a cross-origin error while requesting data from the API.
In most cases, our API will reside in the same domain as our requesting application; however, there may be cases where we would like to make the API accessible from other domain(s). In those cases, we would need to enable CORS in our API file.
Allow access to all domains
If you want to provide your API as a (public) service and let any application request data regardless of where the API is hosted, you just need to place the following header at the top of your API file :
header("Access-Control-Allow-Origin: *"); // allows ALL domains
Restrict access to a single domain
Place the following header at the top of your API file to allow access to a single domain, other than the hosting domain :
// get request origin (the header is absent on same-origin requests)
$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : '';
if ($http_origin == "http://another-domain.com"){
    // set header if the origin matches the other allowed domain
    header("Access-Control-Allow-Origin: $http_origin");
}
This could be useful in case you are installing the API on your client's domain but you also want to make requests from your own domain application for testing purposes.
Allow access to a list of specific domains
If you want to grant access to a short list of domains or sub-domains, e.g. a list of selected clients' domains, jsfiddle, codepen, etc. you can create a simple array of those domains. Then grant access if the requesting domain is found in that array :
// create a simple array with the domain list
$domains_allowed = array("http://www.picssel.com", "http://www.picssel.ca", "http://jsfiddle.net", "http://fiddle.jshell.net");
// get request origin (the header is absent on same-origin requests)
$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : '';
// check if the requesting domain exists in the array and grant access to it
if(in_array( $http_origin, $domains_allowed )){
    header("Access-Control-Allow-Origin: $http_origin");
}
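The whitelist decision can be isolated in a small pure function, sketched here in javascript with a hypothetical allowOriginHeaders helper. The Vary : Origin header is an addition that is not part of the PHP code above; it is commonly recommended when the allowed origin is echoed back dynamically, so that caches don't reuse a response across origins.

```javascript
// Hypothetical helper: returns the headers to send for a given Origin,
// or null when the origin is missing or not in the list.
function allowOriginHeaders(origin, allowedOrigins) {
  if (typeof origin !== "string" || allowedOrigins.indexOf(origin) === -1) {
    return null; // send no CORS header; the browser will block the response
  }
  return {
    "Access-Control-Allow-Origin": origin,
    "Vary": "Origin" // assumption: added for cache correctness, not in the original
  };
}
```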
JSON vs JSONP
JSON with padding (JSONP) is another way to allow cross-domain calls from javascript browser-based clients to the API. JSONP bypasses the limitation enforced by most web browsers where access to the API must be in the same domain.
Bear in mind that for JSONP to work, our API needs to reply with a JSONP-formatted response. If the API only returns JSON-formatted data, the JSONP request won't work.
According to this performance test, JSON responses are faster than JSONP. You can decide whether your API will return a JSON-formatted or a JSONP-formatted response, or both.
To return a JSONP-formatted response we need to :
- Check if a callback parameter was passed in the request
- Wrap the response in a javascript function
- Return the proper headers along with the function
Let's look again at the API's original JSON response :
header("HTTP/1.0 200 OK"); header('Content-Type: application/json; charset=utf-8'); return $jsondata;
We would need to modify that piece of code to return either a JSON-formatted or a JSONP-formatted response, depending on the request :
header("HTTP/1.0 200 OK");
// return either json or jsonp
if( array_key_exists('callback', $_GET) ){
    // response as jsonp
    header('Content-Type: text/javascript; charset=utf-8');
    $callback = $_GET['callback'];
    return $callback."(".$jsondata.");";
} else {
    // response as json
    header('Content-Type: application/json; charset=utf-8');
    return $jsondata;
}
First, notice we used PHP's array_key_exists() function to check if the parameter callback was passed in the query string.
If so, we wrap the response in a (javascript) function, that is returned with the proper header. For instance, if we perform this request :
api.php?user=cocacola&callback=myFunction
It will return a JSONP-formatted response like :
myFunction({"static_root":"\/\/instagramstatic-a.akamaihd.net\/bluebar\/ab9cf6a" .....etc.});
The returned JSONP-formatted (javascript) object can be used the same way as any JSON response in our web application.
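Because the callback name is echoed back inside a script response, a caller could pass arbitrary javascript in its place. A common precaution is to accept only names that look like identifiers (optionally dotted). The sketch below shows that check in javascript; the pattern and the toJsonp name are illustrative assumptions, not part of the API code in this case-study.

```javascript
// Assumed rule: a callback is one or more identifiers separated by dots,
// e.g. "myFunction" or "app.handlers.onData".
var CALLBACK_PATTERN = /^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$/;

// Hypothetical helper: wraps a JSON string in the callback, or throws
// when the callback name contains anything other than identifier characters.
function toJsonp(callback, jsonText) {
  if (typeof callback !== "string" || !CALLBACK_PATTERN.test(callback)) {
    throw new Error("invalid callback name");
  }
  return callback + "(" + jsonText + ");";
}
```

The same pattern could be applied on the server before wrapping the response, answering with a bad request header when the name doesn't match.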
Putting all the pieces together
This would be the full code of our API (api.php) file :
<?php
/** restrict API to domain level **/
// the Origin header is absent on same-origin requests
$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : '';
$domains_allowed = array("http://www.picssel.com", "http://www.picssel.ca");
if(in_array( $http_origin, $domains_allowed )){
    header("Access-Control-Allow-Origin: $http_origin");
}

/** functions **/
// sanitize input
function sanitize_input($input){
    $input = trim($input);
    $input = stripslashes($input);
    $input = strip_tags($input);
    $input = htmlspecialchars($input);
    return $input;
}

// process data
function process_data($dataFile, $requestType){
    $data_length = strlen($dataFile);
    if( $data_length > 0 ){
        // $start_position = strpos( $dataFile ,'{"static_root"' ); // start position
        $start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position
        $start_positionlength = strlen('window._sharedData = '); // string length to trim before
        // $trimmed_before = trim( substr($dataFile, $start_position) ); // trim preceding content
        $trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content
        $end_position = strpos( $trimmed_before, '</script>'); // end position
        $trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
        $jsondata = substr( $trimmed, 0, -1); // remove extra trailing ";"
        header("HTTP/1.0 200 OK");
        // JSONP response
        if(array_key_exists('callback', $_GET)){
            header('Content-Type: text/javascript; charset=utf-8');
            $callback = $_GET['callback'];
            return $callback."(".$jsondata.");";
        }
        // JSON response
        else {
            header('Content-Type: application/json; charset=utf-8');
            return $jsondata;
        }
    } else {
        // invalid username or media
        header("HTTP/1.0 400 BAD REQUEST");
        header('Content-Type: text/html; charset=utf-8');
        die("invalid $requestType");
    }
}

// process user input (only one of the two parameters is normally present)
$user = sanitize_input( isset($_GET['user']) ? $_GET['user'] : '' ); // instagram user name
$media = sanitize_input( isset($_GET['media']) ? $_GET['media'] : '' ); // media shortcode

/***** set context *****/
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10 // in seconds
    )
));

/***** validate request type and return response *****/
// user, including last 20 media posts
if( !empty($user) && empty($media) ){
    $requestType = "user";
    $dataFile = @ file_get_contents("http://instagram.com/".$user, NULL, $context);
    echo process_data($dataFile, $requestType);
}
// media
elseif( empty($user) && !empty($media) ){
    $requestType = "media";
    $dataFile = @ file_get_contents("http://instagram.com/p/".$media, NULL, $context);
    echo process_data($dataFile, $requestType);
}
// invalid : two or more parameters were passed
elseif( !empty($user) && !empty($media) ){
    header("HTTP/1.0 400 BAD REQUEST");
    header('Content-Type: text/html; charset=utf-8');
    die("only one parameter allowed");
}
// invalid : none or invalid parameters were passed
elseif( empty($user) && empty($media) ){
    header("HTTP/1.0 400 BAD REQUEST");
    header('Content-Type: text/html; charset=utf-8');
    die("invalid parameters");
}
Making API requests from a web application
Once our API file is ready, we can make requests to it from a web application. One of the easiest ways to do so is using jQuery.ajax() like :
var request = "api.php?user=picssel";
jQuery.ajax({
    cache : false,
    dataType : "json", // or "jsonp" if we enabled it
    url : request,
    success : function (response) {
        // process response
    },
    error : function (xhr, status, error) {
        // error handler
    }
});
A working example
We can create a variety of rich web applications to take advantage of the API response.
For instance, we could create an application that requests the full path of an Instagram (MP4) video and play it in our web page using our favorite (HTML5) media player like JWPlayer, Mediaelement.js or FlowPlayer.
We could even create our own Instagram user-id lookup application like the jelled lookup, etc.
See a demo page that shows some of the possible applications for the API.
Last Notes
Bear in mind this API doesn't replace Instagram's own API in any way, but it can be useful in some specific scenarios. The API also has its own limitations, for instance :
- We can only request the latest 20 media posts, and we cannot use pagination as in Instagram's API to request more than that.
- You also need to be familiar with JSON format to process the API's response so it's not a tool to use "out-of-the-box" like other existing third-party applications.
I think the main advantage of the API is that it offers you full control of the process, where only the availability of the Instagram pages and of your own server is required.
It would be interesting to know how you have created your own implementation based on this case-study so please feel free to share.
By the way, the code in this case-study is provided "as is" and it's only intended as a learning tool, and is not offered as a software plugin, application or web service. Also, we are not responsible for changes in Instagram policies or services that may affect the functionality of the API described here.
Download the code
For reference purposes, the complete API file, along with the HTML, JS and CSS demo files, is available on GitHub
Disclaimer
All trademarks, videos and images remain property of their respective holders, and are used here for demo purposes only.