Build a simple Instagram API – Case Study

Revision : July 25, 2016

Instagram has become one of the preferred and most used applications for users and organizations to instantly share media content with the world. Many Instagram users also want to go a step further and share their Instagram content on their websites, web applications or blogs. In other cases, digital agencies may also need to feature Instagram content from authors other than themselves.

In this case-study, we are going to learn how to build our own API (Application Programming Interface) to get embedded data from Instagram pages, and return it to a requesting (web) application in JSON format. Also, we will discuss the motivations and advantages of creating this self-hosted API versus Instagram's own API.

IMPORTANT : This case-study assumes you are comfortable with PHP commands and syntax. It also assumes you are familiar with concepts such as API, JSON, RESTful and AJAX.
Instagram API

Instagram's own API allows users and developers alike to have open access to their data in order to share Instagram's content in the websites they create.

Before you can use it, you must first register an application and obtain a client ID and a client secret. Then your application can make requests to the API endpoints with the proper credentials.

The API gives users access to different types of data that can be used to develop specific web applications. For instance, there are many third-party tools, widget generators and plugins that allow you to share Instagram content on your site. Some examples of these applications are snapwidget.com, websta.me or instafeed.

Another application example is the jelled lookup, which allows you to get any Instagram user ID by providing its Instagram user name.

Most of these tools and plugins (if not all) use Instagram's (RESTful) API and they all have their own advantages and limitations.

NOTE : We do not endorse, nor are we affiliated with, any of the resources mentioned above. They are mentioned here for reference purposes only.
Why another API?

Despite the flexibility of Instagram's API or other third-party resources available, there may be some reasons you still don't want (or need) to use them :

  • Your requirements don't need the complexity of Instagram's API
  • You don't want to sign up and register an application
  • You want to have access to Instagram data without authentication
  • You only need to get basic statistical information about an Instagram user, e.g. : ID, full name, biographic information, number of followers, number of following accounts, etc.
  • You don't want to have the constraints of Instagram's API rate limits
  • You don't want to rely on third-party (widget) services availability to share your content
  • You don't even have an Instagram account but still you want or need to share media from other users *
IMPORTANT : This case-study focuses on having access to Instagram data without using Instagram API's endpoints, authentication or registering an application.
WARNING : Instagram users own their media. Make sure you have the right permissions before sharing other users' content.
Sharing data without registering an application.

Before we start with our own API, let's explore some of the resources available to share Instagram data without authentication or without registering an application.

The following Instagram API methods may be suitable for some scenarios :


1). /p/{media shortcode}/media/

This method returns the actual media location of a specific image by adding /media/ to any media's URL, like http://instagram.com/p/{media shortcode}/media/?size=m

Supported size values are t (thumbnail), m (medium) and l (large). The default is m.

NOTE : As of today, Instagram image sizes are as follows :

t = 150 x 150 px
m = 306 x 306 px
l = 640 x 640 px

The advantage of this method is that we can use the returned location(s) within regular HTML tags. There is no need to add extra in-line code or scripts, e.g. :

<a href="http://instagram.com/p/w1x9gVhQdR/media/?size=m">
    <img src="http://instagram.com/p/w1x9gVhQdR/media/?size=t" alt="thumbnail" />
</a>

The disadvantage is that it only works on a per-media basis, so we may need to manually repeat the process for each media item we want to share.

To overcome that limitation, we could automate the process with a little bit of javascript. We could place our collection of media shortcodes in a javascript array like :

var shortcodes = ["xhiFXqhQe8", "xag9OfOHOT", "xkBygQK20u", "xcNT9yk3BH"];

Then we could use a for loop to iterate through all the items in the array and render the proper HTML like

for (var i = 0; i < shortcodes.length; i++) {
    var item = '<a href="http://instagram.com/p/' + shortcodes[i] + '/media/?size=m" ><img src="http://instagram.com/p/' + shortcodes[i] + '/media/?size=t" alt="thumbnail" /></a>';
    document.getElementById("container").innerHTML += item;
}

See a DEMO of this implementation.
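
As a variation on the loop above, here is a minimal sketch that builds the markup for all shortcodes in one pass, so the result can be assigned to the container once instead of appending to innerHTML on every iteration. The function names mediaUrl and buildGallery are hypothetical helpers (not part of Instagram's API), and the size values are the t, m and l listed earlier.

```javascript
// Hypothetical helper: URL of a media item at a given size ("t", "m" or "l").
function mediaUrl(shortcode, size) {
    var allowed = ["t", "m", "l"];
    var s = allowed.indexOf(size) === -1 ? "m" : size; // fall back to the default size
    return "http://instagram.com/p/" + encodeURIComponent(shortcode) + "/media/?size=" + s;
}

// Hypothetical helper: one linked thumbnail per shortcode, returned as a single HTML string.
function buildGallery(shortcodes) {
    return shortcodes.map(function (code) {
        return '<a href="' + mediaUrl(code, "m") + '">' +
               '<img src="' + mediaUrl(code, "t") + '" alt="thumbnail" /></a>';
    }).join("");
}

// In a page, assuming an element with id "container" exists:
// document.getElementById("container").innerHTML = buildGallery(["xhiFXqhQe8", "xag9OfOHOT"]);
```

Keeping the string building in a plain function also makes it easy to check the generated markup without a browser.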

WARNING : This method is not supported by Twitter App's built-in web browser. You may need an extra step to get the actual media's headers location.

Other limitations to consider are :

  • No additional data can be retrieved, e.g. author's name, user ID, etc.
  • It only works for images, not for videos (for a video, we can only get its associated image(s).)

2). oembed

The oembed method is another good way to use the Instagram API without authentication. It returns JSON data for a specific media item using the following URL format :

http://api.instagram.com/oembed?url=http://instagram.com/p/{media shortcode}/

Unlike the /p/{media shortcode}/media/ method, we can get more detailed information like author's name, ID, media location, media ID, caption, etc. It also gives us more flexibility while formatting the response within our HTML page.

We could easily get and manipulate each piece of data from the API's JSON response using jQuery.ajax() like :

jQuery(document).ready(function ($) {
    var URL = "http://api.instagram.com/oembed?url=http://instagram.com/p/xcNT9yk3BH";
    $.ajax({
        url: URL,
        dataType: "jsonp", // this is important to circumvent cross-domain issues
        cache: false,
        success: function (response) {
            var html =
                '<div class="container">'+
                '<a href="' + response.thumbnail_url + '" >'+
                '<img src="' + response.thumbnail_url + '" alt="thumbnail" /></a>'+
                '<p>Author : ' + response.author_name + '<br />'+
                'Author\'s ID: ' + response.author_id + '<br />'+
                'Title : ' + response.title + '</p></div>';
            $("#container").html(html);
        },
        error: function () {
            $("#container").html("<p>There was an error in the ajax call</p>");
        }
    });
}); // ready

See DEMO.

Since this also works on a per-media basis, like the /p/{media shortcode}/media/ method, we could also use a for loop to iterate through a collection of shortcodes inside an array.

See tweaked DEMO.
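
When looping over several shortcodes, it may help to separate URL building and rendering from the AJAX call itself. The following is only a sketch: oembedUrl, escapeHtml and renderOembed are hypothetical names, and the response fields are the ones used in the example above (thumbnail_url, author_name, title). Escaping the values and URL-encoding the url parameter are added precautions, since captions are user-supplied text.

```javascript
// Hypothetical helper: oembed request URL for one media shortcode.
function oembedUrl(shortcode) {
    var mediaPage = "http://instagram.com/p/" + shortcode + "/";
    return "http://api.instagram.com/oembed?url=" + encodeURIComponent(mediaPage);
}

// Escape characters that have a special meaning in HTML.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Hypothetical helper: markup for one oembed response.
function renderOembed(response) {
    var thumb = escapeHtml(response.thumbnail_url);
    return '<div class="item">' +
        '<a href="' + thumb + '"><img src="' + thumb + '" alt="thumbnail" /></a>' +
        '<p>Author : ' + escapeHtml(response.author_name) + '<br />' +
        'Title : ' + escapeHtml(response.title) + '</p></div>';
}

// In a page with jQuery loaded and a "shortcodes" array defined:
// shortcodes.forEach(function (code) {
//     $.ajax({ url: oembedUrl(code), dataType: "jsonp", success: function (r) {
//         $("#container").append(renderOembed(r));
//     }});
// });
```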

IMPORTANT : The API's response only provides the URL of the biggest image size available, which is 640 x 640 px. If you require smaller versions of an image, you could still use the /p/{media shortcode}/media/?size= method described above instead of down-scaling the returned image via CSS.

Bear in mind the oembed method has these main limitations :

  • Starting on November 3rd, 2014, the JSON response doesn't indicate the type of media. It will always return type : "rich" instead of photo or video.
  • Although we can get the actual URL of an image, we cannot get the actual URL of a video (the absolute path of an MP4 file.) This may be inconvenient if we want to play a video using our preferred (HTML5) video player.

These 2 previous methods are more suitable if you are only sharing a few Instagram media items in your page.


3). /{user}/media/

This method allows us to get the latest (20) media posts of an Instagram user by adding /media/ to the user's URL like : http://instagram.com/{user name}/media/

Like the oembed method, the link above will return a JSON response. The problem with this method is that the URL cannot be requested from an AJAX call without triggering a cross-domain error. Since it returns plain JSON (not wrapped in a callback function), it cannot be processed as JSONP as we did with the oembed method.

As a workaround, we may need to use a third-party proxy service like whateverorigin.org. This issue was previously addressed in this post.

You can see the workaround implementation using the whateverorigin proxy service.

NOTES :

The main advantage of this method is, unlike the methods previously covered, we only need a single AJAX request to get the latest (20) media posts. The main disadvantage is that it relies on a third-party application/service. If that service becomes unavailable, our implementation will fail.

Building our own API

There are 2 possible Instagram web pages from where our API can get relevant data and return it as JSON response to a requesting application :

  1. User page : http://instagram.com/{username}
  2. Media page : http://instagram.com/p/{shortcode}

Both types of pages have embedded data in their source code that is stored in a javascript variable. The value of that variable is a JSON-formatted javascript object. For instance, if you explore the source code of the Coca Cola Instagram page (or any other user), at the bottom of the page you will find a line like this :

<script type="text/javascript">window._sharedData = {"static_root":"\/\/instagramstatic-a.akamaihd.net\/bluebar\/ab9cf6a" .....etc.};</script>

What our API will do is :

  1. read the contents of the user or the media web page
  2. extract the value of the window._sharedData variable
  3. return the extracted data as JSON to the requesting application with the proper header information

Yes, our API can be considered as a proxy service between an Instagram web page and a requesting application, with the following advantages :

  • There is no limit on the number of requests you can make to the API, since they only count as page visits
  • You can restrict what domains the API can serve to
  • You can extend the response and return it as JSONP for cross-site availability
  • You don't have to rely on any third-party service but your own server availability
  • You can serve the API from your own or your clients' server(s)
IMPORTANT : The API is built in PHP. You require a server that supports PHP 4.x or 5.x to install it.
Requesting data from the API

Since we will be reading data from a user or a media (Instagram) web page, we need to tell the API the type of request we are doing. We can do this by adding a query string or trailing parameter to the request.

For instance, if we named our API file api.php, the query string should look like :

api.php?user={user's URL}

or if we are requesting data from a media page :

api.php?media={media's URL}
HINT : We could pass the full URL of a user or media page to the API, like api.php?user=http://instagram.com/cocacola, or simply pass the username or shortcode and let the API build the corresponding full URL. We will be doing the latter in our case-study.
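
If you wanted the API to accept either form, one option is to normalize the input to a bare username or shortcode before building the page URL. A minimal sketch in javascript (to match the client-side examples; the API file itself is PHP) could look like the following. lastPathSegment is a hypothetical helper, and the character whitelist is an assumption, not Instagram's documented rule.

```javascript
// Hypothetical helper: reduce "http://instagram.com/cocacola/" or "cocacola" to "cocacola".
// Returns an empty string when the result contains unexpected characters.
function lastPathSegment(input) {
    var value = String(input).trim();
    // drop the query string / fragment and any trailing slashes
    value = value.split(/[?#]/)[0].replace(/\/+$/, "");
    // keep what follows the last slash, if any
    var segment = value.substring(value.lastIndexOf("/") + 1);
    // assumption: names and shortcodes only use letters, digits, ".", "_" and "-"
    return /^[A-Za-z0-9._-]+$/.test(segment) ? segment : "";
}
```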
Processing the request

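Within our API file we will be processing two types of input requests :

// isset() avoids an "undefined index" notice when a parameter is missing
$user  = sanitize_input( isset($_GET['user'])  ? $_GET['user']  : '' ); // expects something like "instagram" (username)
$media = sanitize_input( isset($_GET['media']) ? $_GET['media'] : '' ); // expects something like "mOFsFhAp4f" (shortcode)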
HINT : Don't trust the user. Always sanitize any user's manual input.

Since we will be accepting one type of request only, either user or media, we first need to check whether the request type is valid or not :

if( !empty($user) && empty($media) ){
    // valid: requested user information, including last 20 media posts
} elseif( empty($user) && !empty($media) ){
    // valid: requested media information
} elseif( !empty($user) && !empty($media) ){
    // invalid: two or more parameters were passed
} elseif( empty($user) && empty($media) ){
    // invalid: no parameters or incorrect parameters were passed
};
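
The same four-way check can be expressed as a small pure function, which may be easier to reason about and to test. This is a javascript sketch of the logic only (the API itself is PHP); classifyRequest is a hypothetical name, and the error messages are the ones used later in the full api.php listing.

```javascript
// Returns { type: "user" | "media", value: ... } for a valid request,
// or { error: ... } when both or neither of the parameters were passed.
function classifyRequest(params) {
    var user = (params.user || "").trim();
    var media = (params.media || "").trim();
    if (user && !media) { return { type: "user", value: user }; }
    if (!user && media) { return { type: "media", value: media }; }
    if (user && media) { return { error: "only one parameter allowed" }; }
    return { error: "invalid parameters" };
}
```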

In order to read the contents of a user or media (web) page, we will use PHP's file_get_contents() function. This function will place the entire content of the (web) file into a string, including text and HTML tags, just like we could see it in the file's source code.

HINT : Since all Instagram pages are returned as HTTPS, it's advisable to set a timeout environment context in case there is some delay in the page response. We can use PHP's stream_context_create() to set this timeout on the fly like :

// set a timeout context
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10 // in seconds
        )
    )
);

For instance, if we request to read the contents of the user page, we could do :

$dataFile = @ file_get_contents("http://instagram.com/".$user);

or if we have set a context variable :

$dataFile = @ file_get_contents("http://instagram.com/".$user,  NULL, $context);
Extracting the right data

After reading the file contents, we can echo the data returned by the process_data() function if (and only if) the request was valid :

echo process_data($dataFile, $requestType);

// process data
function process_data($dataFile, $requestType){
    $data_length = strlen($dataFile); // validate if $dataFile didn't come empty
    if( $data_length > 0 ){
        // $start_position = strpos( $dataFile ,'{"static_root"' ); // start position
        $start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position
        $start_positionlength = strlen('window._sharedData = '); // string length to trim before
        // $trimmed_before = trim( substr($dataFile, $start_position) ); // trim preceding content 
        $trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content
        $end_position = strpos( $trimmed_before, '</script>'); // end position
        $trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
        $jsondata = substr( $trimmed, 0, -1); // remove extra trailing ";" 
        header("HTTP/1.0 200 OK");
        header('Content-Type: application/json; charset=utf-8');
        return $jsondata;
    } else {
        // $dataFile returned 0
        header("HTTP/1.0 400 BAD REQUEST");
        header('Content-Type: text/html; charset=utf-8');
        die("invalid $requestType");
    }
};

The process_data() function first checks that the $dataFile parameter is not empty. It will be empty if either the Instagram user or the media shortcode doesn't exist, because file_get_contents() returns false on failure and strlen(false) is 0.

If the $dataFile string's length is greater than 0, it contains the file contents of the Instagram web page.

From here, we need to find the starting position of the sub-string we need to extract (the value of the window._sharedData variable) using PHP's strpos() function :

$start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position

Once we know the starting position of the sub-string, we need the length of the marker itself so we know how much preceding content to trim :

$start_positionlength = strlen('window._sharedData = '); // string length to trim before

Then we can trim any preceding content using PHP's substr() and trim() functions like :

$trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content

From here we can proceed to find the ending position of the string we want to trim. Since this string was set in a javascript variable, the ending position will be when we find the first occurrence of the script closing tag </script> :

$end_position = strpos( $trimmed_before, '</script>'); // end position

and then, we just need to trim the rest of unused content, starting from the beginning of the previously trimmed content up to the ending position like :

$trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
IMPORTANT : Since we are extracting the value of the window._sharedData = { ... }; variable, the trailing semicolon ; after the closing brace } would invalidate our JSON output, so we also need to trim it.

We will use the substr() function again to trim the semicolon at the end of the sub-string :

$jsondata = substr( $trimmed, 0, -1); // remove trailing semicolon  ";"

Now we can return the $jsondata (PHP) variable along with the proper headers :

header("HTTP/1.0 200 OK");
header('Content-Type: application/json; charset=utf-8');
return $jsondata;

If the $dataFile string's length is equal to 0, it means the request was invalid, so we can return an error message along with the corresponding bad request header :

// strlen($dataFile) returned "0"
header("HTTP/1.0 400 BAD REQUEST");
header('Content-Type: text/html; charset=utf-8');
die("invalid $requestType");
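
For comparison, the extraction steps walked through above can be sketched in javascript as a single function. This is not the API's code, only an illustration of the same technique with two additions that are assumptions on our part: it returns null when the window._sharedData marker or the closing script tag is missing, and it runs the result through JSON.parse() to confirm it is valid JSON. extractSharedData is a hypothetical name.

```javascript
// Returns the JSON text assigned to window._sharedData, or null if it cannot be found.
function extractSharedData(html) {
    var marker = "window._sharedData = ";
    var start = html.indexOf(marker);
    if (start === -1) { return null; }            // marker missing (e.g. the page markup changed)
    var rest = html.substring(start + marker.length);
    var end = rest.indexOf("</script>");
    if (end === -1) { return null; }              // closing script tag missing
    // trim whitespace, then drop the trailing ";" if present
    var json = rest.substring(0, end).trim().replace(/;$/, "");
    try {
        JSON.parse(json);                         // confirm the extracted text is valid JSON
        return json;
    } catch (e) {
        return null;
    }
}
```

A guard of this kind matters because the approach depends on the page's markup: if the marker text ever changes, the position lookup fails and the trimmed result is no longer valid JSON.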
Allowing cross-domain requests to the API

All AJAX calls are subject to the Same Origin Policy, which means that both the requesting and the serving application must share the same origin (protocol, host and port). If the requesting application resides in another domain, it will receive a cross-origin error while requesting data from the API.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.domain.com/api.php?request=input. This can be fixed by moving the resource to the same domain or enabling CORS.

In most cases, our API will reside in the same domain as our requesting application. However, there may be cases where we want to make the API accessible from other domain(s). In those cases, we need to enable CORS in our API file.

Allow access to all domains

If you want to provide your API as a (public) service and let any application request data regardless of where it is hosted, you just need to place the following header at the top of your API file :

header("Access-Control-Allow-Origin: *"); // allows ALL domains
HINT : API requests may impact your site/server bandwidth consumption. Don't enable this method if you have hosting constraints or concerns about bandwidth usage or costs.
Restrict access to a single domain

Place the following header at the top of your API file to allow access to a single domain, other than the hosting domain :

$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : ''; // get request origin (may be absent)
if ($http_origin == "http://another-domain.com"){
    // set header if the origin matches the other allowed domain
    header("Access-Control-Allow-Origin: $http_origin");
}
HINT : You don't need to set this header if the requesting application and the API reside in the same domain, however you could grant access to another single requesting application residing in another domain.

This could be useful in case you are installing the API on your client's domain but you also want to make requests from your own domain application for testing purposes.

Allow access to a list of specific domains

If you want to grant access to a short list of domains or sub-domains, e.g. a list of selected clients' domains, jsfiddle, codepen, etc. you can create a simple array of those domains. Then grant access if the requesting domain is found in that array :

// create a simple array with the domain list
$domains_allowed = array("http://www.picssel.com", "http://www.picssel.ca", "http://jsfiddle.net", "http://fiddle.jshell.net");
// get request origin
$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : '';
// check if the requesting domain exists in the array and grant access to it
if(in_array( $http_origin, $domains_allowed )){
    header("Access-Control-Allow-Origin: $http_origin");
}
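
The allow-list decision can also be written as a small function that returns the headers to send, which keeps the rule in one place. The sketch below is in javascript for illustration only (the API file is PHP), and corsHeaders is a hypothetical name. The Vary: Origin header is an addition not present in the PHP above: it tells caches that the response differs per requesting origin.

```javascript
var domainsAllowed = ["http://www.picssel.com", "http://www.picssel.ca", "http://jsfiddle.net", "http://fiddle.jshell.net"];

// Returns the headers to add for this request origin
// (an empty object when the origin is missing or not in the list).
function corsHeaders(origin, allowed) {
    if (typeof origin !== "string" || allowed.indexOf(origin) === -1) {
        return {};
    }
    return {
        "Access-Control-Allow-Origin": origin,
        "Vary": "Origin" // the response depends on the Origin request header
    };
}
```

Note that the comparison is an exact string match, so http://www.picssel.com and https://www.picssel.com count as different origins and would each need their own entry.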
JSON vs JSONP

JSON with padding (JSONP) is another way to allow cross-domain calls from javascript browser-based clients to the API. JSONP bypasses the same-domain limitation enforced by web browsers on AJAX calls. It works because the response is loaded through a script tag, which browsers allow from any domain.

Bear in mind that for JSONP to work, our API needs to reply with a JSONP-formatted response. If the API only returns JSON-formatted data, the JSONP request won't work.

According to this performance test, JSON responses are faster than JSONP. You can decide whether your API will return a JSON-formatted or a JSONP-formatted response, or both.

To return a JSONP-formatted response we need to :

  • Check if a callback parameter was passed in the request
  • Wrap the response in a javascript function
  • Return the proper headers along with the function

Let's look again at the API's original JSON response :

header("HTTP/1.0 200 OK");
header('Content-Type: application/json; charset=utf-8');
return $jsondata;

We would need to modify that piece of code to return a JSONP-formatted response or both like :

header("HTTP/1.0 200 OK");
// return either json or jsonp
// jsonp
if(array_key_exists('callback', $_GET)){
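    header('Content-Type: text/javascript; charset=utf-8');
    // the callback name is echoed into executable javascript, so keep only identifier characters
    $callback = preg_replace('/[^A-Za-z0-9_$.]/', '', $_GET['callback']);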
    return $callback."(".$jsondata.");";
}
// response as json
else {
    header('Content-Type: application/json; charset=utf-8');
    return $jsondata;
}
REMINDER : We can only make JSON requests from a same origin application, or from an authorized domain if we have enabled CORS. We can only make JSONP requests if our API knows how to reply with a JSONP response.

First, notice we used PHP's array_key_exists() function to check if the parameter callback was passed in the query string.

If so, we wrap the response in a (javascript) function, that is returned with the proper header. For instance, if we perform this request :

api.php?user=cocacola&callback=myFunction

It will return a JSONP-formatted response like :

myFunction({"static_root":"\/\/instagramstatic-a.akamaihd.net\/bluebar\/ab9cf6a" .....etc.});

The returned JSONP-formatted (javascript) object can be used in our web application the same way as any JSON response.

Putting all the pieces together

This would be the full code of our API (api.php) file :

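<?php
// get request origin (the header is absent on same-origin and non-browser requests)
$http_origin = isset($_SERVER['HTTP_ORIGIN']) ? $_SERVER['HTTP_ORIGIN'] : '';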
/** restrict API to domain level **/
$domains_allowed = array("http://www.picssel.com", "http://www.picssel.ca");
if(in_array( $http_origin, $domains_allowed )){
  header("Access-Control-Allow-Origin: $http_origin");
}

/** functions **/
// sanitize input
function sanitize_input($input){
    $input = trim($input);
    $input = stripslashes($input);
    $input = strip_tags($input);
    $input = htmlspecialchars($input);
    return $input;
};
// process data
function process_data($dataFile, $requestType){
    $data_length = strlen($dataFile);
    if( $data_length > 0 ){
        // $start_position = strpos( $dataFile ,'{"static_root"' ); // start position
        $start_position = strpos( $dataFile ,'window._sharedData = ' ); // the start position
        $start_positionlength = strlen('window._sharedData = '); // string length to trim before
        // $trimmed_before = trim( substr($dataFile, $start_position) ); // trim preceding content
        $trimmed_before = trim( substr($dataFile, ($start_position + $start_positionlength) ) ); // trim preceding content
        $end_position = strpos( $trimmed_before, '</script>'); // end position
        $trimmed = trim( substr( $trimmed_before, 0, $end_position) ); // trim content
        $jsondata = substr( $trimmed, 0, -1); // remove extra trailing ";"
        header("HTTP/1.0 200 OK");
        // JSONP response
        if(array_key_exists('callback', $_GET)){
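            header('Content-Type: text/javascript; charset=utf-8');
            // the callback name is echoed into executable javascript, so keep only identifier characters
            $callback = preg_replace('/[^A-Za-z0-9_$.]/', '', $_GET['callback']);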
            return $callback."(".$jsondata.");";
        }
        // JSON response
        else {
            header('Content-Type: application/json; charset=utf-8');
            return $jsondata;
        }
    } else {
        // invalid username or media
        header("HTTP/1.0 400 BAD REQUEST");
        header('Content-Type: text/html; charset=utf-8');
        die("invalid $requestType");
    }
};

// process user input
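// isset() avoids an "undefined index" notice when a parameter is missing
$user  = sanitize_input( isset($_GET['user'])  ? $_GET['user']  : '' ); // instagram user name
$media = sanitize_input( isset($_GET['media']) ? $_GET['media'] : '' ); // media shortcode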

/***** set context *****/
$context = stream_context_create(array(
    'http' => array(
        'timeout' => 10 // in seconds
        )
    )
); 

/***** validate request type and return response *****/
// user, including last 20 media posts
if( !empty($user) && empty($media) ){
    $requestType = "user";
    $dataFile = @ file_get_contents("http://instagram.com/".$user,  NULL, $context);
    echo process_data($dataFile, $requestType);
}
// media
elseif( empty($user) && !empty($media) ){
    $requestType = "media";
    $dataFile = @ file_get_contents("http://instagram.com/p/".$media, NULL, $context);
    echo process_data($dataFile, $requestType);
}
// invalid : two or more parameters were passed
elseif( !empty($user) && !empty($media) ){
    header("HTTP/1.0 400 BAD REQUEST");
    header('Content-Type: text/html; charset=utf-8');
    die("only one parameter allowed");
}
// invalid : none or invalid parameters were passed
elseif( empty($user) && empty($media) ){
    header("HTTP/1.0 400 BAD REQUEST");
    header('Content-Type: text/html; charset=utf-8');
    die("invalid parameters");
};
Making API requests from a web application

Once our API file is ready, we can make requests to it from a web application. One of the easiest ways to do so is using jQuery.ajax() like :

var request = "api.php?user=picssel";
jQuery.ajax({
    cache : false,
    dataType : "json", // or "jsonp" if we enabled it
    url : request,
    success : function (response) {
        // process response
    },
    error : function (xhr, status, error) {
        // error handler
    }
});
HINT : If you are making a dataType: "jsonp" jQuery AJAX request, you don't need to add any callback parameter to the query string since jQuery.ajax() adds it by default, unless you want to override the callback function name. In that case you may need to add the jsonp and jsonpCallback settings. Refer to jQuery.ajax() documentation for further reference.
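
As an illustration of that last point, the settings for a JSONP request with a fixed callback name could be assembled as below. jsonp and jsonpCallback are existing jQuery.ajax() settings; jsonpSettings is a hypothetical helper, and myFunction is just the example callback name used earlier.

```javascript
// Builds a settings object for jQuery.ajax() that requests JSONP with a fixed callback name.
function jsonpSettings(url, callbackName) {
    return {
        url: url,
        dataType: "jsonp",
        jsonp: "callback",           // name of the query-string parameter (jQuery's default)
        jsonpCallback: callbackName  // name of the function the API wraps the data in
    };
}

// In a page with jQuery loaded:
// jQuery.ajax(jsonpSettings("api.php?user=picssel", "myFunction")).done(function (response) {
//     // process response
// });
```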
A working example

We can create a variety of rich web applications to take advantage of the API response.

For instance, we could create an application that requests the full path of an Instagram (MP4) video and play it in our web page using our favorite (HTML5) media player like JWPlayer, Mediaelement.js or FlowPlayer.

We could even create our own Instagram user-id lookup application like the jelled lookup, etc.

See a demo page that shows some of the possible applications for the API.

DEMO

IMPORTANT : The amount of data we can collect from an Instagram user or media depends on whether the user profile is public or private.
Last Notes

Bear in mind this API doesn't replace Instagram's own API in any way, but it can be useful in some specific scenarios. The API also has its own limitations, for instance :

  • We can only request the latest 20 media posts, and we cannot use pagination as in Instagram's API to request more than that.
  • You also need to be familiar with JSON format to process the API's response so it's not a tool to use "out-of-the-box" like other existing third-party applications.

I think the main advantage of the API is that it gives you full control of the process: it only depends on the availability of the Instagram pages and of your own server.

It would be interesting to know how you have created your own implementation based on this case-study so please feel free to share.

By the way, the code in this case-study is provided "as is" and it's only intended as a learning tool, and is not offered as a software plugin, application or web service. Also, we are not responsible for changes in Instagram policies or services that may affect the functionality of the API described here.

Download the code

For reference purposes, the complete API file, along with the HTML, JS and CSS demo files, is available on GitHub

Disclaimer

All trademarks, videos and images remain property of their respective holders, and are used here for demo purposes only.