A Developer’s Guide to Perform SEO on AngularJS Web Apps

Before May 23, 2014, if you had asked me to create a search-engine-bot-crawlable JavaScript website, I would have tried to discourage you, vehemently. Ask me now, and I would tell you that that’s the way to go.
Just about all JavaScript MVC frameworks, including AngularJS, modify the inner content of your HTML structure. This used to make the rendered HTML difficult for search engines to index. With the advancement of technology, however, Google and other search engines understand webpages better. Bot-crawling of JavaScript, well, simple JS to be precise, is no longer a major issue, and the content of more and more web apps is being indexed by search engines.
That’s awesome news for webmasters; however, Google itself advises staying on the cautious side:

“It’s always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn’t have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can’t execute JavaScript yet.” (Google)

So it’s still not time to abandon the age-old tricks of making JavaScript-rendered content search engine optimized. There are many ways old webmasters embed full SEO support into AngularJS and other applications, but in my opinion the best method of making JS SEO-friendly is to use special URL routing and a headless browser to automatically retrieve the HTML.

Getting Your AngularJS Apps Indexed

Though Google indexes your content automatically, you can tweak your content-rendering properties in such a way that Google’s bots index your content exactly the way you want. One of the simplest techniques to accomplish this is serving your AngularJS content through a custom backend server.

Modern search engines and client-side app URLs

To ease the job of indexing web-app content, Google and other search engines offer webmasters the hashbang URL format. Whenever a search engine encounters a hashbang URL, i.e. a URL containing #!, it converts it into a ?_escaped_fragment_= URL, where it expects to find fully rendered HTML content ready to be indexed.

So for example, Google will turn the hashbang URL from:

http://www.example.com/#!/page/content

Into the URL:
http://www.example.com/?_escaped_fragment_=/page/content

At the second URL, which incidentally is never displayed to your website’s visitors, the search engine expects to find non-JS content that is easy to index.
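This mapping can be sketched as a small helper function. The function name is ours, for illustration only; real crawlers perform this conversion internally:

```javascript
// Sketch of the hashbang-to-escaped-fragment mapping a crawler performs.
// The function name is ours, for illustration only.
function toEscapedFragmentUrl(hashbangUrl) {
  var parts = hashbangUrl.split('#!');
  if (parts.length < 2) return hashbangUrl; // no hashbang, nothing to map
  return parts[0] + '?_escaped_fragment_=' + parts[1];
}

console.log(toEscapedFragmentUrl('http://www.example.com/#!/page/content'));
// http://www.example.com/?_escaped_fragment_=/page/content
```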
Now the next step is to make your application intelligent enough that when a search engine bot queries the second URL, your server returns the necessary HTML snapshot of the page. So you need to set up the following special URL rewriting for your application (shown here for Apache’s mod_rewrite):

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshots/%1? [NC,L]

Here you will notice that we have set up a special snapshots directory as the rewrite target. This directory will contain the HTML snapshots of your corresponding app pages. You can set up your own directory and change the rule accordingly.

The next problem to tackle is instructing AngularJS to use hashbangs. By default, Angular churns out URLs with only #, instead of #!. To make Angular use hashbangs, add the following module as a dependency within your primary Angular module:

angular.module('HashBangURLs', []).config(['$locationProvider', function($locationProvider) {
  $locationProvider.hashPrefix('!');
}]);

Creating HTML5 routing modes instead of Hashbangs

Did we mention that HTML5 is awesome? Well, it is. Along with the hashbang technique we mentioned above, the HTML5-and-AngularJS combination gives us one more way to get search engines to request ?_escaped_fragment_= URLs, without actually using hashbang URLs.
To do that, first instruct Google that we are serving AJAX content and that the bot should revisit the same URL using the _escaped_fragment_ syntax. You can do that by including the following meta tag in your HTML code:

<meta name="fragment" content="!">

Then configure AngularJS to use HTML5 URLs whenever and wherever it handles URLs and routing. You can do that by adding the following AngularJS module to your code:

angular.module('HTML5ModeURLs', []).config(['$locationProvider', function($locationProvider) {
  $locationProvider.html5Mode(true);
}]);

Note that html5Mode is provided by $locationProvider, not $routeProvider. In AngularJS 1.3 and later you will also need a base tag in your page head (or html5Mode({requireBase: false})) for this mode to work.

Handling SEO from the server-side using ExpressJS

In our previous posts we talked about the awesomeness of ExpressJS as a server-side JavaScript/Node.js framework. You can also use ExpressJS for the server-side rewriting instead of Apache.
To make ExpressJS deliver static HTML, we first set up a middleware that looks for _escaped_fragment_ in incoming URLs. Once it finds one, it serves the HTML snapshot instantly.

// In our app.js configuration
app.use(function(req, res, next) {
  var fragment = req.query._escaped_fragment_;

  // If there is no fragment in the query params,
  // then we're not serving a crawler.
  // (Note: the query value may be an empty string, which is
  // still a crawler request, so check for undefined explicitly.)
  if (fragment === undefined) return next();

  // If the fragment is empty, serve the index page
  if (fragment === "" || fragment === "/")
    fragment = "/index.html";

  // If the fragment does not start with '/', prepend it
  if (fragment.charAt(0) !== "/")
    fragment = "/" + fragment;

  // If the fragment does not end with '.html', append it
  if (fragment.indexOf(".html") === -1)
    fragment += ".html";

  // Serve the static HTML snapshot
  try {
    var file = __dirname + "/snapshots" + fragment;
    res.sendfile(file);
  } catch (err) {
    res.send(404);
  }
});

Once again we have set up our snapshots in a top-level directory named ‘/snapshots’. The ExpressJS middleware also accounts for the possibility that the crawler-generated URL lacks simple syntactic features such as a leading ‘/’ or a ‘.html’ suffix, and still builds the correct path for the bot.
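The path normalization done by the middleware can be pulled out into a standalone function (the name is ours, for illustration) to show exactly how crawler fragments map to snapshot files:

```javascript
// Standalone version of the middleware's fragment-to-file logic.
// The function name is ours, for illustration only.
function fragmentToSnapshotPath(fragment) {
  // empty or root fragment -> index page
  if (fragment === "" || fragment === "/") fragment = "/index.html";
  // ensure a leading slash
  if (fragment.charAt(0) !== "/") fragment = "/" + fragment;
  // ensure a .html extension
  if (fragment.indexOf(".html") === -1) fragment += ".html";
  return "/snapshots" + fragment;
}

console.log(fragmentToSnapshotPath("/"));      // /snapshots/index.html
console.log(fragmentToSnapshotPath("page1"));  // /snapshots/page1.html
```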

Taking snapshots Using Node.JS

There are a lot of tools available in the market that you can use to take HTML snapshots of your web app, of which Zombie.js and PhantomJS are the most used. These snapshots are what our server would return when Google requests a URL with the _escaped_fragment_ query.

The idea behind PhantomJS, and also ZombieJS, is to create a headless browser that accesses the regular URL of your web-app page, grabs the rendered HTML content once it is fully executed, and then returns the final HTML in a temporary file.
There are a lot of resources out there that can guide you on how to do this perfectly yourself, so we are not going into detail on it here. However, we would certainly like to highlight an open-source tool that you can use to take your HTML snapshots: Prerender.io. You can use it as a service, or you can install it on your own server, as the project is open source and available on GitHub.

However, what is even easier than that is a tool called grunt-html-snapshot, which runs on Node.js.

Grunt is distributed through npm, Node’s package manager, and you can easily use it to create your own HTML snapshots hassle free. Here are the steps to set up the Grunt tool and start churning out HTML:

    • First install Node.js. You can download it from http://nodejs.org; npm (the node package manager) ships with it. For Mac and Windows users, Node.js comes as a click-and-install application. Ubuntu users would have to extract the tar.gz file and then install it from the command terminal; those with the latest Ubuntu can also install it using the command sudo apt-get install nodejs nodejs-dev npm.
    • Open your command console and navigate to your project folder.
    • To install Grunt tool globally, run command: npm install -g grunt-cli
    • You can also install a local copy of Grunt and its essential HTML-snapshot feature using the command npm install grunt-html-snapshot --save-dev
    • The next step is to create your own Grunt file, Gruntfile.js. The JS file will have the following code:
module.exports = function(grunt) {
  grunt.loadNpmTasks('grunt-html-snapshot');

  grunt.initConfig({
    htmlSnapshot: {
      all: {
        options: {
          snapshotPath: '/project/snapshots/',
          sitePath: 'http://example.com/my-website/',
          urls: ['#!/page1', '#!/page2', '#!/page3'],
          sanitize: function (requestUri) {
            // returns 'index.html' if the url is '/', otherwise a prefix
            if (/\/$/.test(requestUri)) {
              return 'index.html';
            } else {
              return requestUri.replace(/\//g, 'prefix-');
            }
          },
          // if you would rather not keep the script tags in the html snapshots,
          // set `removeScripts` to true. It's false by default
          removeScripts: true
        }
      }
    }
  });

  grunt.registerTask('default', ['htmlSnapshot']);
};

  • Once you have done that, you can run the task using the command grunt htmlSnapshot
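The sanitize callback from the Gruntfile above, pulled out as a standalone function (our own extraction, for illustration), shows how each crawled URL is turned into a snapshot filename:

```javascript
// Standalone copy of the Gruntfile's sanitize callback, for illustration.
function sanitize(requestUri) {
  // '/' (or any URL ending in '/') becomes index.html
  if (/\/$/.test(requestUri)) {
    return 'index.html';
  }
  // otherwise every '/' is replaced with a 'prefix-' marker
  return requestUri.replace(/\//g, 'prefix-');
}

console.log(sanitize('/'));         // index.html
console.log(sanitize('#!/page1'));  // #!prefix-page1
```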

The Grunt tool has some more features that we have skipped here; you can learn more about them on the grunt-html-snapshot page. You will also notice that we give the path to the web-app page in the task, so for it to work properly you need to first set up your website on the server and then point the task to the correct URLs. Also, the snapshots here are stored automatically at the path /project/snapshots/; you can change it as per your requirements.

Site maps are also Important

For finer control over how search engine bots access your site, you need to fine-tune your sitemap as well. Whenever a search engine bot finds example.com/sitemap.xml, it follows the links given in the sitemap before blindly following all the links on the website. This is the best way to index a page that is not linked from any other page, like a mailer-campaign landing page, though this practice is frowned upon.
For AJAX content, it’s best to list all the pages/URLs that your app generates so that search engines index them properly, even if your app is a single-page app. Here’s a sample sitemap:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
...
<url>
<loc>http://www.yourwebsite.com/#!/page1</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.yourwebsite.com/#!/page2</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>http://www.yourwebsite.com/#!/page3</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
...
</urlset>
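If your app’s routes already live in a JavaScript array, a small helper can generate such a sitemap for you. This is our own sketch, not part of any library:

```javascript
// Minimal sketch (our own helper) that builds a hashbang sitemap
// from a list of app routes.
function buildSitemap(baseUrl, pages) {
  var entries = pages.map(function (page) {
    return '<url>\n' +
           '<loc>' + baseUrl + '/#!' + page + '</loc>\n' +
           '<changefreq>daily</changefreq>\n' +
           '<priority>1.0</priority>\n' +
           '</url>';
  });
  return '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
         entries.join('\n') + '\n</urlset>';
}

console.log(buildSitemap('http://www.yourwebsite.com', ['/page1', '/page2']));
```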

AngularJS Awesomeness

With the hurdle of non-indexability out of the way, there is no reason why you cannot create your whole web page using JavaScript. People are already relying heavily on JS, and the trend is not going to stop. Earlier the major concern was pre-rendered HTML, but now, with AJAX content indexable, you can do just about anything. Go fly.

Rachit Agarwal

Director and Co-Founder at Algoworks Technologies
Rachit is leading the mobility business development function, mobility strategy and consulting practice at Algoworks. He is an expert in all mobile technologies and has experience in managing teams involved in the development of custom iPhone/iPad/Android apps.
  • http://rgrillo.com/ Rafael Grillo

    This approach is very interesting.

    I have a solution for this that has a pretty similar approach but instead of using a headless browser to generate the snapshots I use the user’s navigation.

    Check it out

    https://github.com/grillorafael/bigseo

  • emarinizquierdo

    It is also important to allow sharing crawlers such as Facebook, Google Plus… to read your application content. One way to do this is to configure your mod_rewrite rules.

    • muditsingh5000

      Allow

  • Miguel Guerreiro

    This is all great.. but for developers whose client websites are housed on external hosts with cPanel, it’s hard or impossible to install Node or other snapshot software… or it’s a war with the hostmaster..
    And for example, on a website where I have several links that set filters on a product list, I make those links in AngularJS with ng-click, which calls an AJAX JSON request and refreshes the product list with the new JSON data.. how is that link exposed to the SEO engine, and what do I do in the function that calls the JSON AJAX?

    Thx for the good article :)

  • AlgoworksTech

    Hi Miguel

    For situations when we have no control over server, a 3rd party indexing service can be utilized like prerender.io or brombone.com to facilitate crawling instead of using snapshot software. They are pretty good too.

    ” in website where i have several links that set filters to a product list, those links i make them in Angularjs ng-click that calls a Ajax json request and refresh the product list with the new json data.. how that link is made to the SEO engine to this or what i do in the function that calls the json ajax?”

    Angular SEO is done by rendering the full page and caching the HTML, and that works fine until you start using ng-click events on links, which requires triggering JS functionality in the webpage. So the problem here is that when a page that has pagination or filters gets rendered, the HTML won’t have any page links, and they won’t be crawled by search engines.

    It’s not really an AngularJS issue. For SEO, if you use ng-href to build your links instead of click events, then it will work fine.

    If you build regular html page and in links set onclick=”window.href=….” – they won’t be crawled either…

    An example would be to have following approach while defining such components:

    having a href fallback for products. In this case ng-click will be used for the functionality and ng-href will be crawled by SEO.

    a ng-click=”filterProduct{{product.size}}” href=”filter/product-size”

    • Miguel Guerreiro

      So, the ng-click does the job, and the href is there only for the crawler?
      doesn’t the href refreshes the page with the link at the same time ng-click executes the angular function?

      • AlgoworksTech

        We can restrict this by the below code:

        Make a directive:

        app.directive('a', function() {
          return {
            restrict: 'E',
            link: function(scope, elem, attrs) {
              if (attrs.ngClick || attrs.href === 'something' || attrs.href === '#anything') {
                elem.on('click', function(e) {
                  e.preventDefault();
                });
              }
            }
          };
        });

        Also if you are having a specific problem with your project you can contact us for a more detailed analysis of your project.

        • Miguel Guerreiro

          Thank you very much :)
          This is a very good blog!

  • Cloudgate313

    npm install grunt-html-snapshot –save-dev should be npm install grunt-html-snapshot --save-dev (double dashes before save)

    • AlgoworksTech

      Yep you are right.
      Changed to “–”

      Thanks

  • accman

    I have developed a very easy and simple solution to make ajax content available to crawler. It creates HTML snapshots and serves them to the google crawler / bot, all automatically. No need to modify any server configuration file, no need to proxy your content to 3rd party server (security risk) and no need to install phantomjs etc. It is just few lines of simple script that you can paste it in your index file and a tiny php file (does not contain any out side link) that you have to upload to your server. No link to any outside server or any thing else, because it will be part of you website. I will soon make it available for download. You copy and paste it, and your ajax loaded content will be ready to crawl by good bot. Will be available soon to download…

    • http://www.algoworks.com/ Algoworks

      Your solution seems interesting. We are waiting for it..

      • accman

        Thanks, I am preparing simple documentation for it. It is so simple that it does not need much documentation … :)

        • Ajay Sevalkar

          Your solution seems interesting. We are waiting for it..

          • http://facefore.com accman

            Hi Ajay,
            It is available now at facefore.com. You can check.

  • http://facefore.com accman

    You can get a free PHP script that generates an XML sitemap and SEO-friendly HTML internal links automatically, by scanning the given folder for HTML pages. It’s a short and simple script; you can copy and save it in a PHP file, then upload that file to your website root. Get it here:
    http://facefore.com/Generating-Html-links-and-XML-sitemap.html

  • asma

    Is there any copy that we can try before payment?