Jono Alderson

What’s the big deal about semantic HTML?

One of the things that I’m passionate about from an SEO perspective is the idea of constantly aspiring towards technical perfection.

Obviously, this is something of a pipe-dream; there aren’t many real-world, commercial scenarios in which it’s practical, achievable or even sensible to invest in the technical foundations of a website beyond a certain point. Time, budget and resource limitations (and a healthy dose of common sense) will always result in ending up with a finished product that “could have been better” from an SEO perspective – I’m sure that there’s a pretty much unlimited ceiling for the amount of extra bells and whistles we’d all like to see included in every project we work on! However, even if you’re in the development phase of a new project (when there’s the maximum flexibility and appetite for technical investment and development), requests and requirements for additional or ‘nice to have’ SEO features and functionality inevitably get pushed down the priority list in the name of pragmatism.

The position, then, that we often find ourselves in is that we need to identify small, tactical changes and improvements to a site which will result in a worthwhile ROI. What’s often missed is that there’s generally a huge amount of opportunity to take the technical platform you’ve already got and to make it work much harder – without requiring significant amounts of extra development, or resulting in you giving up and looking for easy wins off-site or in other areas.

There are two really easy wins here – the use of semantic HTML in everyday content and template structures, and the tactical application of microdata in the form of Schema markup.

This, then, is a summary of the talk I gave at searchABLE.1 in Leeds, in mid-March 2012, on those very topics.*

*Produced and delivered the day before this great SEOmoz/YOUmoz post was published; honest!

To set the scene, I’d like to examine WordPress. As a platform, WordPress has a great reputation within the SEO community because it gets a lot of things right. Out of the box, it provides enough flexible, robust functionality to ensure that most of the SEO basics are tackled right away, and those that aren’t can be easily customised or enhanced through the application of plugins and modifications. In particular, one of the things that it does well is the application of semantic HTML. However, for the purposes of this post, what we’re really interested in is their branding – specifically the small comment present in their own website’s footer, “CODE IS POETRY”.

Figure 1: WordPress, 'CODE IS POETRY'

This idea is a key concept, which really crystallises for me an understanding of what semantic HTML is, and why it’s so important.

HTML is a language, in exactly the same way that English and French are languages. It has words, grammar, punctuation and an extensive set of rules about how different elements can be combined and when it’s appropriate to use them. Now, using English as an example, there is a practically unlimited number of ways in which I might convey an arbitrary piece of information to a reader or listener. I can use as many or as few, as simple or as complicated words as I choose, and deliver that message in any format or structure – there’s no intrinsically ‘wrong’ way of communicating.

At the other end of the scale, what happens if I take the same piece of communication, but consider how it’s delivered at a granular level? If I assess each individual word, the flow of syllables, introduce similes and metaphors, and ultimately craft the delivery to be as precise, elegant and emotive as possible, we’ve turned the same piece of communication from a raw delivery mechanism into poetry. This crafting of the individual components of the message allows us to add extra depth, meaning and context to the message, without bloating or fundamentally changing the original communication. We can understand bigger concepts, relationships and meaning – and we can (and should) do this with HTML, too, in pretty much the same way.

This is the essence and purpose of semantic HTML – the consideration of meaning, and the addition of depth and context to what would otherwise simply be ‘code’. There are real commercial benefits to thinking like this, which we’ll explore later; but first, a look at some practical applications.

There are different HTML tags for different purposes

At the most simple level, the use of different HTML tags forms the grammar of HTML. Whilst common knowledge dictates that, for example, it’s better to use <li> tags for list elements than, say, <div> tags, there’s much more to it than this.

For example, the first time the term HTML is encountered and defined within a piece of content, I might italicise it by using an <em> tag; this echoes the approach you might take in a written document when you’re trying to add context and definition. In HTML, though, we know that the word HTML is an acronym (a type of structured abbreviation, where each letter represents a word), so we should use the little-known <acronym> tag rather than an <em> tag. We can take this a step further and add an extra level of depth, producing something like <acronym title="Hypertext markup language">HTML</acronym>*. Search engines, users and devices can derive meaning from the text which isn’t necessarily ‘on the surface’. What’s fascinating here is that you can’t add that level of depth to a written document or verbal speech – what we have here is a unique ability to encode communication with levels of depth that can’t be achieved in other mediums.

*Not the best example, as the first time a term is encountered within a document, it should use the <dfn> (‘definition’) tag; this one’s a grey area! Something like <dfn><acronym title="Hypertext markup language">HTML</acronym></dfn> might arguably be an appropriate way to represent the content. I’m not sure that there’s a ‘best’ answer in this scenario!

Consider the difference between <div>John said, “hello Jane”.</div>, and <p>John said, <q cite="http://example.com/chatlog/123">hello Jane</q>.</p> – the possibilities for SEO are astounding; if people do this at scale, Google can understand the nature of and relationships between elements, and become vastly more capable of intelligent attribution and value modelling.

As a point of interest, it should be noted that the use of the word ‘document’ throughout this post isn’t accidental; there is a whole selection of HTML tags that rarely see the light of day, but which were designed to allow for this kind of encoding in the early days of the Internet, when most web content resembled written documents. Definition lists, document revision status markup, table captions and citations are just a few examples of tags which have been around for years. So, if these tags are valuable and are far from new, why isn’t all of this common knowledge and standard practice?
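
As a quick, hypothetical illustration of a few of these document-centric tags (the content below is invented for the sake of the example), <ins> and <del> record revisions, <cite> references the title of a work, and <caption> titles a table:

<p>As <cite>The Widget Catalogue</cite> notes, prices change over time:
 this widget currently costs <del datetime="2012-03-01">£20</del> <ins datetime="2012-03-12">£18</ins>.</p>
<table>
 <caption>Widget prices, 2012</caption>
 <tr><th>Month</th><th>Price</th></tr>
 <tr><td>March</td><td>£18</td></tr>
</table>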

Code vs. presentation

In the early days of the Internet, most web pages were generally vertically flowing text, with some basic media and markup – much like a printed, written document. Early versions of HTML were developed to fit this medium, and it was never anticipated how quickly, or to what extent, the web would become a largely visual medium. In fact, the early Internet was explicitly designed as a ‘vertical’ medium, which makes ‘modern’ website design incredibly challenging. Challenges such as floating and nested elements, positioning and layout simply weren’t considered at the point of conception.

This was compounded by the fact that early browsers had fixed (and often very different) interpretations of how different HTML tags should render (e.g., <blockquote> tags indent content, <h1> tags make text big, etc.), and gave little control over presentation (e.g., via CSS). You simply couldn’t create a highly visual, but also semantically rich, website; it was one or the other, or exponentially more design and development work, often resulting in a ‘fudged’ presentation.

Essentially, in order to have a ‘nice’ website, it was necessary to bastardise the tags at hand to achieve the desired effect, rather than necessarily using the ‘right’ tags. HTML <table> tags came to dominate the web, as they provided the most flexible layout and presentation options, but at the cost of being about as non-semantic as it’s possible to be; tables should only be used to present tabular data. The vast majority of older (and in particular, Geocities-built) websites utilised table tags to achieve almost all of their layout and positioning requirements – regardless of whether the content was actually tabular.

Figure 2: XKCD referencing Geocities – a wasteland of table-based code

Things got better!

Thankfully, the last decade has seen a significant evolution in the fields of website design, code architecture and semantic markup. Much of this has stemmed from an increasingly commercial and competitive browser landscape, where the likes of Firefox, Google Chrome and Safari continue to push the boundaries in terms of separating code from presentation, CSS support, and JavaScript processing speeds and capabilities, in order to maintain a competitive edge and to win a loyal, majority share of users. This has enabled (and made it easier for) website designers and developers to use the right tool for the right job in most cases.

Unfortunately, it was, arguably, too little, too late – whilst most modern website developers have stopped using tables and embraced CSS, the culture of enriching code and content with semantic depth never took off; we have all the tools at hand, but the history of our development decisions, processes and culture has meant that we’ve missed the boat. The fact that there are any ‘obscure’ HTML tags at all (in a language with only a tiny number of ‘words’) is indicative of how foreign the concept of semantic markup is.

Figure 3: The evolution of web browsers by market share

These ‘rare’ HTML tags are still applicable, semantic and more usable than ever (with pretty much universal browser support from IE7 and equivalent onwards) – but it’s a tough argument to suggest that vastly more time should be invested in ‘crafting’ code output and content markup vs. simply ‘getting it done’.

So what happens now?

There’s no reason why we can’t start using this now, and there really is value in doing so. At the most basic and practical of levels, marking up your content more appropriately ensures that Google can better understand it, which may gain you higher visibility in areas you might otherwise struggle to compete in, and allows them to better understand your authority and content. As a real-world example, a client I work with recently improved a glossary section on their website by changing it from standard ‘list’ markup (<ul> and <li>) to a definition list. Over the following months, their Google Analytics account showed that they were receiving thousands of visits for a competitive head term that they’d never actively pursued; it became apparent that this traffic was originating from search results where Google provided term definition information before the main organic results, and that the content in question was being sourced from – and linked back to – their glossary. This is arguably by no means typical or traditional SEO (nobody acquired any links), but it bagged them a decent volume of traffic simply by adding a layer of context and depth to the content they already had – and with no more effort than tweaking the existing HTML and CSS.
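
To illustrate the kind of change involved (a simplified, hypothetical glossary entry rather than the client’s actual markup), the ‘before’ might have looked something like this:

<ul>
 <li>HTML – the markup language used to structure content on the web.</li>
</ul>

…and the ‘after’ expresses exactly the same content as a definition list, explicitly pairing each term with its definition:

<dl>
 <dt>HTML</dt>
 <dd>The markup language used to structure content on the web.</dd>
</dl>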

Figure 4: An example of Google's ‘definition’ search results

Generally speaking, this kind of markup doesn’t require massive change or extra code; you just refine what you’ve already got (though with some CSS considerations). All sorts of terms generate results like the one above, and the referenced sites have an opportunity to grab a reasonable volume of long-tail traffic in a space they might otherwise not be able to (or not consider it worthwhile to) compete within.

The Semantic Web: The Challenge

So how do you use this tactically? It’s one thing to add layers of meaning to what you’ve already got, but that limits you to only making minor tweaks to existing content and code. To win big, you need to be able to take these tools and to craft something designed to utilise them – only a relatively small proportion of websites use this kind of approach, so doing something big can give you an incredible advantage. Let’s explore an example.

Building awesome functionality

As SEOs, one of our key challenges is in differentiating businesses (or websites) from their competition, and in demonstrating explicit, earned authority. One of the most powerful ways to do this is to ensure that a website offers functionality and facilities unrivalled by the competition – yet this kind of functionality is incredibly rare. We’ve no shortage of crazy, exciting campaign ideas, and ideas for how complex functionality could revolutionise campaigns, websites and businesses; but seemingly we’re continually thwarted in our attempts to see any of it become reality.

The problem here sometimes isn’t the recommendation, SEO practitioner or the business/website – it’s the Internet itself.

Imagine trying to build a complex booking calendar system for a hotel website, to port that functionality into widgets, to allow the data to be exported and synchronised with Google Calendar or Microsoft Outlook, to allow interoperability with other major, local and small business websites and their booking systems, and to be compatible with all modern and future devices. How about getting it to crawl the web, find websites which reference events happening on the previous and subsequent days, and collect data about those to form a holiday package? That’s the kind of linkbait that could make the difference between being just another website and being the award-winning website that dominates and revolutionises an industry – but that’s hard to do. That kind of functionality is resource-intensive, complex, slow and ultimately impractical to produce. In order to be that interoperable, that universal, and still remain usable, the system would need to utilise some kind of universally recognised language in order to communicate across each element – both where it’s displaying its own information and where it’s incorporating it from third parties – otherwise the development costs are liable to be unfeasibly, terrifyingly high (not to mention the 24/7 development upkeep, tweaks and data cleaning). The only way to make such a system feasible would be if we had some kind of universal language which would allow all of these different pieces of functionality and requirements to play together nicely, so that it didn’t matter where the content came from or was displayed, only that it was consistently structured.

Microformats

In fact, this is one of the key limitations of the modern Internet – it’s generally designed exclusively for human consumption, with no consideration for interoperability or future-proofing. This makes sense when you consider the commercial implications of having to build both a human version and a ‘computer’ version (be it for a mobile phone, desktop software or Google’s crawler) of everything – it’s just not practical. That’s why there’s so little ‘cool’ functionality of this kind on the web; there are no stand-out solution providers that unify, consolidate or reach out to each other, other than within their own corporate ecosystems. Mashups, infographics, data and research all rely on public (limited) or proprietary (expensive) data sources – or on manual labour to go out and research and find information on the web. Imagine if search engines and software could simply join up the dots themselves; a Google search (or a crawl by your own software) for, e.g., ‘what’s the relationship between the birth rate in New York and the number of pizza deliveries over the last decade’ should be able to return everything you need to create an incredible (and linkworthy) piece of content, because the data is already out there. The only reason we can’t do that at the moment is that the data is currently a mess; it’s in an infinite number of formats, and there’s no system which can understand all of it in a way which makes it compatible and future-proof.

Except that’s not the case – we do have a universal language, which all of this content is already available in. There’s a quote from Dan Connolly of the W3C, dating back to 2000, where he says, “We all know that we have to produce a human-readable version of the thing… why not use that as the primary source?”

The universal language is the human-readable front end that we’re already producing and consuming as people (often in English, but where not, easily translatable). In the New York example above, if public record databases and pizza delivery websites simply added an extra layer of semantic context to their existing content (for the most part, without having to undertake massive database and technical overhauls, and without creating the overhead of managing multiple systems and platforms), then we’d become capable of understanding, querying and playing with that information instantly – and all present and future devices, systems and programming languages could handle it in a consistent and identical manner.

In 2005, Microformats.org launched with the aim of providing universal approaches to marking up the HTML of common content types, such as people (in terms of names and relationships), addresses, events, reviews and an increasing array of other content types. The usage of this kind of markup gives us one system which devices, humans and search engines can consume and understand in exactly the same way.
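
For instance, an event might be marked up using the hCalendar microformat; the class names below are those defined by the hCalendar specification, while the URL and date are purely illustrative:

<div class="vevent">
 <a class="url summary" href="http://example.com/events/searchable-1">searchABLE.1</a>
 on <abbr class="dtstart" title="2012-03-15">15th March 2012</abbr>,
 <span class="location">Leeds</span>
</div>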

This is really at the heart of the concept of ‘Web 3.0’ – it’s about adding layers of context and relationship so that everything can communicate and become interoperable with everything else. Until recently, this has been something of a pipe-dream – but now we’ve many of the tools we’ll need right at hand.

Addresses

Addresses are the best example of the problem that microformatting solves, and what semantically structured (and microformatted) content allows us to achieve. Consider the following address:

Buckingham Palace, London SW1A 1AA 020 7766 7300

As experienced humans, we have sufficient familiarity with the way in which addresses work in order to accurately decode and interpret this address. However, in the code that this content was extracted from, the address simply reads as a ‘flat’ line, with no distinction between the house name, the city, the postcode or phone number. In fact, the actual code in question is:

<div>Buckingham Palace, London SW1A 1AA 020 7766 7300</div>

Not overly semantic, right?

So, even if we use semantic tags to mark this up as an address – say, as <address>Buckingham Palace, London SW1A 1AA 020 7766 7300</address> – it’s still going to be difficult for Google and other entities to interpret this data consistently and correctly. Addresses in particular tend to be produced in HTML through an infinite number of approaches, with an infinite combination of tags – your search engine has to be damn clever in order to extract, interpret and understand each component, and to manage it correctly each time. Is ‘020’ part of the postcode? Is ‘SW1A’ the city? Google will get this right a reasonable amount of the time, but there will always be fringe cases where it makes the wrong guess. This address is a particularly good example of the value of adding this context, as it doesn’t follow a conventional structure (there’s no street name).

As a variation of our hotel example, envision (as an abstract example) a web crawler designed to automatically build a web directory of all businesses whose street name begins with a given letter. If 50% of businesses include the business name as the first line of their address, and the other 50% begin with the street address, how accurate will this directory be? Not very; unless those addresses are coded in such a way as to allow the crawler to understand which bit is which, we’ve not achieved much other than adding yet another naff directory to the web.

Consider: if Google could get this right 100% of the time, what could it do? If every address on the Internet was implemented with exactly the same code and methodology, could Google simply build its Google Maps and local business listings by extracting addresses from pages, and relevance/value from the link and social graphs? For complex data, search engines and crawlers still rely very heavily on human validation, because the way in which the human-readable data is produced varies heavily behind the scenes. The solution to making this feasible, and the norm, isn’t to change the way we produce data, or to build more layers of functionality – rather, we simply need to explicitly mark up the stuff we’re already making for people.

The Microformats specification, then, was set up to deal with exactly this kind of challenge. By taking our original address and adding some harmless (though admittedly non-semantic) wrapper <span> tags (though you can just use what’s already in your code – it’s the class attributes that do the job, rather than the tags themselves), we can transform our messy address into something which can be explicitly understood – in this case, using the hCard class names – something like this:

<address id="hcard-buckingham-palace" class="vcard">
 <!-- hCard/adr class names identify each component of the address -->
 <span class="fn org">Buckingham Palace</span>,
 <span class="adr">
  <span class="locality">London</span>
  <span class="postal-code">SW1A 1AA</span>
 </span>
 <span class="tel">020 7766 7300</span>
</address>

This markup explicitly identifies each distinct component of the address in a way which any compliant device can understand, crawl, consume and utilise – without developing extra systems. We’ve started with the content that was designed and produced for humans, and added an invisible layer of context to it.

If everybody used this, it’d open up possibilities for organic systems, mash-ups and business models that we’re unable to even speculate about – but nobody’s on the bandwagon.

There’s been no impetus to utilise any of these tools, as there have been no systems that consume them, and no routes to commercial gain for putting the hard work into implementing all these extra layers of code and markup. At least, this catch-22 situation was the case until Google introduced Rich Snippets.

Rich Snippets

Very simply, rich snippets is a term Google invented to encompass its ability to extract semantic data from web pages and to expose that structured content as distinct components in search results.

Essentially, overnight, semantic markup (and in particular microformats) went from being a pipe-dream to a commercial reality. The image below demonstrates just how seriously Google are taking the marking up and extraction of rich content on web pages, in a search result for ‘Thai mango salad’. Revisiting our earlier example, where addresses are difficult to interpret without an understanding of their specific components, consider how much more complex recipes are – from cooking times, to combinations of ingredients, quantities and temperatures, through to calorie counts and reviews. This is just one example of where adding semantic, formatted data to your pages can add real commercial value.

Figure 5: Google search results for 'Thai mango salad'
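
As a rough sketch of how a fragment of a recipe like this might be marked up using the hRecipe microformat (the class names come from the hRecipe draft specification; the recipe content itself is invented):

<div class="hrecipe">
 <h2 class="fn">Thai mango salad</h2>
 <span class="duration">20 minutes</span>
 <ul>
  <li class="ingredient">1 ripe mango</li>
  <li class="ingredient">A handful of fresh coriander</li>
 </ul>
 <p class="instructions">Peel and slice the mango, toss with the coriander…</p>
</div>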

 

There are reports of clickthrough rates increasing by up to 30% (though I suspect that this is symptomatic of early adoption, where a single, more visual result sticks out when it’s the exception to the rule); but regardless of the actual numbers, I imagine that, of the results shown above, the one without rich snippets is losing out to those with more structured results.

There’s a subtle shift here: we’ve moved from an idealistic viewpoint, in which the addition of extra context to existing content is an enabler of greater things to come, to a point where creating (and marking up) the types of structured content supported by rich snippets is a commercial imperative, and a race to stay ahead of the competition.

Google’s Rich Snippets Testing Tool provides some insight into the nuts and bolts (as well as housing links off to associated documentation), but makes it clear just how seriously Google are taking and pushing the usage of semantic markup.

In the example below, you can see references to, e.g., ‘hcard’ – this is the standardised (microformat) markup approach for ‘people’ (and, in this case, authors). It works in exactly the same way as addresses, marking up existing content elements with an extra layer of explicit meaning, and it makes the person in question eligible to show up in search results in a similar way to our Thai mango salad. The list of ‘examples’ towards the top, which includes ‘events’, ‘products’ and ‘reviews’ to name just a few, is a sample of the currently supported content types which you can mark up to be eligible for this kind of SERP real estate. It might be worth considering whether your financial services website or clients should start offering cookery tips.

Figure 6: Google's Rich Snippets Testing Tool

There are obviously some overlaps here with Google’s push for validated authorship (in particular through the rel="author" markup and validation via Google+) – the mechanics follow essentially the same principle, where we’re adding a layer of relational context to existing data, content and/or links.
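
A hypothetical author byline, marked up as an hCard and linked to a Google+ profile via rel="author" (the profile URL here is just a placeholder), might look something like this:

<p class="vcard">
 Written by
 <a class="fn url" rel="author" href="https://plus.google.com/[profile-id]">Jono Alderson</a>
</p>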

OGP Crossover

If you’re familiar with the Open Graph Protocol, some of this may sound familiar. In fact, OGP aims to achieve a similar, complementary goal, but goes about things in a slightly different manner. Where semantic markup aims to provide deeper context and meaning to content, OGP aims to mark up the web at page level (through the use of meta tags) to identify where web pages focus on specific ‘things’; e.g., a web page about a film might be marked up using OGP to inform Facebook and other OGP consumers that the page in question specifically represents that film (with all of its associated metadata), but semantic markup might still be used within the content of the page to, e.g., mark up individual show times and venues for the film. The two markup approaches can work together (and in many cases, the code/work can be combined)* in order to represent real-world items, places, people and ‘things’ in general in a way which devices can understand (and understand consistently), and subsequently use.

*There are points where this falls down – OGP’s limitation to page-level focus doesn’t work particularly well when a page has multiple topics or focuses.

The two approaches work wondrously together; an extension of our above example might result in a movie database where each film page is explicitly marked up with information about that film for OGP, and semantic markup is used to show that individual actors (who in turn have OGP-enabled pages about them) relate to and star in specific films – a web of related information can begin to emerge, which search engines, users and devices could simply consume and understand in the same way that a human user might.
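
As a minimal sketch of the page-level OGP side of this (the titles and URLs are placeholders; the property names are those documented at ogp.me), a film page’s <head> might contain something like:

<!-- The og: prefix is typically declared on the <html> or <head> element -->
<head prefix="og: http://ogp.me/ns#">
 <meta property="og:type" content="video.movie">
 <meta property="og:title" content="An Example Film">
 <meta property="og:url" content="http://example.com/films/an-example-film">
 <meta property="og:image" content="http://example.com/films/an-example-film/poster.jpg">
</head>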

So we just need to use OGP and microformatting?

This all sounds great, but we’ve introduced a new problem – whilst both of these tools allow us to add a layer of meaning and context, they’re provider-specific; both approaches might simply be retired, might change and become redundant, or might be replaced by a different third party providing a more comprehensive solution. In fact, what we really need is a single, universal and future-proof language and solution, otherwise we’ll never be in a position where it’s worth investing in the long-term view of joined up, ‘web 3.0’ content and data – the scale required to get this right, and get this right everywhere and for the foreseeable future can’t rely on a proprietary or closed system.

Schema

So, this is what it all comes down to. In June 2011, Google, Yahoo and Bing made a joint announcement heralding the launch of Schema – a partnership ‘in the same spirit as Sitemaps.org’, designed to solve exactly this problem.

Schema is, in many ways, an evolution of what Microformats.org started, but takes it infinitely further. It’s a flexible, hierarchical and open-ended model which allows for the classification of any ‘thing’* through a defined-but-open approach to semantic markup and tagging – from businesses and people, to places, parks, ponds, products and works of poetry**.

*It’s worth noting that Schema struggles with markup for abstract concepts (as does OGP) – the focus is definitely on quantifiable, palpable objects (specifically ‘things’ by their own terminology), where ‘thing’ always forms the top element of the hierarchy of information. This may change in time, but there’s no solution in sight at present.

**And sculptures, sports events, ice cream shops, men’s clothing stores… the list goes on and on…

Schema represents the ‘independent’, universal and well-considered approach to content definition that we’ve been craving. If your web page is talking about any type of thing, there’s either a defined markup approach available which will allow you to enhance the code you already have (this time through the addition of structured itemscope, itemtype and itemprop attributes, rather than classes), or you can extend an existing hierarchy to meet your needs. As an example, a Schema structure might exist for ‘school’, but not ‘secondary school’ – it’s a relatively easy step to find a starting point and to build an extra level of depth on to it in a way which search engines and other systems can understand. Popular schema additions are liable to become eligible for incorporation into the master list (likely subject to peer review, in a similar manner to the W3C working group processes), and new schema elements will be added in line with the extension of search engine rich snippet (or equivalent) offerings.
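
To make that concrete, here’s a sketch of our Buckingham Palace address from earlier, expressed this time in Schema.org’s microdata syntax (the Place and PostalAddress types, and their properties, come from the schema.org vocabulary):

<address itemscope itemtype="http://schema.org/Place">
 <span itemprop="name">Buckingham Palace</span>,
 <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="addressLocality">London</span>
  <span itemprop="postalCode">SW1A 1AA</span>
 </span>
 <span itemprop="telephone">020 7766 7300</span>
</address>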

This is already out there in the wild – in fact, a number of Google’s rich snippets (including, from the example above, mobile software applications and TV series) are dependent upon Schema markup, which is gradually replacing microformats as the standard approach to rich data.

This is enormously powerful, and where historically a catch-22 scenario existed around justifying the implementation of microformatted content, the collaboration of the major search engines and continued evolution of search results makes this a very, very commercial race; the early adopters of each new rich snippet format are positioned to make an absolute killing, and those who are slow to adopt are liable to get left behind (like the poor Thai mango salad result without the photo and calorie count…).

The search engines are actively pushing the addition of this kind of markup to content, which will increase the quantity and quality of formatted data out there – and all sorts of crawlers, new-age ‘intelligent’ directories, mashups, infographic generators and who-knows-what will follow in its wake.

It should also be noted that Schema does a reasonably good job of bridging ‘page-level’ and ‘element-level’ markup – allowing for nested hierarchies of information and groups of subjects; you just need to add the relevant markup to the sections of code or content in question.

Conjecture

Search engines stopped believing what we (the collective we) told them our web pages were about around a decade ago, when meta tags and keyword stuffing were used to trick them into believing that a piece of content was more relevant than it might otherwise have been. In fact, we’ve taken a considerable journey away from describing our own content, to the search engines deciding that we can’t be trusted, and that they’re better off working it out and determining it for us. That makes sense; however, the promotion of Schema (and OGP, etc.) comes at a cost, where we’re responsible for the self-classification of our content again… On one hand, I suspect that this implies that Google are much more clever at understanding things like author rank and network authority than we might otherwise believe (so as to be able to validate self-elected status and context); on the other, the benefits they’ll gain overall from enriching search results (visually and functionally) through the collection of this data may generally outweigh the drawbacks of it being less (or, rather, artificially more) relevant in some cases. Are a bunch of nicely formatted and presented – albeit fake – reviews in the results better than a standard, vanilla listing (and it does seem that the system is very easy to cheat)?

In fact, there are some interesting nuggets hidden away in the Schema documentation that would suggest that the latter may be the case, and that they’re not that savvy after all – for example, the guidelines state strongly that content that is invisible to humans (e.g., rendered invisible through CSS via display:none or similar) won’t be eligible for inclusion or usage; however, they provide the ability to utilise abstract <meta> tags to mark up information that isn’t represented on the page – for example, a country-level element in an address on a website targeting local consumers might be omitted, but is still required as part of the address. You might produce something like:
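
(What follows is a sketch using Schema.org’s microdata syntax and its PostalAddress type; the same principle applies to the equivalent microformat patterns.)

<address itemscope itemtype="http://schema.org/PostalAddress">
 <span itemprop="addressLocality">London</span>
 <span itemprop="postalCode">SW1A 1AA</span>
 <!-- The country never appears in the visible text, but is still exposed to machines -->
 <meta itemprop="addressCountry" content="GB">
</address>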

So, you can’t hide elements, but you can have hidden elements, and you can self-classify your information, leaving it open to a world of abuse; it feels a little shaky. Perhaps they’ll crack down a little later down the road by invoking advertising standards agencies and equivalents, where the production and promotion of fake reviews is, for all intents and purposes, a criminal act – something which hasn’t necessarily been quite as ‘nailed down’ in law as it may have needed to be historically – and promoting fake reviews (or fake review scores) is a much more serious offence than the addition of pharmaceutical or gaming keywords to a meta keywords tag a decade ago might theoretically have been. Perhaps they have enough time and resource at the Googleplex to have teams manually reviewing and auditing the quality of rich snippet content – or perhaps it’s all just a big experiment. I suspect not, however, given the degree to which this has been promoted and integrated, and continues to expand.

The moral of the story?

None of this requires a huge amount of effort to execute, but adds a tremendous amount of future-proofed value, and can yield overnight commercial success. So, without further ado:

  1. Take your existing content, and use semantic HTML to mark it up to the point where every single element is as enriched, relevant and correct as it can be – you’ll find that this should have remarkably little impact on your development resources (other than perhaps some slight CSS tweaks), as you’re just adding extra depth to what’s already there.
  2. Once that’s done, identify all of the currently supported rich snippet formats; wherever your offerings and/or content align (products and/or reviews can be almost universal, if interpreted liberally – e.g., reviews of your business derived from testimonials), mark those up too.
  3. Where your web content doesn’t quite align with existing Schema definitions, consider creating your own expansions (and blog about it – get some links).
  4. Examine the Schema list for content types that you cover, where Google might feasibly produce a new rich snippet format. The obvious ones (people, products, etc.) are covered, but what’s next? Can you get in there first by ensuring that your content is ready?
  5. Finally, examine your content for gaps where you could feasibly expand to target other rich snippet formats, or consider branching out explicitly to align with those content types.
  6. Start thinking about HTML5 if you haven’t already – it’s essentially a reinvention of the language to allow it to be more flexible, semantic and extensible. It makes the addition and nesting of Schema attributes and data much more intuitive than with ‘standard’ HTML 4, and may not be as big a job as you expect; see the sketch below.
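
As a final, hypothetical sketch of how HTML5’s structural elements and Schema.org’s microdata can nest together (the types and properties come from the schema.org vocabulary; the date is illustrative):

<article itemscope itemtype="http://schema.org/BlogPosting">
 <header>
  <h1 itemprop="name">What's the big deal about semantic HTML?</h1>
  <time itemprop="datePublished" datetime="2012-03-20">March 2012</time>
 </header>
 <div itemprop="articleBody">
  <p>…</p>
 </div>
 <footer itemprop="author" itemscope itemtype="http://schema.org/Person">
  By <span itemprop="name">Jono Alderson</span>
 </footer>
</article>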

Questions?

www.jonoalderson.com

@jonoalderson


Comments

  1. Thanks to Edward Lewis for pointing out that the tag is being deprecated and essentially merged into in HTML5; not sure if this is a positive move, or a reaction to people not really understanding the difference between acronyms and abbreviations… Ah well, something is better than nothing I suppose – as long as it gets used!

  2. Martin Woods says:

    Which tag are you referring to Jono?

  3. Ah, the comment stripped the HTML, gah!
    ACRONYM is being removed / merged into ABBR (they’re pretty close in terms of use cases anyhow, but I’ve always felt that the distinction was useful).
