Guardian angel: Content and CMS behind the UK publisher

Dec 08, 2014 at 12:02 am by Staff


Think The Guardian and you might picture an elegant but slightly austere left-leaning broadsheet, now fashionably Berliner-sized, or perhaps the weekly international edition printed at remote centres around the world, including Sydney.

That perception, however, is about to change as one of the world's most popular online news destinations launches an Australian edition, one expected to move quickly into the country's top three.

How's that done? In a couple of words, with content and content management.

Recent weeks have seen the new venture - backed by publishing entrepreneur Graham Wood - recruit top industry names, including the Sydney Morning Herald's chief political correspondent Lenore Taylor, who will be the new site's political editor, with The Age's national affairs correspondent Katharine Murphy as her deputy. Both are leaving Fairfax Media, which is in any case seen as the publisher with the most to lose from Guardian Australia.

But while quality journalism is a key component in the mix, standout editorial technology also underpins the Guardian's success to date.

The publisher has been lauded for fighting back against competition from internet startups and social media with its use of open source tools such as the document-based Apache Solr/Lucene search architecture and its 'open' news content platform.

Its business model makes content available at different levels according to engagement. Beyond the industry-wide option of simple headline feeds, the Guardian allows partners to take an entire story, but with attached advertising from which the Guardian retains the revenue, or to take content on a fuller, potentially reformatted, custom basis under a negotiated revenue model. A WordPress plugin lets bloggers integrate content feeds directly into their own publications.
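The three levels can be pictured as a simple lookup from access tier to what a partner receives. The following is a minimal Python sketch only. The tier names, the `Story` type and the `syndicate` function are hypothetical and were invented for illustration. The Guardian's actual payloads and commercial terms are not described in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical names for the three access levels described above."""
    HEADLINES = 1      # simple headline feed only
    FULL_WITH_ADS = 2  # entire story, with Guardian-supplied ads attached
    CUSTOM = 3         # fuller, possibly reformatted, negotiated revenue model


@dataclass
class Story:
    headline: str
    body: str


def syndicate(story: Story, tier: Tier) -> dict:
    """Return the (illustrative) payload a partner at `tier` would receive."""
    if tier is Tier.HEADLINES:
        return {"headline": story.headline}
    if tier is Tier.FULL_WITH_ADS:
        # The ads travel with the story; the Guardian keeps that revenue.
        return {"headline": story.headline, "body": story.body, "ads_sold_by": "guardian"}
    # Custom tier: the partner may reformat; the revenue split is negotiated separately.
    return {"headline": story.headline, "body": story.body, "reformat_allowed": True}


if __name__ == "__main__":
    s = Story("Guardian Australia launches", "Full text of the story...")
    for tier in Tier:
        print(tier.name, sorted(syndicate(s, tier)))
```

The point of the sketch is that the partner's level of engagement, rather than the story itself, decides how much of the content leaves the building.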

In the UK, the vast amount of data accumulated about Westminster politics and elections is made available through a dedicated 'politics API', for which no pagination is required.

The Guardian's search-based architecture, deployed in the cloud on Amazon's EC2, provides full-text search, data filtering and replication... and massive scalability.

The 'open platform', which as long ago as 2009 began replacing a variety of distribution methods for text, images and multimedia, including RSS, batch sends and email, has been embraced as part of the publisher's new culture. It is certainly more than mere technology.

And it's a two-way affair: partners can actually 'inject' functionality and content into the Guardian website. A couple of years back, the Guardian introduced a MicroApp framework through which partners could integrate their own content, creating interactive experiences in pages and sidebars, while the publisher also released technology built for its own use to a wider market.



Martin Belam joked at a conference a couple of years back that the Guardian started its CMS design process by "locking the software architects and the editors in a room, and not allowing them out until they had agreed on a definition of exactly what a piece of content consisted of and agreed a common vocabulary".

Well, not quite, but he says a 'domain-driven design' process solved a lot of problems. Tags are key, identifying everything from type, origin and section to 'tone' and subject (from a list of 9,000), as is the system's API.

"Tags determine where an article belongs on the site, and allow it to belong in multiple places," he says. "A review of the film 'The Damned United' is also placed on the Leeds United page by simply tagging it 'Leeds United'. Nobody from the film desk has to talk to the sport desk about the article to get it placed on the relevant page."
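The mechanism Belam describes amounts to an inverted mapping from tags to articles. Each tag page simply lists every article carrying that tag. The following minimal Python sketch illustrates the idea. The article IDs, the tag names and the `build_pages` helper are hypothetical, and the Guardian's real tag vocabulary and page logic are not shown in the source.

```python
from collections import defaultdict


def build_pages(articles: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert article -> tags into tag -> articles, one 'page' per tag."""
    pages: dict[str, list[str]] = defaultdict(list)
    for article_id, tags in articles.items():
        for tag in sorted(tags):
            pages[tag].append(article_id)
    return dict(pages)


if __name__ == "__main__":
    # Hypothetical articles and tags, for illustration only.
    articles = {
        "damned-united-review": {"Film", "Leeds United"},
        "leeds-match-report": {"Football", "Leeds United"},
    }
    pages = build_pages(articles)
    # The film review appears on the Leeds United page without either desk
    # having to coordinate: the tag alone puts it there.
    print(pages["Leeds United"])  # ['damned-united-review', 'leeds-match-report']
```

Adding or removing a tag is the only action needed to move an article onto or off a page. This is why the tag manager's arbitration matters more than any desk-to-desk negotiation.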

The system allows a contributor's profile to be segmented by the topics they write about, with disputes about sense and relevance arbitrated by a 'tag manager'.

The Guardian's content API is designed to allow automated search using complex queries, and it drives the three-tier syndication process. It is powered by a huge Oracle database, with client libraries making it accessible from a variety of programming languages.
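A partner's query against such an API boils down to building a search URL that combines free text with tag filters. The following Python sketch only constructs the URL and does not call the service. It assumes the endpoint and parameter names of the present-day public Open Platform (`https://content.guardianapis.com/search` with `q`, `tag` and `api-key`), which may differ from the version described here. It also assumes that comma-separated tags are combined as "all of these". The `build_search_url` helper and the example tag values are hypothetical, so check the official API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public content API (verify against current docs).
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query: str, tags: list[str], api_key: str) -> str:
    """Build a search URL combining free text with optional tag filters."""
    params = {"q": query, "api-key": api_key}
    if tags:
        # Assumption: a comma-separated tag list requires all listed tags.
        params["tag"] = ",".join(tags)
    # urlencode percent-encodes '/' and ',' and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical tag identifiers and key, for illustration only.
    print(build_search_url("Damned United", ["film/film", "football/leeds"], "YOUR-KEY"))
```

Keeping URL construction separate from the HTTP call makes the query logic easy to test without network access or a real API key.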



Alongside its local recruits, the venture has Guardian deputy editor Katharine Viner as launch editor, while outgoing ABC director of editorial policies Paul Chadwick is a non-executive director.

Owned by a trust and itself a migrant from the English mill city where the Manchester Guardian was founded in 1821, the new arrival is a lot more than a mere 'ten-pound Pom'. Can it do better than publishers using more conventional, more proprietary systems? The focus is certainly on openness, both in its software and in its accessibility to content contributors and users.

The Guardian website starts with a claimed 1.3 million regular readers in Australia and New Zealand, so it's not a huge leap to imagine it being a serious challenger, among or atop the three market leaders. And then what? Guardian News and Media editor-in-chief Alan Rusbridger has been quoted saying the Australian operation would serve as a base for more in-depth coverage of Asia.

And the door is open...
