There’s a groundswell of opinion suggesting that it’s no longer content, but rather context that’s king in publishing.
And you can see why: So much information, so little judgement about what’s accurate, appropriate or apposite.
And web search engines and publishers have become some of the worst culprits: Web ads served to accompany news stories aren’t clever enough to distinguish between timely, tart and tasteless. A couple of recent examples have been widely quoted – the weight-loss ad which appeared alongside a court story about a child starved to death, and another offering liaisons with ‘sexy Iranians’ to those reading about the street death of a woman in Teheran.
We’ve got past the trite complaint that “the computer’s to blame” – it never is, since it is only doing what it has been told – but the sloppy systems used to pair ads with stories don’t yet know better.
That will change, but it will take time, according to Jean-Michel Texier, chief technical officer and ‘chief visionary’ of text mining specialist Nstein. He was joined by text analytics expert Seth Grimes in a webinar presented by the Canadian company.
The concept of the ‘semantic web’ has been brought to the fore by Google’s announcements about enhanced search results and by Microsoft’s Bing.com search engine, which may mean that the era is upon us.
And the enabling technology is text mining, which, Grimes says, automates what researchers, writers and scholars have been doing for years, with research organisations – such as those of pharmaceutical companies – among the major early adopters.
Grimes explains that text mining applies linguistic and/or statistical techniques to identify, tag and extract concepts and patterns that can be applied to categorise and classify documents, audio, video and images. It transforms ‘unstructured’ information into data to which traditional analysis techniques can be applied.
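By way of illustration only – and not a description of any vendor’s engine – a minimal sketch of that transformation might pair a crude statistical keyword pass with a naive capitalised-phrase pattern standing in for linguistic entity detection:

```python
import re
from collections import Counter

# A toy illustration of turning 'unstructured' text into structured data:
# crude term-frequency scoring plus a naive pattern for candidate entities.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "was", "by"}

def mine(text: str) -> dict:
    # Statistical pass: term frequency over lower-cased word tokens.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    keywords = [w for w, _ in Counter(words).most_common(5)]

    # 'Linguistic' pass (very naive): runs of capitalised words are treated
    # as candidate named entities. Sentence-initial words slip through,
    # which is exactly why real systems use proper linguistic models.
    entities = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)

    # The result is structured metadata a database or ad server can act on.
    return {"keywords": keywords, "entities": sorted(set(entities))}

print(mine("Nstein, based in Montreal, sells text mining software to publishers "
           "such as the Financial Times. Text mining tags publishers' content."))
```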
“Text mining unlocks meaning and relationships in large volumes of information that were previously unprocessable by computer,” he says.
Grimes points to the inadequacies of current search engines, but admits it’s easy to make them appear dumber than they are: ‘Google’ for a ‘good hotel’ and you’re likely to draw the hotel’s opinion of itself and those of vested interests... plus a whole lot more irrelevant data. And in sentiment analysis, words out of context can be misused or misunderstood.
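A toy example makes the point. The lexicon-only scorer below (a hypothetical illustration, not any vendor’s product) counts ‘positive’ and ‘negative’ words with no regard for context, and cheerfully awards a glowing score to a damning review:

```python
# A deliberately naive, lexicon-only sentiment scorer: words are scored
# out of context, so negation and qualification are invisible to it.
POSITIVE = {"good", "great", "clean", "friendly"}
NEGATIVE = {"bad", "dirty", "rude", "awful"}

def naive_sentiment(text: str) -> int:
    score = 0
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

print(naive_sentiment("A good hotel with friendly staff, but dirty rooms."))
# 1 – a mixed review scores mildly positive
print(naive_sentiment("Not a good hotel. Far from clean, and hardly friendly."))
# 3 – a plainly negative review scores even higher
```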
Grimes says that while Web 2.0 – with its blog aggregation and emphasis on participation – is dynamic, personalised, interactive and collaborative, Web 3.0 will add a ‘semantic web’ in which content is semantically enriched and Mde context (as well as location) sensitive.
Turning the Web 3.0 vision into reality may take a while – perhaps four or five years – but Grimes points to websites such as the Financial Times’ newssift.com which are already working to improve contextual searching. Despite the denials, it may be no surprise that the FT site is an Nstein client ... but no matter, it’s a good example.
Text mining and analytics enable Web 3.0 and the semantic web, Grimes says, categorising and classifying content automatically, augmenting text with tags and metadata, and extracting information for use in databases.
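A rough sense of how automatic categorisation works can be had from the sketch below, which leans on the open-source scikit-learn library – an assumption for illustration, since neither Grimes nor Nstein names a particular toolkit. A handful of labelled headlines train a classifier whose predictions become section tags:

```python
# Minimal automatic-categorisation sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

headlines = [
    "Central bank raises interest rates again",
    "Quarterly profits beat analyst forecasts",
    "Striker signs record transfer deal",
    "Manager sacked after string of defeats",
]
sections = ["business", "business", "sport", "sport"]

# TF-IDF features feeding a naive Bayes classifier.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(headlines, sections)

# The predicted label becomes a metadata tag on the incoming story.
print(classifier.predict(["Bank lifts interest rates"]))    # expected: ['business']
print(classifier.predict(["Record transfer for striker"]))  # expected: ['sport']
```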
Technical approaches have to address linked data and microformats, as well as standards such as the Resource Description Framework (RDF), the SPARQL query language and the Web Ontology Language (OWL).
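For a flavour of those standards, the sketch below uses the open-source rdflib library (again an assumption; the example data is hypothetical) to describe a news article as RDF triples and query it with SPARQL:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Hypothetical vocabulary and article identifier for illustration.
EX = Namespace("http://example.org/news/")
article = URIRef("http://example.org/news/article/42")

g = Graph()
g.add((article, RDF.type, EX.Article))
g.add((article, EX.mentions, Literal("Tehran")))
g.add((article, EX.topic, Literal("politics")))

# SPARQL query: find articles and the places they mention.
results = g.query("""
    PREFIX ex: <http://example.org/news/>
    SELECT ?article ?place
    WHERE { ?article a ex:Article ; ex:mentions ?place . }
""")
for row in results:
    print(row.article, row.place)
```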
A new study from Grimes’ IT strategy and implementation company, Alta Plana, provides user perspectives on text analytics solutions and providers. The market is estimated at US$350 million, having increased in value by 40 per cent in two years.
The big applications are seen as managing brands, checking on customer experience, and finding out what competitors are doing, Grimes says. Content management and publishing are, as yet, small beer ... but ahead of other growth areas such as law enforcement and compliance.
Predictably, perhaps, he found that it’s blogs and other social networks (62 per cent) that text miners most want to target, ahead of news articles (55 per cent) and online forums (41 per cent). Email and correspondence (38 per cent) and customer/market surveys (35 per cent) also rate. Named entities – people, companies and geographic locations – are the ‘most needed’ data.
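Extracting those named entities is routine for modern NLP toolkits. The sketch below uses the open-source spaCy library – an illustrative assumption, not something cited in the webinar – to pull people, organisations and places out of a sentence:

```python
# Named-entity extraction sketch with spaCy (requires the English model:
# `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Jean-Michel Texier sold Eurocortex to Montreal-based Nstein, "
          "whose clients include the Financial Times in London.")

# PERSON, ORG and GPE labels cover the people, companies and geographic
# locations that survey respondents say they need most.
for ent in doc.ents:
    print(ent.text, ent.label_)
```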
Grimes says there are signs that publishers are reaching their audiences: “I’m really encouraged by the prospect of creating real value,” he says.
No longer ‘snake oil’, the semantic web still has few practitioners. Adopters have typically been developers, but an opportunity exists for publishers: “maybe next year, but probably not ... but soon after that,” he says.
An irony is that academics who use less mining-friendly formats such as PDF instead of XML are delaying uptake of their own ideas.
Nstein’s core text mining products include modules which detect tone and sentiment, as well as extracting ‘most relevant’ sentences and finding similar and related content.
Texier – a Frenchman who founded web content and digital asset management specialist Eurocortex in 2000, selling it to Montreal-based Nstein six years later – says the technology will help publishers enrich, reuse and monetise content.
Editorial benefits include automating content classification (or making it easier) as well as allowing faceted searching and the discovery of content. Search engine optimisation is improved and user retention and ‘stickiness’ increased.
And Texier says contextual advertising and publishing are enabled ... a route to avoiding the placement of inappropriate advertising alongside sensitive stories.
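In its simplest form, that safeguard is just topic matching plus a blocklist. The toy filter below (hypothetical, and far cruder than a commercial system) serves ads whose topics overlap a story’s tags, and serves none at all when the story carries a sensitive tag:

```python
# Toy contextual-advertising filter: match ads on story topics, but
# suppress all advertising on stories tagged with sensitive subjects.
SENSITIVE_TOPICS = {"death", "violence", "famine"}

ADS = [
    {"name": "weight-loss plan", "topics": {"diet", "health"}},
    {"name": "city-break flights", "topics": {"travel"}},
]

def ads_for(story_tags: set[str]) -> list[str]:
    # Block every ad on stories carrying sensitive tags.
    if story_tags & SENSITIVE_TOPICS:
        return []
    # Otherwise serve ads whose topics overlap the story's tags.
    return [ad["name"] for ad in ADS if ad["topics"] & story_tags]

print(ads_for({"travel", "culture"}))         # ['city-break flights']
print(ads_for({"health", "death", "court"}))  # [] – nothing beside the tragedy
```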
Strategically, the technology brings a range of benefits – providing information about behaviour (and competition), and aiding editorial and marketing decisions.
It’s the business of turning information into knowledge again, and enlivening readership by providing the right services. Says Texier, “’Aboutness’ will lead to monetisation.”