Tag-based Vs. Topic-based Systems
December 11, 2019

If you work at a news publisher, you know that one of the final, tedious steps before sending a story out to the world is adding tags. Traditionally, tags and keywords have helped readers dig deeper into related stories and topics, and have given search audiences another way to discover stories. But are tags the best way to do this?

In a previous article, we argued that publishers should stop asking their editors to tag articles, for several reasons: tagging is inconsistent, time-consuming, and subjective. Most importantly, tags are not a good basis for automating related-article recommendations.

Where Tags Fail

Tags are not unified: each person tends to tag articles differently, no matter how many guidelines are in place, so the process is highly subjective. Editors may also use synonymous tags, which leads to duplicate and disorganized tags. For example, one editor might tag an article “Donald Trump” while another tags a similar article “President Trump”. Although both tags refer to the same person, reconciling them is harder than it looks.
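To see why synonymous tags cause trouble, here is a minimal Python sketch that compares two editors' tag sets before and after normalizing them through an alias table. The `ALIASES` table and the `normalize_tags` helper are hypothetical and purely illustrative, not how Klangoo or Magnet works. Note that someone still has to build and maintain such a table by hand, which is the same manual burden described above.

```python
# Hypothetical alias table: in practice this mapping would have to come from a
# curated knowledge base or an entity-linking service, not be hand-written.
ALIASES = {
    "donald trump": "Donald Trump",
    "president trump": "Donald Trump",
    "trump": "Donald Trump",
}


def normalize_tags(tags):
    """Map each free-text tag to a canonical label, falling back to the tag itself."""
    canonical = set()
    for tag in tags:
        key = tag.strip().lower()
        canonical.add(ALIASES.get(key, tag.strip()))
    return canonical


editor_a = ["Donald Trump", "White House"]
editor_b = ["President Trump", "White House"]

# Raw tags only partially overlap...
print(set(editor_a) & set(editor_b))  # {'White House'}
# ...but after normalization both articles share the same entity.
print(normalize_tags(editor_a) == normalize_tags(editor_b))  # True
```

Any synonym missing from the table ("the 45th president", for instance) silently breaks the match again, which is why a lookup table alone does not scale.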

No semantic understanding: two articles can be closely related yet have no words in common. Tags are meant to reflect what the content is about, not everything it mentions. It is easy to scan an article and pick out words that seem important (e.g. people, places, products), but a review of a product is very different from an article that merely mentions that product. As a result, tags are often an inaccurate signal of what an article is actually about.
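A simple way to measure how related two tagged articles are is the overlap between their tag sets, for example the Jaccard similarity. The Python sketch below uses two hypothetical articles about the same climate summit, tagged with different words. Because the sets share no tags, a purely tag-based comparison scores them as unrelated. The article tags here are made-up examples.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two hypothetical articles about the same event, tagged by different editors.
article_1 = {"Paris Agreement", "emissions"}
article_2 = {"COP21", "climate accord", "carbon targets"}

print(jaccard(article_1, article_2))  # 0.0: tag matching sees no relation
```

The Paris Agreement was adopted at COP21, so a reader would consider these stories closely related. Only a system that understands meaning, rather than matching strings, can capture that link.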

Language barrier: if your website publishes articles in several languages, it is almost impossible to link and relate tags across languages.


This is where artificial intelligence becomes a must. Automation tools such as Magnet can tag your articles seamlessly and consistently, saving your editors’ time while improving accuracy. Beyond automatic tagging, Magnet is also used to understand audiences, identify subscription habits and patterns, and highlight subscriber preferences.

Klangoo has also developed an NLP-API (Natural Language Processing API) that understands the “aboutness” of a text and scores it accordingly. A scoring system promotes accuracy and helps break down language barriers: even if two articles have no words in common, artificial intelligence can recognize that they are related.

For example, our NLP-API can identify named entities in a text and score each one:

Example of an entity subtype: People > Politician > President > Francois Hollande.
Each identified entity is scored based on how it is represented in the content. Scores range from 0 to 1 and are also mapped to a qualitative score: Very Related, Related, Somewhat Related, or Not Related or Mentioned.
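The Python sketch below illustrates the idea of attaching a subtype path and a numeric score to an entity, then mapping that score onto the four qualitative levels listed above. This is not Klangoo’s API: the `Entity` class, the `qualitative_label` helper, the 0.82 example score, and especially the cut-off values in `THRESHOLDS` are hypothetical, since the article does not say where the boundaries between levels fall.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the article does not state where the boundaries
# between the qualitative levels are, so these values are illustrative only.
THRESHOLDS = [
    (0.75, "Very Related"),
    (0.50, "Related"),
    (0.25, "Somewhat Related"),
]


@dataclass
class Entity:
    name: str
    path: tuple   # subtype hierarchy, e.g. ("People", "Politician", "President")
    score: float  # relevance score in [0, 1]


def qualitative_label(score):
    """Map a numeric relevance score in [0, 1] to a qualitative level."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "Not Related or Mentioned"


hollande = Entity("Francois Hollande", ("People", "Politician", "President"), 0.82)
print(" > ".join(hollande.path + (hollande.name,)))
# People > Politician > President > Francois Hollande
print(qualitative_label(hollande.score))  # Very Related
```

Keeping both the numeric score and the label lets a related-articles feature rank candidates precisely while still giving editors a readable summary.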

In addition to named-entity recognition, our NLP-API engine can relate topics expressed in different words, or even in different languages, and automatically identify categories.

Interested in learning more? Get in touch – info@klangoo.com or follow us on Twitter.
