Monday, May 21, 2012

Google Knowledge Graph: a further step towards the Semantic Web?

...maybe yes!

According to Google, Knowledge Graph is the new frontier of Web search. From the video below, it seems that Google has built a huge semantic network that it will exploit to retrieve content semantically related to a query.


However, it is not yet capable of fully understanding natural language queries such as those Powerset showcased a few years ago:

1. Books on children

2. Books for children

The above queries differ only in the preposition: "on" vs "for". Standard search engines discard these words during indexing (they are "stopwords"). Unless the content is indexed differently, there is little chance that each query will retrieve its own, correct results. In other words, for Google the two queries are identical.
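
To make the point concrete, here is a minimal Python sketch of stopword removal as a classic indexer might perform it. The stopword list is a small hypothetical one chosen for illustration; real search engines use their own lists and far more elaborate pipelines.

```python
# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "on", "for", "of", "to", "in"}

def index_terms(query: str) -> list:
    """Lowercase the query, split on whitespace and drop stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS]

q1 = index_terms("Books on children")
q2 = index_terms("Books for children")
print(q1, q2, q1 == q2)  # ['books', 'children'] ['books', 'children'] True
```

Once the prepositions are gone, both queries reduce to the same bag of terms, so nothing downstream can tell them apart.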

If you think you can overcome this problem by putting the query in quotation marks ("books on children"), Google will only return pages that contain the exact string "books on children", missing pages that describe such books in other words. That is not exactly what we are looking for.

A book ON children is a book whose content tells stories about children. This is a PROPERTY of the BOOK object. More precisely, it is the value of the attribute TOPIC for the concept BOOK (if you speak RDF, it would be the triple topic(book, children)).
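
A rough sketch of what such statements could look like as data, in Python and in RDF's subject-predicate-object order. The book identifiers (`ex:book1`, `ex:book2`) and the second predicate `audience` are hypothetical, invented only to contrast "on" with "for"; this is not how Google or any particular RDF store represents them.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """An RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str

# Hypothetical facts about two made-up books.
# topic(book, children) from the text becomes Triple(book, "topic", "children").
TRIPLES = [
    Triple("ex:book1", "topic", "children"),     # a book ON children
    Triple("ex:book2", "audience", "children"),  # a book FOR children (hypothetical predicate)
]

def subjects_with(predicate: str, obj: str) -> set:
    """Return every subject s for which the triple (s, predicate, obj) is stored."""
    return {t.subject for t in TRIPLES if t.predicate == predicate and t.obj == obj}

print(subjects_with("topic", "children"))     # {'ex:book1'}
print(subjects_with("audience", "children"))  # {'ex:book2'}
```

Because the relation is stored explicitly as the predicate, the two books stay distinguishable even though both mention "children".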

I don't know exactly what Google's plans are, but if they really want to make progress towards the Semantic Web, they should turn their "classical" indexes into RDF versions in which the text of each page is semantically parsed and its semantic roles are extracted. This is a computationally very expensive task (although IBM Watson did it).

But that is not enough. Google should also process queries differently, i.e. without removing stopwords such as prepositions, since they carry essential meaning, as in the queries above. The technology for doing this already exists and is quite effective. I am sure Google is working on it.
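
As a toy illustration of query processing that keeps the preposition, here is a Python sketch that maps it to a semantic role and matches the result against a handful of triples. Everything in it is an assumption made for illustration: the preposition-to-role table, the `topic`/`audience` predicates, the made-up book identifiers, and the single query shape it understands ("books <preposition> <object>"). Real semantic parsers handle far richer language.

```python
from typing import Optional, Tuple

# Hypothetical mapping from prepositions to the semantic role they signal.
PREPOSITION_ROLE = {"on": "topic", "about": "topic", "for": "audience"}

# Hypothetical (subject, predicate, object) triples standing in for a semantic index.
TRIPLES = {
    ("ex:book1", "topic", "children"),
    ("ex:book2", "audience", "children"),
}

def parse_query(query: str) -> Optional[Tuple[str, str]]:
    """Parse 'books <preposition> <object>' into (predicate, object).

    The preposition is kept, not discarded as a stopword. Any other
    query shape, or an unknown preposition, returns None.
    """
    words = query.lower().split()
    if len(words) != 3 or words[0] != "books":
        return None
    role = PREPOSITION_ROLE.get(words[1])
    return (role, words[2]) if role else None

def search(query: str) -> set:
    """Return the subjects whose triples match the parsed query."""
    parsed = parse_query(query)
    if parsed is None:
        return set()
    predicate, obj = parsed
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}

print(search("Books on children"))   # {'ex:book1'}
print(search("Books for children"))  # {'ex:book2'}
```

Unlike the stopword-stripping indexer, the two queries now produce different predicates and therefore different results.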
