Reconfiguring the World Wide Web into a giant relational database that can learn.
It’s been almost 20 years since Tim Berners-Lee first created the hypertext markup language (HTML) and the hypertext transfer protocol (HTTP) between clients and servers, which have become the structural mainstay of the World Wide Web (WWW) we all use today. Web data (text, graphics, video, audio) are marked via simple hypertext links, and not much more. As such, the web is structurally limited to a static system in which data types are often blind and no further information about them is carried in a dynamic link variable. Data context, or a semantic interpretation of the data, is not exploited either. In fact, the dynamically linked variables often found in object-oriented programming languages remain missing from what the web could evolve into: a giant relational database, where the power of data relationships can be leveraged to help web users find and use just the data they’ve been looking for, and not a mountain of data that is irrelevant or that they don’t need.
The person leading this new charge for a massive change in the web as we know it? Tim Berners-Lee. And yes, there are others, notably those associated with the World Wide Web Consortium (W3C).
What are the features of dynamically linked web data (DLWD) that would make the Semantic Web (SW) a powerful experience for web users? How would they be implemented in practice? We explore those nontrivial issues next.
Dynamically Linked Web Data (DLWD) and Implications
The power of DLWD is best understood through an example of how a web search for information could yield vastly more meaningful and targeted data or query returns.
In a typical web search based on keywords, we get back all HTML links with text or data files that contain some or all of the keywords we specified. The mountain of query returns is often unproductive and can number in the tens of thousands. However, if each piece of web data (text, graphics, video, audio) we were searching had an associated link variable, and that variable was populated with information that is also searchable, we’d have a better chance of getting back the data we need. Furthermore, the link variables themselves can be dynamically linked to each other, much like pointer variables in object-oriented programming, yielding a correlation factor among search data and thereby potentially vastly increasing the value of search query returns.
As an example, let’s say you’re looking for a collection of surveys and reviews for a prescription drug on the web, but you want the query results to include only surveys that women of a certain age range have responded to. If you enter the keywords “<drug name> women age 40-45” into a typical search engine you might get back 1000 or more returns. On the first page alone you might see links to information that has nothing to do with the drug you listed: the link and associated text just happened to contain the keywords “women age 40-45.” Many of the returns may not be an accurate translation of what you meant by “40-45.” Some at the top of the list may take 40-45 to mean several completely different things, like “40-45”% of something, or pages “40-45” of some periodical. The primary issue is that the context or semantics of our query is not used in a meaningful way, because the data links we search have no coded context associated with them, and the reader of those data links (the search engine) wouldn’t be able to translate that context if it did exist.
Enter DLWD and the Semantic Web. In this realm, websites are programmed to tag information entered by users with links that are variables, which in turn are populated with useful, searchable information and may even point to other links (DLWD). How this is done is not trivial: many times the survey data we seek may come from users who wish to remain anonymous and just provide casual feedback on their experience in any number of informal forums, like a chat room. The information entered must be encoded into a machine-processable DLWD variable. The feedback might then be combined with feedback from an entirely different website in an aggregated form by connecting data links that have meaningful relationships (fields of the DLWD variables).
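To make the idea of a link variable more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name LinkVariable, its field names, the example.com-style URLs and the stand-in drug names are hypothetical, and nothing here is a W3C format. It only shows how feedback from two unrelated sites could be connected once both carry searchable fields.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison; links may point at each other in cycles
class LinkVariable:
    """Hypothetical DLWD variable: a URL plus searchable fields and pointers."""
    url: str
    fields: dict
    related: list = field(default_factory=list)  # pointers to other LinkVariables


def link_if_related(a, b, keys):
    """Connect two link variables when both carry every key and the values agree."""
    if all(k in a.fields and a.fields.get(k) == b.fields.get(k) for k in keys):
        a.related.append(b)
        b.related.append(a)
        return True
    return False


# Anonymous feedback collected on three unrelated (made-up) sites.
forum_post = LinkVariable(
    url="http://forum.example.com/post/1",
    fields={"drug": "exampledrug", "gender": "women", "age": 42},
)
blog_comment = LinkVariable(
    url="http://blog.example.org/comment/7",
    fields={"drug": "exampledrug", "gender": "women", "age": 44},
)
news_item = LinkVariable(
    url="http://news.example.net/item/3",
    fields={"drug": "otherdrug", "gender": "men", "age": 60},
)

linked = link_if_related(forum_post, blog_comment, keys=("drug", "gender"))
not_linked = link_if_related(forum_post, news_item, keys=("drug", "gender"))
```

After this runs, the forum post and the blog comment point at each other, while the news item stays unconnected because its fields describe a different drug and gender.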
A search tool that is designed to read and interpret DLWD would be able to limit the query returns to those with relationships most closely matching what we specify in keyword and context. In our example, if there were age and gender variables associated with the searchable data that also contained the drug keyword, then we would be sent back all the DLWD links that contain the gender field “women” and any age field with numbers in the range 40-45. We might still get back spurious results if the search results contained only the numbers 40 and 45, or other fields with a number range “40-45,” so context is still an issue. To solve that problem, there must be a way for the search tool to interpret what we meant by “age 40-45”: that we want ages 40, 41, 42, 43, 44 and 45. A human knows what the number range means, but a machine might not unless it was told to translate 40-45 to mean 40, 41, 42, 43, 44, 45. One way to ensure this happens is to have a pointer to a translation document that defines 40-45 to mean 40, 41, 42, 43, 44, 45. This pointer could be part of the DLWD variable. In the lingo of the Semantic Web, this type of pointer document is referred to as an “ontology.”
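The translation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function expand_age_range plays the role of the translation document, and the record layout, field names and drug name are made up for the example. A real ontology would be a shared document, not a local function.

```python
import re


def expand_age_range(text):
    """Translate a range written as 'LOW-HIGH' into the explicit ages it covers."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not an age range: {text!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"range is reversed: {text!r}")
    return list(range(low, high + 1))


def search(records, drug, gender, age_range):
    """Keep only records whose drug, gender and age fields all match."""
    ages = set(expand_age_range(age_range))
    return [
        r for r in records
        if r.get("drug") == drug
        and r.get("gender") == gender
        and r.get("age") in ages
    ]


# Hypothetical records; the last one contains "40-45" only as a page range.
records = [
    {"id": 1, "drug": "exampledrug", "gender": "women", "age": 42},
    {"id": 2, "drug": "exampledrug", "gender": "women", "age": 52},
    {"id": 3, "drug": "exampledrug", "gender": "men", "age": 41},
    {"id": 4, "drug": "exampledrug", "pages": "40-45"},
]

hits = search(records, "exampledrug", "women", "40-45")
```

Only record 1 survives. Record 4, which a keyword search would happily return, is dropped because its “40-45” sits in a pages field rather than an age field.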
The Semantic Web Implementation
In practice, the implementation of DLWD and the Semantic Web is quite nontrivial. After years of exponential growth of web data that is highly disorganized from a lack of inherent structure or logic, we face an uphill climb to reorganize the web into a system with structure and logic, and yes, maybe even the learning and comprehension ability found in artificial intelligence (AI) systems.
As a first approximation, Berners-Lee and the W3C have come up with a set of standards and specifications for how data might be encoded, ontologies built and tools constructed.
For data encoding, a resource description framework (RDF) is used. An RDF statement is a triple made up of a subject, a predicate and an object, each of which can be identified by a uniform resource identifier (URI). For example, we might write the triple
subject URL:http://www.<drug name>.com
predicate URL:http://www.<drug reviews>.com/drugReviews
object URL:http://www.<health blog>.com/women/40-45/#1234
to represent “<drug name> reviewed by women of ages 40-45.” In graphical form, the URIs are nodes that have properties and are linked to other nodes that have related properties and values. As another example, a website on the sun could contain temperature data that includes links to other websites with solar temperature data. Formal RDF would codify this into a knowledge representation; as I point out, data links are dynamic.
For ontology construction, a web ontology language (OWL), which can be written in eXtensible Markup Language (XML), is used.
Tools have been and continue to be developed to take advantage of the W3C Semantic Web paradigm. One of the more useful is a query tool optimized for RDF and other semantic data encodings: the SPARQL Protocol and RDF Query Language (SPARQL). For our example, a query might take a form like this:
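The graph view of RDF can be illustrated with plain Python tuples. This is a sketch, not an RDF library: the URIs and property names below are hypothetical stand-ins, and the temperature value is only illustrative. Each statement is a (subject, predicate, object) triple, and following objects that are themselves subjects walks the graph from one site to the next.

```python
from collections import defaultdict

# Hypothetical URIs standing in for real sites and vocabulary terms.
SUN = "http://example.com/sun"
SOLAR_DATA = "http://example.org/solar-data"
HAS_TEMP = "http://example.com/terms/temperatureKelvin"
SEE_ALSO = "http://example.com/terms/seeAlso"

triples = [
    (SUN, HAS_TEMP, 5772),           # illustrative value
    (SUN, SEE_ALSO, SOLAR_DATA),     # link to another site's data
    (SOLAR_DATA, HAS_TEMP, 5772),
]


def build_index(triples):
    """Group (predicate, object) pairs under each subject node."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Follow links node to node; return every subject visited, in visit order."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        for _, obj in index.get(node, []):
            if obj in index:  # the object is itself a described node
                stack.append(obj)
    return seen


index = build_index(triples)
visited = reachable(index, SUN)
```

Starting from the sun page, the walk reaches the second site through the seeAlso link, which is the “nodes linked to other nodes” picture in miniature.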
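PREFIX abc: <http://example.com/exampleOntology#>
# Sketch only: replace <drug name> with the drug's identifier in the ontology.
# ?review is an added variable for the matching survey; ages are assumed numeric.
SELECT ?review ?drug ?gender ?age
WHERE {
  ?review abc:drugReviews ?drug ;
          abc:demographics ?gender ;
          abc:ageRange ?age .
  FILTER (?drug = abc:<drug name> && ?gender = "women" && ?age >= 40 && ?age <= 45)
}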
The expressions “?drug,” “?gender” and “?age” are query variables, and the expressions “drugReviews,” “demographics” and “ageRange” are properties. Such a query specifies exactly what we want and how each of the keyword terms is related semantically. Of course, the problem is that the web data we search may not be marked up in RDF with interrelated links. Unless that standard is adopted and followed, SPARQL queries will be of little use. The point is that setting up standards for RDF, OWL, SPARQL, etc. encourages web developers and users to follow them, enabling the Semantic Web and its powerful properties.
So then, how are web users and developers encouraged to adopt and follow these standards? Providing RDF information for data links and linking open data sets (i.e., including RDF statements that link to other URIs so that they can discover related properties) is an ongoing mission for the W3C Semantic Web Education and Outreach (SWEO) Community Project. Linked-data browsers such as Disco and Tabulator can be used to navigate such data.
Another important project is “DBpedia.” Other open data sets include the CIA World Factbook, FOAF and Project Gutenberg.
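How a query of this kind binds its variables can be sketched without any SPARQL engine. The Python below is a toy matcher, not SPARQL itself: the triples, the review identifiers and the drug names are invented for the example, and only the property names drugReviews, demographics and ageRange are taken from the text. Terms starting with “?” are variables; every pattern must match with consistent bindings.

```python
def is_var(term):
    """A pattern term is a variable when it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Return an extended copy of `binding` if `pattern` matches `triple`, else None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None      # constant does not match
    return new


def query(patterns, triples, where=lambda b: True):
    """Match every pattern in turn, then keep bindings that pass `where`."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [b for b in bindings if where(b)]


# Hypothetical data: three reviews described with the properties from the text.
triples = [
    ("review1", "drugReviews", "exampledrug"),
    ("review1", "demographics", "women"),
    ("review1", "ageRange", 42),
    ("review2", "drugReviews", "exampledrug"),
    ("review2", "demographics", "women"),
    ("review2", "ageRange", 61),
    ("review3", "drugReviews", "otherdrug"),
    ("review3", "demographics", "women"),
    ("review3", "ageRange", 44),
]

patterns = [
    ("?review", "drugReviews", "exampledrug"),
    ("?review", "demographics", "women"),
    ("?review", "ageRange", "?age"),
]

results = query(patterns, triples, where=lambda b: 40 <= b["?age"] <= 45)
```

Only review1 comes back: review2 fails the age condition and review3 is about a different drug. The point of the sketch is that the matching works on relationships between fields, not on keywords appearing anywhere in a page.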
Future Concepts Based on DLWD
With all the organic development going on to implement the SW, it is not too early to ask where it could evolve, or revolve (as in revolution), to. RDF is a convenient way to express web data in a semantic or knowledge representation using URIs that are interlinked. But to really grow toward a system that has AI properties, or a seamless intelligent learning system for the casual user, the SW must become more dynamic. One built-in way this would happen is through ontology mapping. The links themselves would also be dynamic, or changeable, based on any number of forces: learning models, revised or improved information, new information, environmental effects, etc. Hence the term “DLWD.” How one goes from the SW and RDF in its semi-static form to SW and DLWD is, I think, a grand challenge. It will involve thinking about how to make links truly dynamic without causing information loss (a real downside threat).
The motivation for this is to more closely match a system like the human brain, which is a neural network of synaptic circuits that represent a collective memory that can learn and think (that’s putting it simply!). Synapses have strengths that are dynamic, and the brain is a highly interconnected system (ultra high integration density) of these local memory circuits. A single neuron can have several thousand synapses. Though there are cells, layers and regions, the brain has built-in redundancy, which is one safeguard against information loss and another feature to consider for the future SW.
The immediate focus of the SW is, or ought to be, to provide seamless tools for the casual user (read: the consumer), so that he/she can easily extract intelligent, relevant information and even add to the learning cycle. But those of us who dream would like to eventually see a web that can pass a Turing Test.
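One way to picture a dynamic link that cannot be lost is a link with a strength that rises with use and fades with neglect, but never below a floor. The Python sketch below is purely speculative and rests on assumptions of mine: the class DynamicLink, its update rules and its parameter values are hypothetical illustrations, not part of any Semantic Web standard or a model of real synapses.

```python
class DynamicLink:
    """Hypothetical link whose strength changes with use, loosely like a synapse."""

    def __init__(self, target, strength=0.5, floor=0.05):
        self.target = target
        self.strength = strength
        self.floor = floor  # strength never drops below this, so the link survives

    def reinforce(self, rate=0.2):
        """Move the strength a fraction of the way toward 1.0."""
        self.strength += rate * (1.0 - self.strength)

    def decay(self, rate=0.2):
        """Shrink the strength, but never below the floor."""
        self.strength = max(self.floor, self.strength * (1.0 - rate))


link = DynamicLink("http://example.com/related-data")

# A long period of neglect weakens the link without deleting it.
for _ in range(50):
    link.decay()
weakest = link.strength

# Renewed use strengthens it again.
link.reinforce()
recovered = link.strength
```

The floor is the design choice that matters here: a link can become nearly irrelevant, yet the information that it once connected two pieces of data is kept and can be strengthened again later.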
References and Endnotes:
See W3C Semantic Web Activity.