Introduction to RDF
Introduction
RDF - the Resource Description Framework - is a
foundation for processing metadata; it provides interoperability between
applications that exchange machine-understandable information on the Web. RDF
emphasizes facilities to enable automated processing of Web resources. RDF
metadata can be used in a variety of application areas; for example: in resource
discovery to provide better search engine capabilities; in cataloging
for describing the content and content relationships available at a particular
web site, page or digital library; by intelligent software agents to
facilitate knowledge sharing and exchange; in content rating; in
describing collections of pages that represent a single logical
"document"; for describing intellectual property rights of
Web pages, and in many others. RDF with digital signatures will be key
to building the "Web of Trust" for electronic commerce, collaboration,
and other applications.
Alongside MathML,
RDF is one of the first applications of XML
to ship. The two are intended to serve as XML test cases.
Even though RDF is still only a working draft at the W3C,
more than 40 content providers have planned to deploy RDF on their internet
sites, including Netscape,
CNN, AltaVista
and Amazon.com.
Before we move on to the details, I need to point out
that a draft document is a work in progress. The Working Group responsible for
RDF at the W3C has not yet reached full consensus on all parts of RDF, and is
continuing to refine the draft.
A short history of RDF
Even though it might be a bit premature to tell the
story of a technology that has barely started, I'll give it a shot.
No one individual or organization invented RDF; it is
very much a collaborative design effort. RDF started as an extension of the PICS
content description technology. It now also draws upon the XML design, and
technology submissions such as Microsoft's XML-Data
paper, SiteMap proposals, and the Dublin Core/Warwick Framework have also
influenced the RDF design. Later on I'll show you a Dublin Core example.
Development of PICS was motivated by the anticipation
of restrictions on the Internet, such as some recent US legislation (the
Communications Decency Act and its subsequent overturning by the US Supreme
Court). More on PICS later.
One of the earliest and most important metadata systems
on the Web is the Meta
Content Framework (MCF), a specification that was first introduced by Apple
Computer in September 1996 and is still in use by hundreds of websites today.
MCF was developed as a Navigator plug-in called "HotSauce",
which was limited to providing site map applications, since it was not
extensible. Netscape was among the first industry partners to support MCF, and
initially extended it to XML. Netscape later submitted a formal proposal to
the W3C in June 1997, and helped form a W3C working group on RDF in September
1997.
The working group has since been
drawing on Netscape's MCF proposal, as well as the W3C Recommendation PICS, to
define a new framework for viewing, manipulating and associating networked
collections of information. Several existing W3C activities, including
submissions on managing personal user preferences through OPS (Open Profiling
Standard), defining push content channels using CDF (Channel Definition
Format), and parental controls described by PICS, are among the
more narrowly focused applications now addressed by RDF.
What is metadata?
Let's start off by talking about metadata. Metadata is
"data about data" or "information describing content." In
HTML we have:
<meta name="keywords" content="rdf,xml,w3c">
and also:
<meta name="description" content="This page is about RDF">
The first tag tells you that the keywords in this
HTML file are rdf, xml and w3c. Keywords are often
used by search engines such as AltaVista,
Excite, and Lycos
to index sites.
The second tag gives you a description of the site: This
page is about RDF. This is also used by search engines. When they display
search results, they most often display the description line.
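As a rough illustration of how an indexing program might read these two tags, here is a minimal Python sketch using the standard library's html.parser module. The class and function names (MetaTagCollector, read_meta) are invented for this example, and real search engines certainly do more than this.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html_text):
    """Return a dict mapping meta names to their content strings."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


page = """
<html><head>
<meta name="keywords" content="rdf,xml,w3c">
<meta name="description" content="This page is about RDF">
</head><body></body></html>
"""

meta = read_meta(page)
# Split the comma-separated keyword list into individual keywords.
keywords = [keyword.strip() for keyword in meta["keywords"].split(",")]
print(keywords)
print(meta["description"])
```

The keywords come back as a plain list of strings and the description as one string, which is about all the structure that HTML meta tags offer.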
In the context of RDF, metadata is "data
describing web resources". RDF uses XML as the encoding syntax for the
metadata. The resources being described by RDF are, in general, anything that
can be named via a URI. The broad goal of RDF is to define a mechanism for
describing resources that makes no assumptions about a particular application
domain, nor defines the semantics of any application domain. The definition of
the mechanism should be domain neutral, yet the mechanism should be suitable for
describing information about any domain.
Why users want metadata
If you are having trouble with "information
overload", you should look into metadata, as it will give you more control
over content. A big problem with HTML is that there are too many different
interfaces to metadata information. On one page, an author might use the
following piece of code:
<meta name="Author" content="cybercoded.com">
and some other author who wanted to express the same
information could instead use:
<meta name="AuthorName" content="cybercoded.com">
This shows some of the problems that search
engines are currently facing. There is no standard. What is really meant by Author?
What is AuthorName, and what's the difference? And also, which David Cooley
are we talking about? The David Cooley from Denmark, or some other David Cooley?
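To make the problem concrete, here is a small Python sketch of what a program reading such pages has to do when there is no shared vocabulary: keep its own list of guesses. The alias list and the third spelling ("Writer") are invented for illustration only.

```python
# Hypothetical list of names this program happens to know about.
AUTHOR_ALIASES = {"author", "authorname"}


def find_author(meta_tags):
    """Return the first value whose meta name looks like an author field."""
    for name, content in meta_tags:
        if name.lower() in AUTHOR_ALIASES:
            return content
    return None


page_one = [("Author", "cybercoded.com")]
page_two = [("AuthorName", "cybercoded.com")]
page_three = [("Writer", "cybercoded.com")]  # a spelling nobody anticipated

for page in (page_one, page_two, page_three):
    print(find_author(page))
```

The first two pages are handled only because someone guessed both spellings in advance; the third is silently missed. Nothing in the tags says which person or organization the value actually identifies.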
Why publishers want metadata
If you are a publisher (and isn't everybody on the Internet
a publisher?), you will want to look into metadata so that you can provide
more information about your content. Today there is no complete and
standard way to describe all aspects of website content.
There's currently also much redundancy, where description
of site content requires multiple standards and multiple files. The current
internet metadata is not widely used by publishers, which perhaps is caused by
the lack of extensibility.
The systems currently used are proprietary,
incompatible, and are not widely supported by software vendors.
Another point that makes metadata interesting for
publishers is that RDF introduces a uniform query capability for resource
discovery. This could give publishers much more information about their
competition, without drowning in "information overload".
PICS and RDF
PICS is a mechanism for communicating ratings of web
pages from a server to clients; these ratings, or rating labels, contain
information about the content of web pages: for example, whether a particular
page contains a peer-reviewed research article, or was authored by an accredited
researcher, or contains sex, nudity, violence, foul language etc. Instead of
being a fixed set of criteria, PICS introduced a general mechanism for creating
rating systems. Different organizations could rate content based on their own
objectives and values, and users - for example, parents worried about their
children's web usage - could set their browser to filter out any web pages not
matching their own criteria.
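The filtering idea can be sketched in a few lines of Python. This is not the PICS label format itself; the category names, the numeric scale and the rule that unrated pages are blocked are all assumptions made for this example.

```python
def is_allowed(page_ratings, user_limits):
    """Return True only if the page is rated within every limit the user set."""
    for category, limit in user_limits.items():
        rating = page_ratings.get(category)
        # Assumption: a page with no rating for a limited category
        # does not match the user's criteria and is filtered out.
        if rating is None or rating > limit:
            return False
    return True


# Hypothetical limits chosen by a parent, on an invented 0-4 scale.
parent_limits = {"violence": 1, "language": 2}

# Hypothetical ratings delivered alongside three pages.
pages = {
    "http://example.org/news": {"violence": 1, "language": 0},
    "http://example.org/game": {"violence": 3, "language": 1},
    "http://example.org/unrated": {},
}

visible = [url for url, ratings in pages.items() if is_allowed(ratings, parent_limits)]
print(visible)
```

Because the limits live with the user and the ratings come from whichever organization the user trusts, the same mechanism serves very different sets of values.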
One of the requirements for the RDF design is that it
be able to express everything that a PICS-1.1 (Platform for Internet Content
Selection) label can express, and that it be possible to automatically translate
PICS-1.1 labels into RDF format without loss of information. Any future
technical work on PICS will evolve it to using RDF. The W3C PICS Interest Group
is chartered to decide when this transition is appropriate. Software and Web
content using PICS-1.1 will remain a supported W3C Recommendation for as long as
the market demands.
It is expected that PICS-1.1 and an equivalent
expression of PICS ratings in RDF will both be useful for quite some time.
SiteMaps with RDF
A sitemap is only one of the many things RDF offers,
but it is very easy to implement.
As RDF is able to solve the complex problem of managing
information across multiple yet incompatible file formats, you'll be able to
automatically generate a sitemap in software that uses RDF. This could be a
sitemap of your desktop PC, that would give you an easy-to-use interface to
unify all of the information you need, regardless of whether it resides on the
Internet, a local network, in a legacy database, in an e-mail thread, or on your
hard drive. This could be the true Web-desktop integration.
The SiteMap could also be an automatically 24-hour
updated SiteMap of your site. RDF would then integrate the metadata into the
sitemap giving the navigational look and feel you need, plus all the information
you want. When you move your mouse over a link, you could have a pop-up window
with a description of the site, and the SiteMap could also provide search capabilities
and much more.
The SiteMap would be written using XML syntax in
standard ASCII text, using either an editor or special tools. It could be displayed either as
an applet or as a combination of HTML, GIF and JPEG files. It would use the
"text/xml" MIME type, and would also have the capability to reference
additional site maps (it might have one high-level file, then separate files for
each area of the site).
Model, Syntax and Schemas
Without trying to get too complicated, I'll spend this
chapter telling you about the different RDF components. I'll finish off with a
nice looking example, so please stay with me.
At the core of RDF we have the RDF Data Model for
representing named properties and their values. These properties serve both to
represent attributes of resources (and in this sense correspond to usual
attribute-value pairs) and to represent relationships between resources. The RDF
data model is a syntax-independent way of representing RDF expressions.
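One syntax-independent way to picture the data model is as a list of three-part statements: a resource, a named property, and a value. The Python sketch below uses that picture; the Statement type, the property names and the example.org URLs are invented for illustration and are not taken from the RDF draft.

```python
from collections import namedtuple

# Hypothetical in-memory form of one RDF statement.
Statement = namedtuple("Statement", ["resource", "property_type", "value"])

DOC = "http://example.org/report.html"  # hypothetical resource

statements = [
    # A property used as a plain attribute-value pair.
    Statement(DOC, "Title", "Quarterly report"),
    # A property whose value is itself another resource (a relationship).
    Statement(DOC, "PartOf", "http://example.org/reports/"),
]


def properties_of(resource, all_statements):
    """Collect every named property recorded for one resource."""
    return {
        statement.property_type: statement.value
        for statement in all_statements
        if statement.resource == resource
    }


print(properties_of(DOC, statements))
```

Nothing here depends on XML, which is the point: the same statements could be written down in any syntax that preserves the three parts.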
The RDF Syntax is for expressing and
transporting this metadata in a manner that maximizes the interoperability of
independently developed web servers and clients. The syntax uses the eXtensible
Markup Language (XML).
Last, but not least, RDF Schemas are a
collection of information about classes of RDF nodes, including properties and
relations. RDF schemas are specified using a declarative representation language
influenced by ideas from knowledge representation, e.g., semantic nets, frames,
and predicate logic, as well as database schema representation models such as
binary relational models, and graph data models.
RDF in itself does not contain any predefined
vocabularies for authoring metadata. It is expected, though, that standard
vocabularies will emerge; after all, this is a core requirement for large-scale
interoperability. Anyone can design a new vocabulary; the only requirement for
using it is that a designating URI is included in the metadata instances using
this vocabulary.
Without going further into the core, or starting to
talk about Nodes, PropertyTypes, or Triples, I'll show you an
example instead that illustrates all these things:

What this picture tells you is that "David Cooley
is the Author of the document whose URL is http://www.cybercoded.com/some.doc".
In RDF syntax this would be:
<?xml:namespace name="http://docs.r.us.com/bibliography-info/" as="BIB"?>
<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>
<RDF:RDF>
<RDF:Description RDF:HREF="http://www.cybercoded.com/some.doc">
<BIB:Author>David Cooley</BIB:Author>
</RDF:Description>
</RDF:RDF>
The above syntax represents the named
properties and their values (the Data Model), using the schemas
named in the first two lines, which provide more information about the
different classes (Author and Description).
The first two lines tell the user agent that we'll be
using the schemas (or vocabularies) from the two URLs. The first URL is on our
own server, and contains information about the tags that we've created and
added to RDF. The second line is the W3C schema, and it contains the tags that
are recommended by the W3C. The schemas tell the browser what tags are legal
and what they mean, and therefore they are very important.
After that, the actual RDF code starts. The
Description line tells the user agent that we are describing the document at
http://www.cybercoded.com/some.doc. The Author line tells the browser that
David Cooley wrote this thing. The last two lines close the RDF, in the same way
you would close HTML code.
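For readers who want to experiment, here is a hedged Python sketch that reads a description like the one above and prints the statement it contains. One assumption is important: the standard-library parser does not understand the draft's namespace processing instructions, so the sketch declares the same two URLs with xmlns attributes on the root element instead. The function name extract_statements is invented for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
BIB_NS = "http://docs.r.us.com/bibliography-info/"

# Assumption: namespaces are declared with xmlns attributes rather than
# the <?xml:namespace ...?> processing instructions used in the draft.
document = f"""
<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns:BIB="{BIB_NS}">
  <RDF:Description RDF:HREF="http://www.cybercoded.com/some.doc">
    <BIB:Author>David Cooley</BIB:Author>
  </RDF:Description>
</RDF:RDF>
"""


def extract_statements(xml_text):
    """Return (resource, property, value) tuples for each Description."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}HREF")
        for child in description:
            # child.tag has the form "{namespace-url}LocalName".
            found.append((resource, child.tag, (child.text or "").strip()))
    return found


for resource, property_type, value in extract_statements(document):
    print(resource, property_type, value)
```

The property comes back qualified by the URL of its vocabulary, which is what lets two different Author tags from two different schemas be told apart.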
As you'll see in the next chapter, the code is not much
different in Dublin Core.
Dublin Core Example in RDF
One obvious application for RDF is in the description
of web pages. This is one of the basic functions of the Dublin Core (DC)
initiative. The Dublin Core is a set of 15 metadata elements (such as Title,
Subject, Publisher etc.) used to describe resources on the Web. Dublin Core has
gathered experts from the library world and the networking and digital library
research communities. Dublin Core is intended to be usable by non-catalogers as
well as by those with experience with formal resource description models.
Dublin Core is currently being used in many places, and
is one of the foundations that RDF is building on.
I'll now show you an example, that uses the RDF syntax
to encode Dublin Core metadata within HTML documents.
An inline Dublin Core example of this article would
then have the following syntax:
<head>
<xml>
<?namespace href = "http://www.w3.org/schemas/rdf-schema" as = "RDF">
<?namespace href = "http://www.purl.org/RDF/DC/" as = "DC">
<RDF:RDF>
<RDF:Assertion RDF:HREF = "...uri...."
DC:Title = "...value..."
DC:Creator= "...value..."
/>
</RDF:RDF>
</xml>
</head>
A real world example would then look like this:
<head>
<xml>
<?namespace href = "http://www.w3.org/schemas/rdf-schema" as = "RDF">
<?namespace href = "http://www.purl.org/RDF/DC/" as = "DC">
<RDF:RDF>
<RDF:Description RDF:HREF="http://purl.org/metadata/dublin_core_elements"
DC:Title = "The RDF article"
DC:Creator = "David Cooley"
DC:Subject = "RDF, metadata, w3c"
DC:Description = "This document tries to give some sort of idea, of what RDF has to offer you"
DC:Publisher = "Internet Related Technologies"
DC:Format = "text/html"
DC:Type = "Technical Report"
DC:Language = "en"
DC:Date = "1998-05-05" />
</RDF:RDF>
</xml>
</head>
If you wanted to store the above metadata in a separate
file, you could do so and instead include the following piece of code in your
HTML page:
<head>
<LINK REL="meta" HREF="http://www.cybercoded.com/articles/intro.rtf">
</head>
This works in the same way, and has the same advantages,
as external references to CSS files or JavaScript source files.
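A program that wants the external metadata first has to find the link. The Python sketch below shows one way to do that with the standard library; the names MetaLinkFinder and find_metadata_links are invented, and the stylesheet link is added only to show that other links are ignored.

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collect the HREF of every <LINK REL="meta"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metadata_urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag != "link":
            return
        attributes = dict(attrs)
        relation = (attributes.get("rel") or "").lower()
        if relation == "meta" and attributes.get("href"):
            self.metadata_urls.append(attributes["href"])


def find_metadata_links(html_text):
    """Return the URLs of external metadata files referenced by a page."""
    finder = MetaLinkFinder()
    finder.feed(html_text)
    return finder.metadata_urls


page = """<head>
<LINK REL="meta" HREF="http://www.cybercoded.com/articles/intro.rtf">
<LINK REL="stylesheet" HREF="style.css">
</head>"""

print(find_metadata_links(page))
```

Fetching and parsing the linked file would be the next step, and is left out here to keep the sketch short.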
The future of RDF
Once the web has been sufficiently
"populated" with rich metadata, what can we expect? First, searching
on the web will become easier as search engines have more information available,
and thus searching can be more focused. Doors will also be opened for automated
software agents to roam the web, looking for information for us or transacting
business on our behalf. The web of today, the vast unstructured mass of
information, may in the future be transformed into something more manageable -
and thus something more useful.
In the future, non-PC devices could also benefit, such
as set-top electronic program guides defined in RDF.
The "next-generation" web page could use XML
and RDF for its data. JavaScript and XSL would then display it with HTML semantics
using the Document Object Model (DOM).
Conclusion
The interest from the large browser vendors gives us
hope that large-scale development of tools that understand RDF will take
place; this, in turn, should lead to the widespread adoption of RDF on the web.
Netscape has announced that RDF will be a key part of the
"Aurora" component of version 5 of Netscape Communicator. This
component is still in beta testing, but was previewed at Seybold '97.
"Aurora" will help users organize and manage all their information,
allowing them to integrate content from all over the place, and I think this is a
nice way of illustrating how RDF will change the flow of information.
For Internet content providers or corporate developers
who manage Intranets, using RDF can help provide a simple solution to a complex
problem. For example, a developer will be able to create and deploy a simple RDF
file that indexes a Web site, enabling users to see an entire map of the site.
This could be viewed in a customizable Java applet that would allow users to
integrate their own information, such as bookmarks, local files and much more.
RDF is also not limited to providing site maps or
channel definitions. It can be used for any metadata application. Since there are
currently no standard vocabularies, you could build your sitemap in MCF today and
convert to RDF tomorrow.
In short, RDF has the power to elevate the status of the
web from machine-readable to something we might call machine-understandable,
and also do for applications what HTML did for content.
References
Introduction to RDF Metadata
http://www.w3.org/TR/NOTE-rdf-simple-intro
W3C Resource Description Framework (RDF) Model and
Syntax
http://www.w3.org/TR/WD-rdf-syntax
Frequently Asked Questions about RDF
http://www.w3.org/RDF/FAQ
Netscape RDF press releases
http://home.netscape.com/newsref/pr/newsrelease488.html
http://home.netscape.com/newsref/pr/newsrelease501.html
Dublin Core Metadata
http://purl.oclc.org/metadata/dublin_core/main.html