<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>robdixoniii</title>
	<atom:link href="http://www.robdixoniii.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.robdixoniii.com</link>
	<description>Another voice in the informational cacophony.</description>
	<lastBuildDate>Sat, 21 Apr 2012 03:52:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A Robust Title Casing Algorithm</title>
		<link>http://www.robdixoniii.com/a-robust-title-casing-algorithm/</link>
		<comments>http://www.robdixoniii.com/a-robust-title-casing-algorithm/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 23:33:04 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Algorithm]]></category>
		<category><![CDATA[C#]]></category>

		<guid isPermaLink="false">http://www.robdixoniii.com/?p=467</guid>
		<description><![CDATA[I&#8217;ve been thinking a lot about title casing lately. I&#8217;ve been tagging my huge music library, which exposes me to all the odd variations that are band, song, and album names. Additionally I&#8217;ve recently written an algorithm to support the new Orchard CMS Coverflow module I&#8217;ve been working on. This post outlines the logic of the algorithm [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking a lot about title casing lately. I&#8217;ve been tagging my huge music library, which exposes me to all the odd variations that are band, song, and album names. Additionally I&#8217;ve recently written an algorithm to support the new <a href="http://orchardproject.net/" target="_blank">Orchard CMS</a> Coverflow module I&#8217;ve been working on. This post outlines the logic of the algorithm and includes the C# source code at the bottom.</p>
<h3>The Basic Rules</h3>
<p>In general, title casing means capitalizing the first letter of each word in a string, like this: &#8220;My Important Title.&#8221; Creating an algorithm to do this is trivial, but unfortunately there are exceptions to this rule.</p>
<p>The first is that in English there are a handful of words that we agree <em>not</em> to capitalize in titles.  The list of words that I came up with and included in our Planet Telex library is:</p>
<blockquote><p>{ the, of, or, and, an, a, in, is, are, to, on }</p></blockquote>
<p>Unless they are the first <em>or the last</em> word in the title, these words should be lowercased.  See the results for yourself: &#8220;The Lord Of The Rings&#8221; vs. &#8220;The Lord of the Rings.&#8221; Notice that the first &#8220;the&#8221; is capitalized, but the one in the middle of the title is lowercased.</p>
<p>The last word is important too. Take the string &#8220;&#8230;and the band played on.&#8221; The correct title casing should be &#8220;&#8230;And the Band Played On.&#8221;  The last &#8220;on&#8221; is capitalized because it is at the end. Contrast this with &#8220;hop on pop,&#8221; which should be cased &#8220;Hop on Pop.&#8221;</p>
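<p>The rule so far is small enough to sketch on its own. The following is a minimal illustration, not the library code: the class and method names are made up for this example, it only splits on spaces, and it ignores punctuation and specifically cased words entirely.</p>
<pre class="crayon-plain-tag"><code>using System;
using System.Collections.Generic;

public static class BasicTitleCase
{
    // The lowercase list from this post. The names in this sketch are hypothetical.
    private static readonly HashSet&lt;string&gt; LowercaseWords = new HashSet&lt;string&gt;(
        new[] { &quot;the&quot;, &quot;of&quot;, &quot;or&quot;, &quot;and&quot;, &quot;an&quot;, &quot;a&quot;, &quot;in&quot;, &quot;is&quot;, &quot;are&quot;, &quot;to&quot;, &quot;on&quot; },
        StringComparer.OrdinalIgnoreCase);

    public static string Apply(string title)
    {
        // Assumption: words are separated by spaces only.
        string[] words = title.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i &lt; words.Length; i++)
        {
            bool isFirstOrLast = i == 0 || i == words.Length - 1;
            string lower = words[i].ToLowerInvariant();

            // Lowercase list words stay lowercase only in the middle of the title.
            words[i] = !isFirstOrLast &amp;&amp; LowercaseWords.Contains(lower)
                ? lower
                : char.ToUpperInvariant(lower[0]) + lower.Substring(1);
        }

        return string.Join(&quot; &quot;, words);
    }

    public static void Main()
    {
        Console.WriteLine(Apply(&quot;the lord of the rings&quot;)); // The Lord of the Rings
        Console.WriteLine(Apply(&quot;hop on pop&quot;));            // Hop on Pop
    }
}</code></pre>
<p>A sketch like this breaks down as soon as punctuation or brand names show up, which is what the rest of this post deals with.</p>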
<h3>Exceptions for Specifically Cased Words</h3>
<p>The list of words to lowercase isn&#8217;t the only list of special words we need to consider. Perhaps more important in today&#8217;s brand conscious world are casing exceptions. These are quite common, like Apple&#8217;s &#8220;i&#8221; products: iPhone, iPad, iPod, etc. If you title case those names, they become downright unrecognizable: &#8220;Iphone.&#8221; At Planet Telex, we&#8217;ve built websites for DEMOGala, ScriptSave, WellDyneRx, BioClaim, and others who wouldn&#8217;t be happy to have their brand incorrectly cased as all lowercase except for the first letter.</p>
<p>An example from my music library is the band MUTEMATH. The correct branding is all caps. If my algorithm makes it &#8220;Mutemath,&#8221; not only is it wrong, it&#8217;s totally lame. The nuances don&#8217;t stop there though- consider the band &#8220;Portugal. The Man.&#8221; Yes, that period is supposed to be there, and the &#8220;The&#8221; should also be capitalized. That is how the band does it, but it is also natural to the English language. We expect a capital letter after a period. If my algorithm generates &#8220;Portugal. the Man&#8221; it is also incorrect and lame.</p>
<p>So a successful casing algorithm needs 2 lists of special words: One to specify words to lowercase when in the middle of the title, and the other to specify words that should be cased specifically, like &#8220;MUTEMATH&#8221; and &#8220;iPhone&#8221;.</p>
<h3>Nuances in Punctuation</h3>
<p>A robust title casing algorithm needs to be aware of which symbols that separate words should trigger exceptions to the general lowercase rule, like &#8220;Portugal. The Man&#8221; or &#8220;Pinion/Terrible Lie&#8221; (which could produce &#8220;Pinion/terrible Lie&#8221; in an algorithm that didn&#8217;t respect the &#8220;/&#8221; character).</p>
<p>To surmount this complexity, I&#8217;ve created 2 lists of characters that separate words: a list of &#8220;weak&#8221; separators and a list of &#8220;strong&#8221; separators. As their names imply, all of these characters can be seen as flags that separate one word from another. The difference is that after a &#8220;strong&#8221; separator, the following word should be capitalized, even if it is in the lowercase list.</p>
<p>The two weak separators are the space and comma. There are more strong ones:</p>
<blockquote><p>{ . ? ! ( ) { } [ ] &lt; &gt; / \ &amp; }</p></blockquote>
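<p>As a rough sketch of the distinction, the two lists can be wrapped in a pair of predicates. The class and method names here are invented for illustration; the real code further down keeps the lists as private fields instead.</p>
<pre class="crayon-plain-tag"><code>using System;
using System.Linq;

public static class SeparatorDemo
{
    private static readonly char[] WeakSeparators = { ' ', ',' };
    private static readonly char[] StrongSeparators =
        { '.', '?', '!', '(', ')', '{', '}', '[', ']', '&lt;', '&gt;', '/', '\\', '&amp;' };

    // True when the character ends the current word, whatever its strength.
    public static bool IsSeparator(char c)
    {
        return WeakSeparators.Contains(c) || StrongSeparators.Contains(c);
    }

    // True when the next word must be capitalized even if it is in the lowercase list.
    public static bool ForcesCapital(char separator)
    {
        return StrongSeparators.Contains(separator);
    }

    public static void Main()
    {
        Console.WriteLine(ForcesCapital('/')); // True
        Console.WriteLine(ForcesCapital(' ')); // False
        Console.WriteLine(IsSeparator('x'));   // False
    }
}</code></pre>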
<h3>Algorithm Overview</h3>
<p>With the assistance of the lists I&#8217;ve defined as well as a few helper methods, the basic algorithm iterates over each character, building words and then adding them when separators are encountered. A separate function handles applying the rules of casing to a single word; the iterating function simply has to call it with the word, its position in the title, and the separator that preceded it.</p>
<p>The biggest complexity is dealing with the possible variations in punctuation. The least obvious rule, which has several lines of explanation in my example, is that if a strong separator is encountered, spaces must be discounted until the next word is written. This way, the word &#8220;and&#8221; following both a &#8220;)&#8221; character and then a space character is correctly uppercased.</p>
<h3>The Code</h3>
<p>The following code is a slightly revised version of the code included in the <a href="https://github.com/PlanetTelexInc/dotnet-pt-library" target="_blank">Planet Telex .Net Library</a>. Some formatting is changed to better fit on the page, and the class name has been contrived for this example. Download or fork the source code at our <a href="https://github.com/PlanetTelexInc" target="_blank">Planet Telex GitHub account</a>.</p><pre class="crayon-plain-tag"><code>using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using PlanetTelex.Properties;

namespace PlanetTelex.Utilities
{
  /// &lt;summary&gt;
  /// This class demonstrates a robust title casing algorithm.
  /// &lt;/summary&gt;
  public class TitleCaseUtility
  {
    private enum WordPosition { First, Middle, Last }
    private char[] _weakSeparators = new[] { ' ', ',' };
    private char[] _strongSeparators = 
        new[] {'.','?','!','(',')','{','}','[',']','&lt;','&gt;','/','\\','&amp;'};

    private IEnumerable&lt;char&gt; AllSeparators
    {
        get { return _weakSeparators.Concat(_strongSeparators); }
    }

    /// &lt;summary&gt;
    /// This is the list of specifically cased words embedded in assembly resources.
    /// &lt;/summary&gt;
    private static IEnumerable&lt;string&gt; ToCase
    {
        get
        {
            if (_toCase == null)
                _toCase = Resources.TitleCaseToCase.Split(',', ' ');

            return _toCase;
        }
    }
    private static string[] _toCase;

    /// &lt;summary&gt;
    /// This is the list of words to lowercase embedded in assembly resources.
    /// &lt;/summary&gt;
    private static IEnumerable&lt;string&gt; ToLower
    {
        get
        {
            if (_toLower == null)
                _toLower = Resources.TitleCaseToLower.Split(',', ' ');

            return _toLower;
        }
    }
    private static string[] _toLower;

    /// &lt;summary&gt;
    /// This helper method uppercases the first letter of any given string.
    /// &lt;/summary&gt;
    public string UppercaseFirstLetter(string toUppercase)
    {
        if (string.IsNullOrEmpty(toUppercase))
            return string.Empty;

        return char.ToUpper(toUppercase[0]) + toUppercase.Substring(1);
    }

    /// &lt;summary&gt;
    /// This helper method applies casing rules to a single word.
    /// &lt;/summary&gt;
    private string CaseWord(string wordToCase, WordPosition wordPosition, 
        char precedingSeparator, string[] casedWords)
    {
        // If the word is in our embedded specifically cased list, return that casing.
        if (ToCase.Contains(wordToCase, StringComparer.OrdinalIgnoreCase))
          return ToCase.FirstOrDefault(
            s =&gt; s.Equals(wordToCase, StringComparison.OrdinalIgnoreCase));

        // If the word is in the provided specifically cased list, return that casing.
        if (casedWords != null &amp;&amp; 
          casedWords.Contains(wordToCase, StringComparer.OrdinalIgnoreCase))
          return casedWords.FirstOrDefault(
            s =&gt; s.Equals(wordToCase, StringComparison.OrdinalIgnoreCase));

        // If the word is in our embedded list to lowercase, in the middle of the title, 
        // and not after a strong separator, it should be lowercased.
        if (ToLower.Contains(wordToCase, StringComparer.OrdinalIgnoreCase) &amp;&amp; 
          wordPosition == WordPosition.Middle &amp;&amp; 
          _weakSeparators.Contains(precedingSeparator))
          return wordToCase.ToLower();

        // The default casing uppercases the first letter and lowercases the rest.
        return UppercaseFirstLetter(wordToCase.ToLower());
    }

    /// &lt;summary&gt;
    /// Replaces a section of a string. This method will help us with a fringe case.
    /// &lt;/summary&gt;
    public string ReplaceAt(string toReplaceAt, int removeStartIndex, 
        int removeCount, string toInsert)
    {
        // Argument validation. A removal may end exactly at the end of the string.
        if (toReplaceAt == null) 
            throw new ArgumentNullException(&quot;toReplaceAt&quot;);
        if (removeStartIndex &lt; 0 || removeStartIndex &gt; toReplaceAt.Length) 
            throw new ArgumentOutOfRangeException(&quot;removeStartIndex&quot;);
        if (removeCount &lt; 0 || removeStartIndex + removeCount &gt; toReplaceAt.Length) 
            throw new ArgumentOutOfRangeException(&quot;removeCount&quot;, 
                Resources.IndexPlusCountExceedsSize);

        // Remove and insert.
        string removed = toReplaceAt.Remove(removeStartIndex, removeCount);
        return removed.Insert(removeStartIndex, toInsert);
    }

    /// &lt;summary&gt;
    /// The main title casing algorithm.
    /// &lt;/summary&gt;
    public string TitleCase(string toTitleCase, string[] casedWords)
    {
        if (toTitleCase == null)
            return null;

        StringBuilder stringBuilder = new StringBuilder();
        string currentWord = string.Empty;
        string lastWord = string.Empty;
        char lastSeparator = '\0';
        int wordCount = 0;

        foreach (char c in toTitleCase)
        {
            if (AllSeparators.Contains(c)) // The current character is a separator.
            {
                if (currentWord.Length &gt; 0)
                {
                  WordPosition position = wordCount == 0 ? 
                    WordPosition.First : WordPosition.Middle;
                  stringBuilder.Append(
                    CaseWord(currentWord, position, lastSeparator, casedWords));
                  lastWord = currentWord;
                  currentWord = string.Empty;
                  lastSeparator = '\0';
                  wordCount++;
                }
                stringBuilder.Append(c);
                // Set lastSeparator to the current character, unless it is a space AND
                // the lastSeparator is a strong separator. This is so CaseWord will 
                // work correctly after strong and space separators happen in succession.
                if (!(_strongSeparators.Contains(lastSeparator) &amp;&amp; char.IsWhiteSpace(c)))
                    lastSeparator = c;
            }
            else // The current character is not a separator.
                currentWord += c;
        }

        if (currentWord.Length &gt; 0) // Add the last word.
            stringBuilder.Append(
                CaseWord(currentWord, WordPosition.Last, lastSeparator, casedWords));
        else if (lastWord.Length &gt; 0) // Re-case the last word when separators trail it.
        {
            string title = stringBuilder.ToString();
            int lastWordIndex = 
                title.LastIndexOf(lastWord, StringComparison.OrdinalIgnoreCase);
            string toInsert = CaseWord(lastWord, WordPosition.Last, '\0', casedWords);
            return ReplaceAt(title, lastWordIndex, lastWord.Length, toInsert);
        }
        // Nothing to re-case when the input has no words at all (e.g. &quot;&quot; or &quot;...&quot;).
        return stringBuilder.ToString();
    }
  }
}</code></pre>
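<p>Calling the class might look something like the sketch below. It is not self-contained: it assumes the <code>TitleCaseUtility</code> class above is compiled alongside it with its embedded resources in place. The <code>TitleCaseExample</code> wrapper and the sample titles are only for illustration, and the exact output depends on what the embedded word lists contain, so none is shown.</p>
<pre class="crayon-plain-tag"><code>using System;
using PlanetTelex.Utilities;

public static class TitleCaseExample
{
    public static void Main()
    {
        var utility = new TitleCaseUtility();

        // Caller-supplied specifically cased words, taken from the examples in this post.
        string[] casedWords = { &quot;MUTEMATH&quot;, &quot;iPhone&quot; };

        string[] titles =
        {
            &quot;the lord of the rings&quot;,
            &quot;mutemath&quot;,
            &quot;portugal. the man&quot;,
            &quot;pinion/terrible lie&quot;
        };

        foreach (string title in titles)
            Console.WriteLine(utility.TitleCase(title, casedWords));
    }
}</code></pre>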
]]></content:encoded>
			<wfw:commentRss>http://www.robdixoniii.com/a-robust-title-casing-algorithm/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>XML Namespaces</title>
		<link>http://www.robdixoniii.com/xml-namespaces/</link>
		<comments>http://www.robdixoniii.com/xml-namespaces/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 16:15:41 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Namespaces]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.robdixoniii.com/?p=436</guid>
		<description><![CDATA[On this blog, I&#8217;ve been writing about the semantic web and its technologies. This began with a primer on URIs, and continued with an introduction to RDF. The next step in that journey will be to explore RDF/XML: the XML representation of RDF. But first, it is important to take stock of a very important [...]]]></description>
			<content:encoded><![CDATA[<p>On this blog, I&#8217;ve been writing about the <a href="http://www.robdixoniii.com/the-semantic-web-what-and-why/">semantic web and its technologies</a>. This began with <a href="http://www.robdixoniii.com/anatomy-of-a-uri/">a primer on URIs</a>, and continued with <a href="http://www.robdixoniii.com/introduction-to-rdf/">an introduction to RDF</a>. The next step in that journey will be to explore RDF/XML: the XML representation of RDF. But first, it is important to take stock of a very important feature in XML: namespaces. XML namespaces are quite useful outside of the context of the semantic web too, since really all they are is a way to avoid naming conflicts. But RDF/XML wouldn&#8217;t really be possible without them, so we should take a moment to appreciate this often overlooked feature.</p>
<p>An XML namespace declaration maps some URI to a local name you use in your XML document when defining elements (the XML specification calls this name the namespace <em>prefix</em>).  In <a href="http://www.w3.org/TR/REC-rdf-syntax/" target="_blank">RDF/XML</a>, the URI might refer to a particular URL where a vocabulary has been published, but that isn&#8217;t a requirement. It really just needs to be some valid URI- whether it is a URL or URN doesn&#8217;t matter, since the parser doesn&#8217;t look up that link.  To define this mapping, simply put the &#8220;xmlns&#8221; attribute on the element that uses the namespace, or on any of its ancestors. Namespaces are often declared in the document&#8217;s root element so they can apply to the whole document. This common practice leads to the misconception that xmlns attributes <em>must</em> be in the root, but that isn&#8217;t the case: they can be placed on any element in your XML document.</p>
<h3>Namespace Syntax</h3>
<p>The basic syntax for this attribute is:</p>
<blockquote><p>xmlns:<em>localName</em> = &#8220;http://www.myuri.com/path&#8221;</p></blockquote>
<p>Once that is done, whenever you define an element in your XML doc, you use the local name prefix on it, like so:</p>
<blockquote><p>&lt;<em>localName</em>:myNode&gt;&lt;/<em>localName</em>:myNode&gt;</p></blockquote>
<p>Since that can get verbose, you will find in practice that the local name is often only one or two characters long.</p>
<p>It is possible to specify a namespace in an XML document and not add the <em>localName</em> portion when you define it, like so:</p>
<blockquote><p>xmlns = &#8220;http://www.myuri.com/path&#8221;</p></blockquote>
<p>This is now the default namespace for the element it is declared on and everything under it.  Other namespaces can still be used, but any element without a <em>localName</em> prefix is assumed to belong to the default namespace. (Attributes without a prefix are not affected by the default namespace.)</p>
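<p>To see how a parser resolves these declarations, here is a small sketch using LINQ to XML. The element names, the second URI, and the class name are placeholders I made up; the XML uses single-quoted attributes only to keep the C# string readable. The inner declaration sits on a non-root element on purpose.</p>
<pre class="crayon-plain-tag"><code>using System;
using System.Xml.Linq;

public static class DefaultNamespaceExample
{
    public static void Main()
    {
        // Hypothetical document: a default namespace on the root,
        // and a prefixed namespace declared further down the tree.
        string xml =
            &quot;&lt;catalog xmlns='http://www.myuri.com/path'&gt;&quot; +
            &quot;&lt;item&gt;&quot; +
            &quot;&lt;x:note xmlns:x='http://www.myuri.com/other'&gt;declared below the root&lt;/x:note&gt;&quot; +
            &quot;&lt;/item&gt;&quot; +
            &quot;&lt;/catalog&gt;&quot;;

        XDocument doc = XDocument.Parse(xml);

        // Print each element's name next to the namespace URI it resolved to.
        foreach (XElement element in doc.Descendants())
            Console.WriteLine(element.Name.LocalName + &quot; -&gt; &quot; + element.Name.NamespaceName);
    }
}</code></pre>
<p>Here <code>catalog</code> and <code>item</code> should both report the default namespace URI, while <code>note</code> reports the URI bound to its own prefix.</p>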
<p>That&#8217;s all there is to it.  Now your XML document can have element names that don&#8217;t collide. When combined with the semantic web notion of vocabularies, and using XML to express RDF, namespaces provide an especially elegant solution. You can create an RDF/XML document that draws from many different vocabularies and not worry about confusion because each vocabulary has its own namespace.</p>
<h3>An Example</h3>
<p>I&#8217;ll finish up this post with a simple XML namespace example published by <a href="http://www.w3schools.com/xml/xml_namespaces.asp" target="_blank">W3Schools</a>.</p>
<pre class="crayon-plain-tag"><code>&lt;root
xmlns:h=&quot;http://www.w3.org/TR/html4/&quot;
xmlns:f=&quot;http://www.w3schools.com/furniture&quot;&gt;

&lt;h:table&gt;
  &lt;h:tr&gt;
    &lt;h:td&gt;Apples&lt;/h:td&gt;
    &lt;h:td&gt;Bananas&lt;/h:td&gt;
  &lt;/h:tr&gt;
&lt;/h:table&gt;

&lt;f:table&gt;
  &lt;f:name&gt;African Coffee Table&lt;/f:name&gt;
  &lt;f:width&gt;80&lt;/f:width&gt;
  &lt;f:length&gt;120&lt;/f:length&gt;
&lt;/f:table&gt;

&lt;/root&gt;</code></pre>
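<p>If you want to read a document like this from code, a sketch along the following lines should work with LINQ to XML. The XML is the same example with whitespace collapsed and single-quoted attributes; the class and variable names are my own. Note that the lookup is done by namespace URI, not by the <code>h</code> or <code>f</code> prefix.</p>
<pre class="crayon-plain-tag"><code>using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceQueryExample
{
    public static void Main()
    {
        string xml =
            &quot;&lt;root xmlns:h='http://www.w3.org/TR/html4/' &quot; +
            &quot;xmlns:f='http://www.w3schools.com/furniture'&gt;&quot; +
            &quot;&lt;h:table&gt;&lt;h:tr&gt;&lt;h:td&gt;Apples&lt;/h:td&gt;&lt;h:td&gt;Bananas&lt;/h:td&gt;&lt;/h:tr&gt;&lt;/h:table&gt;&quot; +
            &quot;&lt;f:table&gt;&lt;f:name&gt;African Coffee Table&lt;/f:name&gt;&quot; +
            &quot;&lt;f:width&gt;80&lt;/f:width&gt;&lt;f:length&gt;120&lt;/f:length&gt;&lt;/f:table&gt;&quot; +
            &quot;&lt;/root&gt;&quot;;

        XDocument doc = XDocument.Parse(xml);

        // Elements are matched by namespace URI plus local name.
        XNamespace html = &quot;http://www.w3.org/TR/html4/&quot;;
        XNamespace furniture = &quot;http://www.w3schools.com/furniture&quot;;

        // Both elements are named &quot;table&quot;, yet they never collide.
        XElement htmlTable = doc.Root.Element(html + &quot;table&quot;);
        XElement furnitureTable = doc.Root.Element(furniture + &quot;table&quot;);

        Console.WriteLine(htmlTable.Descendants(html + &quot;td&quot;).Count());        // 2
        Console.WriteLine(furnitureTable.Element(furniture + &quot;name&quot;).Value);  // African Coffee Table
    }
}</code></pre>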
]]></content:encoded>
			<wfw:commentRss>http://www.robdixoniii.com/xml-namespaces/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plastic Everywhere</title>
		<link>http://www.robdixoniii.com/plastic-everywhere/</link>
		<comments>http://www.robdixoniii.com/plastic-everywhere/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 15:01:23 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Environment]]></category>
		<category><![CDATA[global climate change]]></category>
		<category><![CDATA[plastic]]></category>
		<category><![CDATA[waste]]></category>

		<guid isPermaLink="false">http://www.robdixoniii.com/?p=428</guid>
		<description><![CDATA[I admit, I had no personal epiphany. Last year my business partner Dan and his wife Alissa had a sort of realization and pointed out how much plastic is in our modern lives. Look around you right now, plastic abounds. If you are like me and born into the plastic age, your first thought might [...]]]></description>
			<content:encoded><![CDATA[<p>I admit, I had no personal epiphany. Last year my business partner <a href="https://twitter.com/#!/dansomething" target="_blank">Dan</a> and his wife <a href="http://www.alissahansen.com/" target="_blank">Alissa</a> had a sort of realization and pointed out how much plastic is in our modern lives. Look around you right now: plastic abounds. If you are like me and born into the plastic age, your first thought might be &#8220;so what?&#8221;  But it turns out that our use and overuse of plastic is indeed a problem. It is made from fossil fuels, stays around forever, and can harm the health of humans and other animals (aka the ecosystem).</p>
<p>If you care about <a href="http://en.wikipedia.org/wiki/Climate_change" target="_blank">global climate change</a>, reducing fossil fuels should be a priority- so add plastics to your list. But even worse than the resources it takes to produce plastic is the time it takes for that plastic to break back down. In human terms, this is never. All the plastic stuff you have sitting around you right now will likely still be in that same form (at a molecular level) when you die, and will remain intact throughout your children&#8217;s and grandchildren&#8217;s lives.  Where does all this go? Landfills and the <a href="http://en.wikipedia.org/wiki/Great_Pacific_Garbage_Patch" target="_blank">Great Pacific Garbage Patch</a>.  When it is sitting in the ocean, that plastic breaks into ever smaller fragments and poisons the ecosystem.</p>
<p>The thing I&#8217;m finding so alarming about plastic is the lack of awareness or concern for this giant problem.  Sure we have a lot of ecological challenges, but there are lots of people working on the energy production problem or wildlife conservation. I don&#8217;t know of any professional organization dedicated to this plastics problem, and manufacturers just keep cranking out plastic goods- then packaging them in plastic!</p>
<p>What can we do? First is to simply bring awareness to the issue.  Once you can see the plastic surrounding you and think of the implications of that, you might change some of your buying habits. <a href="http://www.alissahansen.com/" target="_blank">Alissa Hansen</a> has been able to take this to an extreme- the Hansen family consumes almost no plastics, which is an interesting and difficult experiment.  Just try to not consume plastic all day.  The difficulty of doing that will illustrate just how pervasive plastic is, and the difficulty in trying to live &#8220;plastic free&#8221;.</p>
<p>Recycling you say?  Sadly, few plastic products can be fully recycled, something they don&#8217;t often tell you. I&#8217;ve been recycling for years and thought that was good enough, but it turns out that only types 1 and 2 can be recycled, and even those have some issues. You may have seen the newer, plant-based plastics- Starbucks uses them for Frappuccino cups. They can biodegrade, which makes them much, much better than petroleum-based plastics. That is the type of innovation we&#8217;ll need to reduce the giant island of trash in our ocean.</p>
<p>But yes, ultimately I too am a hypocrite- we have plenty of plastics around my house. Just taking a shower this morning, I could count 3 plastic bottles around me. We still buy orange juice (if you don&#8217;t want to squeeze it yourself, you pretty much can&#8217;t find non-plastic containers to buy OJ in). But you know what? I- you- the world- can&#8217;t change all at once.  Like it or not, plastics and cheap disposable plastic goods and packaging are part of our infrastructure. What we need to do is change that infrastructure- something that can only be done through time, perseverance, and consensus.  I want manufacturers to use less plastic, finding eco-friendly alternatives. I don&#8217;t have the time to make everything from scratch and remove myself from the supermarket economy. Most of us don&#8217;t.</p>
<p>So then, what <em>can</em> you do about it? Here are a few ways you can help start the reversal of this ecological calamity:</p>
<ol>
<li>Use all plastics you already own to their full extent. Ideally, items like Tupperware can last your entire life.  Other items, like resealable plastic bags can be used at least several times before they need to be tossed.</li>
<li>Look for alternative packaging.  While we still buy OJ, there are other products, like peanut butter, where you can reward companies that use glass over plastic. If everyone did this, manufacturers would certainly get the message and use paper, glass, and metals instead of plastics.</li>
<li>Spread awareness.  Even if you don&#8217;t blog, you can bring up the problem with family and friends. I&#8217;ve done this within my own family, and like me, they see the world a little differently after bringing consciousness to the plastic abounding around them.</li>
<li>Recycle. Yeah, I know it&#8217;s not perfect, but it is better than nothing. You should be recycling all your glass, tin, and plastics. I send all plastics to the recycling plant, and let them sort it out. Meaning, I don&#8217;t analyze the plastic number (1 and 2 can be recycled, but you will find all sorts of numbers of plastic). I figure at the very least I&#8217;m putting pressure on recycling plants to push for more 1 and 2 plastics.</li>
<li>All the little things.  Like, we bring bags to the grocery store- and even if we didn&#8217;t we&#8217;d opt for paper over plastic. When you go out to eat and order a soda- ask the waiter to not bring that plastic straw. Or if you don&#8217;t finish your meal, see if there is a non-plastic (including Styrofoam) box for the leftovers. Even better, bring your own container. Opportunities to reduce and eliminate plastic use happen all the time; it just needs to occur to you to do it.</li>
</ol>
<p>Looking around me now, I can see so many plastic products- my keyboard, the mouse, the computer tower, my headphones, my reusable <a href="http://www.starbucks.com/" target="_blank">Starbucks</a> cup, even some decorative <a href="http://en.wikipedia.org/wiki/R2D2" target="_blank">R2D2</a> Pez dispensers. All that and more in 3 square feet of office.  All of that is a problem, I admit, but for now I&#8217;m going to take care of these items and use them until they cease to function. In the meantime, I consider disposable plastic packaging enemy #1. That is the first beast of the plastic horde we must slay. But to do that, we need more people to join the battle.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.robdixoniii.com/plastic-everywhere/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Introduction to RDF</title>
		<link>http://www.robdixoniii.com/introduction-to-rdf/</link>
		<comments>http://www.robdixoniii.com/introduction-to-rdf/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 22:25:13 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[RDF]]></category>

		<guid isPermaLink="false">http://www.robdixoniii.com/?p=393</guid>
		<description><![CDATA[The Resource Description Framework (RDF) is the most fundamental of the semantic web technologies.  It is a way of structuring data that is expressive, precise, but also very flexible. Those familiar with relational database design can think of RDF as the ultimate data normalization scheme- representing data with a structure derived from human languages, but [...]]]></description>
			<content:encoded><![CDATA[<p>The Resource Description Framework (RDF) is the most fundamental of the semantic web technologies.  It is a way of structuring data that is expressive, precise, but also very flexible. Those familiar with relational database design can think of <a href="http://en.wikipedia.org/wiki/RDF" target="_blank">RDF</a> as the ultimate <a href="http://en.wikipedia.org/wiki/Data_normalization" target="_blank">data normalization</a> scheme- representing data with a structure derived from human languages, but structured more rigidly so machines can interpret it.</p>
<p>Like many powerful concepts, RDF is fundamentally simple but can get quite complex in implementation. Many available articles focus on these intricacies and complexities. My intention in this article is to clearly and simply describe RDF in its most basic and abstract form.</p>
<p>First, consider a simple statement in English like, &#8220;This article&#8217;s author is Rob Dixon.&#8221; If you remember high-school English class, you might recall that sentences have a <em>subject</em> and a <em>predicate</em>. The subject is what the sentence is about- in my example that is &#8220;this article&#8221;. The predicate is the information about the subject, so in this case &#8220;author is Rob Dixon.&#8221;  The word &#8220;author&#8221; is called the <em>simple predicate</em> (in RDF, this word is actually called the &#8220;predicate&#8221;, so don&#8217;t get confused by the terminology).  The last part of the sentence, my name, is the sentence&#8217;s <em>object</em>- that which receives the action of a verb. In my sentence the implied verb is &#8220;authored by,&#8221; so the object is me.</p>
<p>So we&#8217;ve taken the basic sentence, &#8220;This article&#8217;s author is Rob Dixon,&#8221; and have broken it down into three parts:</p>
<p>{<em>subject, predicate, object</em>}</p>
<p>This is an RDF triple.  The example sentence as an RDF triple is:</p>
<p>{<em>this article, author, Rob Dixon</em>}</p>
<p>The information in the sentence can be expounded upon, and all you need to do is string together multiple triples.  So, &#8220;This article&#8217;s author is Rob Dixon, who lives in Denver&#8221; can become two triples: {<em>this article, author, Rob Dixon</em>} and {<em>Rob Dixon, lives in, Denver</em>}.</p>
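<p>For the programmers in the audience, the two triples can be pictured as a tiny data structure. This is only a sketch with plain strings and a class I invented for the purpose; real RDF libraries model this quite differently.</p>
<pre class="crayon-plain-tag"><code>using System;
using System.Collections.Generic;

// A hypothetical, string-only triple used purely for illustration.
public sealed class Triple
{
    public string Subject { get; private set; }
    public string Predicate { get; private set; }
    public string Object { get; private set; }

    public Triple(string subject, string predicate, string obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }

    public override string ToString()
    {
        return &quot;{&quot; + Subject + &quot;, &quot; + Predicate + &quot;, &quot; + Object + &quot;}&quot;;
    }
}

public static class TripleExample
{
    public static void Main()
    {
        // The object of the first triple is the subject of the second,
        // which is how separate facts link together.
        var triples = new List&lt;Triple&gt;
        {
            new Triple(&quot;this article&quot;, &quot;author&quot;, &quot;Rob Dixon&quot;),
            new Triple(&quot;Rob Dixon&quot;, &quot;lives in&quot;, &quot;Denver&quot;)
        };

        foreach (Triple triple in triples)
            Console.WriteLine(triple);
    }
}</code></pre>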
<p>Pretty simple, right? It is, but you can imagine that once you start stringing many of these triples together they can form complex structures. That is why one of the primary ways RDF is visualized is with <a href="http://www.w3.org/TR/rdf-concepts/#section-data-model" target="_blank">a graph</a>, where subjects and objects are represented as shapes with the predicate displayed as a line connecting the two.</p>
<p>Another complexity is actually identifying what the subject, predicate, and object actually <em>mean</em>, without ambiguity.  In my example, I used the words &#8220;this article&#8221; to make it human readable, but that is in fact a poor way to identify the article.  It is much better to use the article&#8217;s URL: <a href="http://www.robdixoniii.com/introduction-to-rdf">http://www.robdixoniii.com/introduction-to-rdf</a>. A more useful RDF triple is then:</p>
<p>{<em>http://www.robdixoniii.com/introduction-to-rdf, author, Rob Dixon</em>}.</p>
<p>But when you think about it, you&#8217;ll realize that the other components of the triple have the same problem. What exactly do I mean by <em>author</em>? Which <em>Rob Dixon</em>? After all, there are many in this world. For this reason, every component of the triple can be a <em>resource</em> specified by a URI, like the URL of this article. In fact, the predicate must be a URI according to the specification.  This is important, because we all know that English words have different meanings when used in different contexts. This set of resources used by some RDF graph is called a <em>vocabulary</em>.</p>
<p>Since it would be a hassle to always have to define your vocabulary, there are predefined vocabularies published by a variety of sources. One popular vocabulary is the <a href="http://dublincore.org" target="_blank">Dublin Core</a>. It defines many common concepts, like authorship, so you don&#8217;t have to. Technically, you don&#8217;t really have to define your vocabulary, just make it a URI.  So I could say my predicate &#8220;author&#8221; is &#8220;<em>http://www.robdixoniii.com#author</em>&#8221;. The fact that that URI doesn&#8217;t resolve to a resource doesn&#8217;t impact the validity of the RDF.</p>
<p>When completely converted into resources, the sentence &#8220;This article&#8217;s author is Rob Dixon,&#8221; turns into the RDF triple:</p>
<p>{<em>http://www.robdixoniii.com/introduction-to-rdf, http://www.robdixoniii.com#author, http://www.robdixoniii.com</em>}</p>
<p>Now we are back to a technical-looking construct that doesn&#8217;t look so &#8220;<a href="http://en.wikipedia.org/wiki/Semantic" target="_blank">semantic</a>&#8221; anymore. It is though, just in a way a computer can interpret. There are ways to express RDF in more human-readable formats and there are shorthand notations so you don&#8217;t see the full URI of each item in a triple, but I don&#8217;t want to go into written notations of RDF here; just know there are many.</p>
<p>At its core, there are four simple rules that define RDF:</p>
<ol>
<li>An RDF triple is a <em>subject</em>, <em>predicate</em>, and <em>object</em>, in that order.</li>
<li>An RDF triple represents a complete and unique fact (in the formal logic sense, but practically the reason why URIs are part of the specification).</li>
<li>An RDF triple can be combined with other RDF triples; however, that operation doesn&#8217;t change the meaning of any individual triple.</li>
<li>The <em>subject</em> component of the triple must be either a blank node or a URI. The <em>predicate</em> must be a URI, and the <em>object</em> can be a blank node, a URI, or a literal value.</li>
</ol>
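<p>To make the rules above concrete, here is a minimal Python sketch of a triple as a data structure. The class names (<em>URI</em>, <em>BlankNode</em>, <em>Literal</em>, <em>Triple</em>) are my own illustrative choices and not part of any RDF library; the sketch only checks which kind of term is allowed in each position (rule 4) and shows that combining triples into a graph is just a set union (rule 3).</p>

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    # Rule 1: subject, predicate, object, in that order.
    subject: Union[URI, BlankNode]
    predicate: URI
    object: Union[URI, BlankNode, Literal]

    def __post_init__(self):
        # Rule 4: enforce which kinds of term each position accepts.
        if not isinstance(self.subject, (URI, BlankNode)):
            raise TypeError("subject must be a URI or a blank node")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.object, (URI, BlankNode, Literal)):
            raise TypeError("object must be a URI, a blank node, or a literal")


article = URI("http://www.robdixoniii.com/introduction-to-rdf")
author = URI("http://www.robdixoniii.com#author")

# The fully resource-based triple from this article.
fact = Triple(article, author, URI("http://www.robdixoniii.com"))
# The object may also be a literal value.
also_valid = Triple(article, author, Literal("Rob Dixon"))

# Rule 3: a graph is a set of triples; merging leaves each triple unchanged.
graph = {fact} | {also_valid}
```

<p>Trying to build <em>Triple(Literal("Rob Dixon"), author, article)</em> raises a <em>TypeError</em>, because a literal cannot be the subject.</p>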
]]></content:encoded>
			<wfw:commentRss>http://www.robdixoniii.com/introduction-to-rdf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Semantic Web: What and Why</title>
		<link>http://www.robdixoniii.com/the-semantic-web-what-and-why/</link>
		<comments>http://www.robdixoniii.com/the-semantic-web-what-and-why/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 23:42:00 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[RDF]]></category>
		<category><![CDATA[Tim Berners-Lee]]></category>
		<category><![CDATA[Wikipedia]]></category>

		<guid isPermaLink="false">http://www.robdixoniii.com/?p=396</guid>
		<description><![CDATA[Sir Tim Berners-Lee coined the term &#8220;semantic web&#8221; and defined it as &#8220;a web of data that can be processed directly and indirectly by machines.&#8221; However, this statement is both precise and vague, which is probably why its definition in Wikipedia is scattered and rambling. Since I&#8217;ve started studying the subject I&#8217;ve found myself trying to explain [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Sir_Tim_Berners-Lee" target="_blank">Sir Tim Berners-Lee</a> coined the term &#8220;semantic web&#8221; and defined it as &#8220;a web of data that can be processed directly and indirectly by machines.&#8221; However, this statement is both precise and vague, which is probably why <a href="http://en.wikipedia.org/wiki/Semantic_web" target="_blank">its definition in Wikipedia</a> is scattered and rambling. Since I&#8217;ve started studying the subject I&#8217;ve found myself trying to explain it to various people, usually with limited success.</p>
<p>A good way to practically define it, if not exactly explain it, is by listing specific technologies that fall under the semantic web moniker: <a href="http://en.wikipedia.org/wiki/RDF" target="_blank">RDF</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language" target="_blank">OWL</a>, <a href="http://en.wikipedia.org/wiki/SPARQL" target="_blank">SPARQL</a>, <a href="http://en.wikipedia.org/wiki/RDFS" target="_blank">RDFS</a>, <a href="http://en.wikipedia.org/wiki/XML" target="_blank">XML</a>.  These are among the core technologies that facilitate the semantic web: frameworks and specifications that enable machines to process a web of data.  Understanding these technologies is akin to understanding just what the semantic web is, but it still doesn&#8217;t <em>explain</em> what the semantic web is, especially to a non-technical person.  If these technologies are to fulfill their full potential, non-technical people need to understand what the heck this semantic web is all about. Investors, business owners, managers, and even hobbyists could get great value out of this technology, but they won&#8217;t invest time or resources in a concept they don&#8217;t understand.  Having defined it twice already, I think the question still remains: what is the semantic web?</p>
<p>Buried in the <a href="http://en.wikipedia.org/wiki/RDF" target="_blank">Wikipedia entry</a> is a good description: &#8220;The semantic web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.&#8221; Terabytes of data are published on the web daily, but the vast majority of it is configured for human consumption.  Machines can read the content, but they can&#8217;t relate it to other content or put it in context, at least not without a human writing a program that does those things.</p>
<p>If you are a software developer, you are likely familiar with <a href="http://en.wikipedia.org/wiki/Relational_databases" target="_blank">relational databases</a>.  That&#8217;s because most modern business applications store their data in a relational database. The reason is that they provide awesome data fidelity: the program using the data can &#8220;understand&#8221; how items of data relate. If I&#8217;m building a music distribution application, I likely have individual items of data to represent a song, an artist, and an album. With a relational data store, my application can know that an <em>album</em> is a collection of <em>songs</em> and that an <em>album</em> is written by an <em>artist</em>.</p>
<p>The prevalence of relational databases in application development highlights the fact that not only is the <em>data</em> important, but its relationship to other data is just as important.  They work extremely well, but are limited: they model data for a particular <em>domain</em>, namely the domain of the application being written (in the example above, music albums).  This means that for my application to use any data, it must be imported into the relational database, which has a schema unique to the application, so someone must write an explicit mapping from an external source into the custom database.</p>
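<p>As a rough illustration of that kind of application-specific schema, here is a small Python sketch using the standard library&#8217;s <em>sqlite3</em> module. The table and column names are hypothetical and the sample rows are only there to make the join visible; the point is that the relationships (an album belongs to an artist, a song belongs to an album) live in a schema that only this application knows about.</p>

```python
import sqlite3

# In-memory database with a schema specific to this (hypothetical) music application.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE album (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        artist_id INTEGER NOT NULL REFERENCES artist(id)
    );
    CREATE TABLE song (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        track INTEGER NOT NULL,
        album_id INTEGER NOT NULL REFERENCES album(id)
    );
""")

# Illustrative sample rows.
conn.execute("INSERT INTO artist VALUES (1, 'Fleetwood Mac')")
conn.execute("INSERT INTO album VALUES (1, 'English Rose', 1)")
conn.execute("INSERT INTO song VALUES (1, 'Black Magic Woman', 7, 1)")

# The foreign keys let the application relate a song to its album and artist.
rows = conn.execute("""
    SELECT artist.name, album.title, song.title
    FROM song
    JOIN album ON song.album_id = album.id
    JOIN artist ON album.artist_id = artist.id
""").fetchall()
```

<p>Any outside data source would need its own hand-written mapping into these three tables before the application could use it.</p>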
<p>Web services that return data are another popular data source, and they are typically interfaced using an <a href="http://en.wikipedia.org/wiki/API" target="_blank">application programming interface (API)</a>. At first it might seem that web-based APIs fulfill the description of the semantic web as defined by <a href="http://en.wikipedia.org/wiki/Tim_Berners-Lee" target="_blank">Berners-Lee</a>, and some of them indeed might, but it is generally not the case. Why? Because just like with the relational database, an explicit mapping will be created by a developer to interface with the API for their application.  Also, APIs are proprietary interfaces which return data in a format defined by whoever wrote them.  For machines to truly act &#8220;intelligently,&#8221; data needs to be presented in a uniform manner, regardless of technology or application domain.</p>
<p>Another key aspect of the semantic web, possibly the most important, is the <em>web</em> part.  Both a relational database and a web-based API represent endpoints to closed, non-distributed systems. What we are talking about with the semantic web are applications that have the entire internet as a data source, and can make intelligent connections between that data, even though the application has never been programmed with the relationship explicitly. In short, we&#8217;re talking about building applications that can use the web like a human can.</p>
<p>Let&#8217;s say I wanted to create a web-based application that is the ultimate music research system.  I want it to know every album published, the credits for every track, all the artwork, and even song lyrics.  One approach I could take is to create a relational database and attempt to fill it up with all this data.  But new music is published all the time and from a variety of sources, so how can my application keep up? Furthermore, how does the data get from the Internet into my relational database? Traditionally, some sort of connector tool or importer tool is written for this purpose.  But new sites and data sources are created all the time, and creating a custom import mapping for each is a whole lot of work, and limits the data sources my application can use.</p>
<p>A semantic web application, by contrast, could simply be told to use <a href="http://allmusic.com/" target="_blank">allmusic.com</a>, <a href="http://www.amazon.com/" target="_blank">amazon.com</a>, <a href="http://www.wikipedia.org/" target="_blank">wikipedia.org</a>, <a href="http://www.apple.com/itunes/?cid=OAS-US-DOMAINS-itunes.com" target="_blank">iTunes</a>, <a href="http://www.songlyrics.com/" target="_blank">songlyrics.com</a>, and many other systems known to have some authority on the subject of music and song publishing.  I could ask it, &#8220;What albums did <a href="http://allmusic.com/artist/fleetwood-mac-p4273" target="_blank">Fleetwood Mac</a> release in 1969?&#8221; and it would semantically connect the question to the data provided on those sites, aggregate it, and give me an answer.  When I see &#8220;English Rose&#8221; in the result set, I could then ask, &#8220;What is track 7 and who wrote it?&#8221;  It would go back to those data sources and tell me: &#8220;<a href="http://en.wikipedia.org/wiki/Black_Magic_Woman">Black Magic Woman</a>&#8221; and &#8220;<a href="http://en.wikipedia.org/wiki/Peter_Green_(musician)" target="_blank">Peter Green</a>&#8221;. You could keep going: &#8220;What are the lyrics to that?&#8221;, &#8220;Where was Peter Green born?&#8221;, etc. Unlike my relational database, which would certainly have boundaries (I doubt it would contain the birth place of every band member), the semantic web application can keep going, relating question after question, as long as the data sources it uses contain the data and relationships.  And many prominent sites, like <a href="http://www.wikipedia.org/" target="_blank">Wikipedia</a>, publish their content in semantic formats and do contain all of this data.</p>
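<p>Here is a toy Python sketch of that question-and-answer flow, assuming each source publishes its facts as simple (subject, predicate, object) statements. Everything prefixed with <em>ex:</em> is a made-up placeholder rather than a real vocabulary term, the <em>ex:track7</em> predicate is a deliberate oversimplification, and the <em>match</em> helper is hypothetical; real systems would use RDF and a query language such as SPARQL. The facts themselves are the ones from the example above.</p>

```python
# Each source publishes facts as (subject, predicate, object) statements.
# All "ex:" identifiers are made-up placeholders, not real vocabulary terms.
source_a = {
    ("ex:EnglishRose", "ex:artist", "ex:FleetwoodMac"),
    ("ex:EnglishRose", "ex:released", "1969"),
    ("ex:EnglishRose", "ex:title", "English Rose"),
}
source_b = {
    ("ex:EnglishRose", "ex:track7", "ex:BlackMagicWoman"),
    ("ex:BlackMagicWoman", "ex:title", "Black Magic Woman"),
    ("ex:BlackMagicWoman", "ex:writer", "ex:PeterGreen"),
    ("ex:PeterGreen", "ex:name", "Peter Green"),
}

# Aggregating sources is a set union; no per-source import mapping is needed
# because both sources share the same statement shape and identifiers.
graph = source_a | source_b


def match(graph, s=None, p=None, o=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def objects(graph, s, p):
    """Return the sorted objects of all statements with this subject and predicate."""
    return sorted(t[2] for t in match(graph, s=s, p=p))


# "What albums did Fleetwood Mac release in 1969?"
by_artist = {t[0] for t in match(graph, p="ex:artist", o="ex:FleetwoodMac")}
in_1969 = {t[0] for t in match(graph, p="ex:released", o="1969")}
albums = by_artist.intersection(in_1969)
album = next(iter(albums))
album_title = objects(graph, album, "ex:title")[0]

# "What is track 7 and who wrote it?"
track = objects(graph, album, "ex:track7")[0]
track_title = objects(graph, track, "ex:title")[0]
writer = objects(graph, track, "ex:writer")[0]
writer_name = objects(graph, writer, "ex:name")[0]
```

<p>The follow-up question is answered from a different source than the first one, which only works because both sources refer to the album by the same identifier.</p>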
<p>The notion of an application understanding relationships between things without a developer explicitly defining them is an odd one to those of us used to developing traditional software systems, and exactly how that occurs is the secret sauce provided by the technologies I mentioned earlier: <a href="http://en.wikipedia.org/wiki/RDF" target="_blank">RDF</a>, <a href="http://en.wikipedia.org/wiki/SPARQL" target="_blank">SPARQL</a>, <a href="http://en.wikipedia.org/wiki/Web_Ontology_Language" target="_blank">OWL</a>.  However, it can be summarized in a word: <a href="http://en.wikipedia.org/wiki/Metadata" target="_blank">metadata</a>.</p>
<p>While a lot of progress has been made building structures semantic applications understand out of plain text (text to ontology), the ultimate promise of a semantic web does require some publishing guidelines. Publishers will ideally publish <a href="http://en.wikipedia.org/wiki/RDF/XML" target="_blank">RDF/XML</a> in addition to their human-readable content. Detractors of the technology view this as a difficult barrier to widespread adoption, but I would point to protocols like <a href="http://en.wikipedia.org/wiki/RSS" target="_blank">RSS</a> or <a href="http://en.wikipedia.org/wiki/ATOM" target="_blank">ATOM</a>.  Ultimately, publishing <a href="http://en.wikipedia.org/wiki/RDF/XML" target="_blank">RDF/XML</a> should be just as easy and ubiquitous as those syndication feeds. Any developer wanting to do so already has <a href="https://www.google.com/search?rlz=1C1CHFX_enUS438US438&amp;gcx=w&amp;sourceid=chrome&amp;ie=UTF-8&amp;q=semantic+web+tools" target="_blank">a wealth of tools at their disposal</a>.</p>
<p>Although there are major content providers (like Wikipedia) that do publish data in a semantic web format like RDF/XML, it remains to be seen if these technologies will be universally adopted. One thing is for certain though: our current World Wide Web is dumb.  It&#8217;s full of text, images, and videos that have no context or relationship to each other. Humans can sift through this data and make sense of it; machines can&#8217;t.  <a href="https://www.google.com" target="_blank">Google</a> (and <a href="http://www.bing.com/" target="_blank">Bing</a>) are about the closest thing there is, but even those amazing systems don&#8217;t make the Internet semantic.  As good as their algorithms are, the relationships they create between content on the web are imprecise.  They must be, because it&#8217;s the search indexing algorithm that is defining the relationships.  In the semantic web, the content publisher defines them, in a very precise way.  Google at its best can only make assumptions based on the content of the page, the links in the page, and links to the page.  It would be much better if the publisher explicitly defined how their published data is related.  Then think of how accurate those search engines could be.</p>
<p>When people think about the future of computers, I think <a href="http://en.wikipedia.org/wiki/Star_Trek" target="_blank">Star Trek</a> comes to mind. In that universe, people can ask a computer any arbitrary question and get back an <em>answer</em> (or at least a request for disambiguation or clarification), not a giant list of possible places an answer might be, which is the state of our current web.  Furthermore, I can give that computer any arbitrary instruction like &#8220;make dinner reservations for me at the best French restaurant in town on Thursday sometime after 7PM&#8221; and it could do exactly that.</p>
<p>That future is not too far off.  Transcribing speech is something that even your smartphone can do well now.  Programs can parse that transcription into an <a href="http://en.wikipedia.org/wiki/Ontology_(information_science)">ontology</a>, a specific vocabulary in a context. Combining these with semantic web technologies to achieve those futuristic results is ultimately what we are trying to do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.robdixoniii.com/the-semantic-web-what-and-why/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

