Scott McMullan: Google/Internet Archive, Meet Mr. Event

Friday, December 17, 2004

In the spirit of "tags are the new black," here's a vision for how event and calendar management should be handled by 2006: Wouldn't it be grand if all events in the world, from a garage sale in Lexington to a tech conference in SF, could be automatically discovered (Google), stored in one central, public domain, web services accessible database (Internet Archive), where the events could then be categorized (Topix), community rated and recommended (Amazon/Netflix/Last.FM), personally tagged (Flickr/del.icio.us), and ultimately custom-fed (PubSub) directly to your calendar device of choice (Calconnect)?

Cool, where do I sign up? I could have used this to promote our now-defunct SF Web Services SIG, which was done "by hand," one site/email list at a time. Not to mention the relative difficulty when you go searching for a good event in your area... (not to knock workit.com or craigslist.org -- thanks guys!) But I want this uber event service now!

So past and current pains, combined with threads followed, dots connected, and small wheels turned since last week's Berkeley Calendar Project post, create this mashup event service vision. Actually, people are talking about this vision in so many words. Dr. Bob Glushko and Allison Bloodworth et al. are leading part of the effort by trying to bring sanity and sharing to the 80+ event calendars of UC Berkeley. Tantek Çelik and other folks working on hCalendar talk about using your web site as your API,

"...bloggers can discuss events in their blog(s) in such a way that spiders
and other aggregators can retrieve such events, automatically convert
them to iCalendar, and use them in any iCalendar application or service.",

Kevin Hughes at zLab writes,

"The potential for applications that make use of networked, time-driven information is
huge. Today's portals have no concept of event personalization or collaboration.
Today's applications have only the most basic concept of integrating with or subscribing
to time-driven data. And there are no providers of horizontal event-based services. ... Software and
the Internet has freed online music from its proprietary data and application jails, why
not do the same with events? The traditional calendar interface deserves a overhaul.",

Marc Canter chimes in,

"I wonder if they wanna help put up shared XML servers of Events - scraped from throughout the web?",

the Calconnect.org folks proclaim,

"Our members’ intent is to enable calendaring and scheduling tools and applications to enter the mainstream of computing," said Dave Thewlis. "After email, the World Wide Web, and instant messaging, calendaring and scheduling capabilities are what business people and consumers will really care about.",

and many, many others I have yet to discover [add more in comments]. Yes, a few people are already doing similar stuff, like upcoming.org, rsscalendar.com, openevents.com, whizspark.com, evdb.com(?), and most likely Google Labs (?) [any more?]. But for starters, from what I can glean, these efforts rely on people coming and manually submitting events, rather than auto-discovery and aggregation via spidering --> problem of db critical mass.

So let's make one! Let's combine something practical and "standards-based" (i.e., iCalendar/XHTML) like hCalendar, but extend it with the rich UBL-inspired semantics of a Berkeley Event Model. Now that everyone agrees, let's move on. We need open source tools so event producers (even for my yard sale) can list their events on their blogs/web sites, like they already do, but also be sure to use valid markup. (Note: people are motivated to promote their events -- no prodding necessary.) Then we need an open source spider, a bunch of disks and some Linux boxes, a REST API, open source recommendation and reputation/rating/tagging engines (e.g., tweaked openscrobbler + ??/??/??), and some pub-sub and feed generators. And to top it off, let's make the database and service usable by all (e.g., some kind of Creative Commons license), so we can unleash the creativity of the world to build custom interfaces via our web services API. Easy stuff in these days of inspiration from Amazon/Google/PayPal/eBay/SForce, open source, wikis, bugzilla, and PayPal donate buttons. Right?
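To make the "spider reads valid markup, spits out iCalendar" step a bit more concrete, here is a minimal Python sketch. It assumes a page marks up events hCalendar-style (an element with class "vevent" containing "summary", "dtstart", "dtend" and "location" properties, with machine-readable dates in the title attribute of an abbr element). The class name HCalendarParser, the function to_icalendar, the PRODID string and the sample page are all made up for illustration. It also assumes the dates are already in iCalendar's basic format, and it skips the UID/DTSTAMP properties and text escaping that a real iCalendar producer needs.

```python
from html.parser import HTMLParser

# hCalendar property class names we look for inside a "vevent" element.
FIELDS = ("summary", "dtstart", "dtend", "location")
# Elements with no closing tag; they must not affect nesting depth.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class HCalendarParser(HTMLParser):
    """Collect hCalendar-style events (class="vevent") from an HTML page."""

    def __init__(self):
        super().__init__()
        self.events = []        # finished events, one dict each
        self._event = None      # event currently being read, if any
        self._depth = 0         # nesting depth inside the current vevent
        self._field = None      # property whose text is being collected
        self._field_depth = 0   # depth at which that property started

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event, self._depth = {}, 1
            return
        if tag in VOID:
            return
        self._depth += 1
        for name in FIELDS:
            if name in classes:
                if tag == "abbr" and attrs.get("title"):
                    # The title attribute carries the machine-readable value.
                    self._event[name] = attrs["title"]
                    self._field = None
                else:
                    self._event[name] = ""
                    self._field, self._field_depth = name, self._depth
                break

    def handle_data(self, data):
        if self._event is not None and self._field:
            self._event[self._field] += data

    def handle_endtag(self, tag):
        if self._event is None or tag in VOID:
            return
        if self._field and self._depth == self._field_depth:
            self._field = None
        self._depth -= 1
        if self._depth == 0:    # the vevent element itself just closed
            self.events.append(self._event)
            self._event = None


def to_icalendar(events):
    """Serialize events as a bare-bones iCalendar document (no UID/DTSTAMP)."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hcal-sketch//EN"]   # made-up product id
    for ev in events:
        lines.append("BEGIN:VEVENT")
        for name in FIELDS:
            if ev.get(name):
                lines.append(f"{name.upper()}:{ev[name].strip()}")
        lines.append("END:VEVENT")
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"


page = """
<div class="vevent">
  <span class="summary">SF Web Services SIG</span>,
  <abbr class="dtstart" title="20041217T190000">Dec 17, 7pm</abbr>,
  <span class="location">San Francisco</span>
</div>
"""
parser = HCalendarParser()
parser.feed(page)
print(to_icalendar(parser.events))
```

The spider, storage, rating and pub-sub pieces are a whole other story; this only covers the "event producer publishes markup, aggregator converts it" leg.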

The law of ideas says that at least 6 (or is it 8?) people are either building or have already built this exact event service -- and I haven't found it yet. Any pointers? If not, anyone want to build this monster?

14 comments:

Bob Wyman said...: I've been wondering when people would begin to figure this one out! But, I think you may have missed a critical piece of the puzzle: The logical "wrapper" for the XML that will describe the events is RSS/Atom feeds. The reason is that we already have extensive infrastructure in place for discovering content in these feeds -- including "pinging" that allows publishers to immediately notify feed monitors of new data and thus avoid the terribly long delays that are inherent to the spidering systems used by systems like Google, Yahoo!, etc. Events are, by their nature, time-sensitive. We need low-latency mechanisms to publish them.
What we should be defining is an XML content type that can be the body of an Atom entry. If defined, we at PubSub would immediately do whatever is needed to provide solid support for subscribing to events. I'd be happy to provide whatever assistance is necessary to help this standard get created.
Of course, "events" aren't the only area in which we would all benefit from a transition away from the current systems that "rely on people coming and manually submitting events" to the future systems that will rely on "auto-discovery and aggregation via spidering". Job-postings, "offers-to-sell", "offers-to-buy", resumes, etc. would all benefit from the same approach. Once it becomes possible to openly publish these "structured objects" in the same way that we publish web pages, we'll see real competition among providers who will all be able to access the same data while offering interfaces that each address a distinct set of market needs. We'll move from a web economy where service providers monopolize access to information to one where the data is freely available but people monetize the systems that provide services built on the data. This will be a good thing.
bob wyman
CTO, PubSub.com; 3:01 PM
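A rough Python sketch of what Bob's comment describes, an event carried as the XML body of an Atom entry, might look like the following. No such event content type is defined in the comment, so the "ev" namespace URI, its element names (event, start, location), the function name event_entry and the sample values are placeholders invented for illustration. The Atom namespace used is the one later standardized for Atom 1.0.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
EVENT = "http://example.org/ns/event"   # hypothetical namespace, not a real standard

ET.register_namespace("atom", ATOM)
ET.register_namespace("ev", EVENT)


def event_entry(entry_id, title, updated, start, location):
    """Build an Atom entry whose content element wraps an event payload."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # XML payloads go inside <content> with an XML media type.
    content = ET.SubElement(entry, f"{{{ATOM}}}content",
                            {"type": "application/xml"})
    event = ET.SubElement(content, f"{{{EVENT}}}event")
    ET.SubElement(event, f"{{{EVENT}}}start").text = start
    ET.SubElement(event, f"{{{EVENT}}}location").text = location
    return entry


entry = event_entry(
    "tag:example.org,2004:garage-sale-1",   # placeholder id
    "Garage sale",
    "2004-12-17T15:01:00Z",
    "2004-12-18T09:00:00Z",
    "Lexington",
)
print(ET.tostring(entry, encoding="unicode"))
```

A publisher would add such entries to its feed and ping the feed monitors, so subscribers hear about a new event without waiting for a crawl.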
Mud's Tests said...: Scott,
Thanks for your blog. I made some comments at the link under my name.
BTW, Hi Bob Wyman.; 4:20 PM
Thomas Winningham said...: Good stuff! I've always thought a key component to tie all of this together, and one that could be done now, is a workflow program like an aggregator with rules to publish out XML-RPC or just an RSS feed...
Thanks for the read!; 11:24 AM
Zach said...: Spidering / Scraping of events is complicated by the data that is loose in the wild. I won't say that it's impossible, but it's non-trivial.
Firstly, there is the problem of ambiguously stated events even if they're completely specified. For example, the same date on US websites (or those with primarily US audiences) is represented differently than on a site in Europe. What is often written mm/dd/yyyy in the US is written dd/mm/yyyy in Europe. While some of these can be disambiguated easily because a date is invalid in one of the two formats, others can't be. Now, date formats don't usually switch on a single site, so this can be mitigated somewhat.
Secondly, many events are not fully specified. Many events are posted with only partial meta-data. This is not by accident. This is by design: a weekly or monthly concert or talk series does not need to mention the start time for every occurrence -- that would be redundant. And repetitive. Partial specification means that certain events would lack the very information that makes them events -- time of day information, or dates, or even event titles. This is a problem for a spider / scraper, since the scraper would have to have some template to scrape against. Any possible template would leave huge swaths of real and important (and maybe even good) events un-collected.
Third, but related to the second point, are events that are implicit. It's easy to imagine events being described in such a way as to be readily apparent to a human reader, but unstructured (and therefore hidden) to a parser. The event description and details could be spread among multiple pages, or even multiple sites. Without an AI-complete natural language parser, it'd be impossible to get at this event. Of course, having an AI-complete NLP system would pretty much negate this whole project (and a WHOLE LOT MORE - singularity here we come). These events may be more rare, but they're perfectly valid from the point of view of web authors.
All that being said, I'd love to see progress made on this front. I think that this would be a very nice web service -- discoverable events based on geographic and category filtering would roX0rs my soXXors.; 2:58 AM
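Zach's first point, that some dates rule out one format and that a site rarely mixes formats, can be sketched in a few lines of Python. This is only an illustration under those two assumptions; the function names possible_orders and site_order and the sample dates are made up, and it handles only slash-separated numeric dates.

```python
from datetime import date


def possible_orders(text):
    """Return the orders ("mdy", "dmy") under which a/b/yyyy is a real date."""
    a, b, year = (int(part) for part in text.split("/"))
    orders = set()
    for order, (month, day) in (("mdy", (a, b)), ("dmy", (b, a))):
        try:
            date(year, month, day)   # raises ValueError for impossible dates
            orders.add(order)
        except ValueError:
            pass
    return orders


def site_order(date_strings):
    """Intersect the possibilities across every date found on one site."""
    orders = {"mdy", "dmy"}
    for text in date_strings:
        orders &= possible_orders(text)
    return orders


# One unambiguous date settles the format for the whole site.
print(sorted(site_order(["03/04/2005", "12/17/2004"])))   # ['mdy']
# Every date here is valid both ways, so the site stays ambiguous.
print(sorted(site_order(["03/04/2005", "05/06/2005"])))   # ['dmy', 'mdy']
```

An empty result would mean the site mixes formats (or contains a bad date), which is exactly the case where the "one format per site" assumption breaks down.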
Brian said...: Hi Scott. Glad to hear more people are interested in this stuff. FYI I sent email to your gmail account. Hope to hear back from you!
- Brian @ EVDB; 7:38 AM
Marc Canter said...: dude
you rock.; 2:10 PM
Arnaud Descamps said...: Dear Scott, thanks for your writing, we enjoyed reading your vision.
I am a member of EventsML working group.
EventsML is an initiative by the news industry to make a standard helping hassle-free exchange of event information. We focus on "news-worthy" events, but that might be
interesting to others. www.eventsML.org; 10:16 PM
Hanan Cohen said...: I discussed the exact same idea with some friends, thinking we were going to "save the web," but then one of them saw this post of yours and realized we are not the only ones saying it should be done. Needed to be done.
Two things should happen. A standard should be agreed upon, and one big service that has a big user base should add this feature and expose the data to the world with the standard. Then, everyone will get the benefit and build more information and services.
Take a look at Who What When Where.xml that I just started.
http://www.info.org.il/WhoWhatWhenWhere/
Let's work together.
Hanan; 2:21 PM
Jonathan Moore said...: We, mosuki.com, are heading in this direction. Our site does not have all the open interfaces you want, but that is one of the features we are working on. There is a little experimental iCalendar stuff right now: each event can be individually downloaded in the iCalendar format. We will also add iCalendar support for your whole calendar, group calendars, public web-facing URLs (no login required) and a RESTful interface in the very near future (next one to three months). The site as a whole is not considered done and there are some performance and scaling kinks to be worked out. Give us a try; while we aren't ready for the world, your feedback now can help us shape what the app will become.
-Jonathan
Mosuki; 4:53 AM
Steve Shu said...: Cool idea. A very creative, technical innovation where the value strikes close to home. As a person that does business development and networking both online and offline, this kind of service would be good for me. There are times when I have probably spent 10-12 days per year searching for events, trying to sort between events, scheduling meetings with people, etc. Although the idea seems tricky to pull off (in terms of deployment, ease-of-use, right feature set), perhaps there are some niches and angles to capitalize on your concept.; 6:27 AM
mediaeater said...: As an email.blog.sms events guide
(human filter best of the best)
whats the ideal metadata to adopt ?
Is there an event metadata standard.
You are right on btw love this post
mark; 4:12 AM
Tim said...: I just found this link..
I too am interested in events.. another good use case is for people who like the music scene - venues and clubs can post events on their sites, or musicians can post events such as touring schedules, and all the fans can find it and add it to their personal calendaring tools, whatever they are.
Also re: XML and calendar - I've written an IETF draft on this: http://www.ietf.org/internet-drafts/draft-hare-xcalendar-02.txt
which may or may not help.; 3:49 AM
tdv said...: I recently came across this page... great discussion here. With all the competing non-ratified-standards (CalDAV, EventsML, ESS, etc.) it's hard to know which direction to turn.
I'm working on Crosswise (http://xwise.org) -- open source software that analyzes calendar events via data analysis/visualization, in order to help community organizations plan events better. For the publishing modules, I've got to figure out which standard to go with.
A second, nontrivial issue is categorization and keywords; there are no universal classifications of audience and activity types. I need those for some of my analysis, so we're coming up with our own categories... jolly fun, for sure.; 2:53 AM
online dating services said...: Nice blog on Google and Internet. Very much informative.; 12:44 PM