The Intentional Web

Thursday, February 26, 2009

Most of the time one browses or searches the web, there is a goal in mind. Find the location of a coffee shop, learn more about cloud computing, see if there are any interesting new movies, be distracted and procrastinate. Each of these instances of web use has objectives and implicitly defines a success predicate. When one (the agent) interacts with the web, a computer, or simply information (the system), the system's knowledge or discovery of an explicit representation of the agent's objectives, and of the success predicates for those objectives, greatly enhances its capability to assist the agent in accomplishing them.

Suppose I enter the query, "lee's art supplies nyc," into a search engine. A currently standard keyword search (PageRank, HITS, etc.) returns a list of results containing the keywords, sorted by page popularity. A semantic web search might parse out "lee's" as a title, "art supplies" as an object, and "nyc" as a location. Using this meaning-based information, the search could return the intersection of the sets in which each component appears as the correct semantic type, ordered by some weighting between nearest-neighbor relevance and popularity. We are then returned a list of pages with things related to the title "lee's", objects related to "art supplies", and locations related to "nyc". Neither of the above methods takes into account the user's intent in conducting the search.
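To make the contrast concrete, here is a minimal sketch of the two approaches. The indexes, the type tagger, and the scoring weights are all hypothetical stand-ins for illustration, not a description of how any real engine works.

```python
def keyword_search(query, index, popularity):
    """Return pages containing all query terms, ranked by popularity.
    `index` maps page -> set of words; `popularity` maps page -> score."""
    terms = query.lower().split()
    hits = [page for page, words in index.items() if all(t in words for t in terms)]
    return sorted(hits, key=lambda p: popularity.get(p, 0), reverse=True)

def semantic_search(query, tagger, typed_index, popularity, w_rel=0.6, w_pop=0.4):
    """Tag each term with a semantic type (title, object, location, ...),
    intersect the pages matching every (type, term) pair, and rank by a
    weighted blend of relevance and popularity."""
    typed_terms = tagger(query)                       # e.g. [("title", "lee's"), ...]
    candidate_sets = [typed_index.get(tt, set()) for tt in typed_terms]
    pages = set.intersection(*candidate_sets) if candidate_sets else set()

    def score(page):
        relevance = sum(1 for tt in typed_terms if page in typed_index.get(tt, set()))
        return w_rel * relevance + w_pop * popularity.get(page, 0)

    return sorted(pages, key=score, reverse=True)
```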

There are a number of plausible intents one could have when entering the above example query. Maybe I want to find the location of..., learn the history of..., list the competitors of..., etc. Whatever the objective is, it is invisible in the query string. The intent is not calculated; it is not known.

Consider the same example but suppose that one can enter an objective along with the query (or that the engine can determine a probability distribution over objectives from the query and subsequent user actions). Assume my objective is something along the lines of learn the history of.... It needn't be exactly determined by the algorithm, as it likely isn't exactly determined or known by the user. Abstractly, we suppose that for each user query there exists a set of objectives and a distribution of importance over the accomplishment of those objectives. Given this set of objectives, an information-gathering agent is constructed. The agent's objective is to have the user accomplish the user's objectives. In the current example the agent's objectives may become something like user learn..., which is then divided into self learn... and self teach user.... Each of these tasks is then further deconstructed into its simpler components as necessary, and all tasks are completed bottom up.
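A rough sketch of this idea follows; the objective labels, the inferred distribution, and the decomposition table are all invented for illustration, and a real system would replace each with learned models.

```python
# Hypothetical distribution of importance over objectives inferred from a query.
def infer_objectives(query):
    return {"learn the history of": 0.5,
            "find the location of": 0.3,
            "list the competitors of": 0.2}

# Hypothetical decomposition table: a task maps to simpler sub-tasks.
DECOMPOSITION = {
    "user learn": ["self learn", "self teach user"],
    "self learn": ["gather sources", "extract facts"],
    "self teach user": ["summarize facts", "present summary"],
}

def decompose(task, table=DECOMPOSITION):
    """Recursively expand a task until only primitive tasks remain,
    returning them in bottom-up (execution) order."""
    subtasks = table.get(task)
    if not subtasks:            # primitive task: execute directly
        return [task]
    plan = []
    for sub in subtasks:
        plan.extend(decompose(sub, table))
    return plan

if __name__ == "__main__":
    weights = infer_objectives("lee's art supplies nyc")
    top_objective = max(weights, key=weights.get)
    print(top_objective, "->", decompose("user learn"))
```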

The characteristic of the intentional web described above is its ability to ascribe goals to users and to conduct itself in a manner appropriate to those ascribed goals. Arguably, this is a change in the web's perception of user actions and not essentially a change in the web itself. A change in the web itself comes when we stop viewing the web and its components as just documents and data, as just network components (graphs) and purveyors of meaning (semantic units).

Each sub-graph of the web is seen as an agent capable of transitively affecting every connected component, and itself, in a non-well-founded manner. If there exists a broad objective that would increase the fitness of a large number of individual agents, the web agent community will transiently organize into larger units if necessary to accomplish this objective. The web becomes a community of organisms interacting with each other to accomplish their goals and increase their fitness.

As an example, consider the agent financial news gatherer, whose objective is to maintain connections to all pieces of financial news available throughout the web. The graph (group of sub-agents) that makes up this agent may contain the link gatherer, the financial news identifier (which itself may be a connection between the news identifier and the finance identifier), etc. If any of the sub-agents of the financial news gatherer increases its fitness, the financial news gatherer increases its fitness as well. What's more, when any agent whatsoever is improved, any other agent that contains the first agent will (generally) be improved as a welcome side effect. Suppose the link gatherer is improved; any application that gathers links will improve with no required change to its own structure.
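A toy version of this composition, with each agent reduced to a plain function (the identifiers and their keyword tests are invented for illustration):

```python
import re

def link_gatherer(page):
    """Naive link extraction; improving this improves every agent built on it."""
    return re.findall(r'href="([^"]+)"', page)

def news_identifier(text):
    return "breaking" in text.lower() or "report" in text.lower()

def finance_identifier(text):
    return any(term in text.lower() for term in ("earnings", "stock", "market"))

def financial_news_identifier(text):
    # Composed from two narrower identifiers.
    return news_identifier(text) and finance_identifier(text)

def financial_news_gatherer(pages):
    """Composite agent: gather links from pages that look like financial news."""
    links = []
    for page in pages:
        if financial_news_identifier(page):
            links.extend(link_gatherer(page))
    return links
```

If link_gatherer is improved, say to resolve relative URLs, financial_news_gatherer improves with it, untouched.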

Decomposing specialties and organizing programs as links between specialized components is essential to modern programming and is described by the concept of design patterns. Web programming has been moving in this direction with the popularity of model-view-controller (MVC) frameworks and plug-in architectures. These are positive movements, but they only transform and increase efficiency on a site-level basis, not a network or community basis.

The trend towards application programming interfaces (APIs) and web services is significantly more relevant to the development of the intentional web. As an example, I've recently created a data aggregation web service, called Pairwise, and another service that pulls photos from Flickr, runs them through Pairwise, and presents them to the user. We can think of Pairwise as a sort function, sort, and Flickr as a store of data, data. With these components built and maintained, all a user must do to sort photos is the rather intuitive: sort(data). This is obviously still a ways away from what I've described above, yet it incorporates the essential component of specialized functionality.
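In code, the composition looks roughly like the following. The Pairwise endpoint and its request format are hypothetical placeholders for my service, and the Flickr call is only outlined, since the exact request depends on an API key and the method parameters chosen.

```python
import json
import urllib.request

def fetch_flickr_photos(api_key, tags):
    """Fetch photo metadata from Flickr's REST API (details elided)."""
    url = ("https://api.flickr.com/services/rest/"
           "?method=flickr.photos.search&format=json&nojsoncallback=1"
           f"&api_key={api_key}&tags={tags}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["photos"]["photo"]

def pairwise_sort(items):
    """POST the items to the (hypothetical) Pairwise service; return them sorted."""
    request = urllib.request.Request(
        "https://example.com/pairwise/sort",          # placeholder URL
        data=json.dumps(items).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)

# The user-facing composition, i.e. sort(data):
# sorted_photos = pairwise_sort(fetch_flickr_photos("YOUR_KEY", "sunset"))
```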

Web mash-up building services, such as Yahoo! Pipes, provide an interface for users to create arbitrary combinations of data from throughout the web. With Pipes, people use a simple web interface to combine icons representing various programming functions. For example, one can drag a fetch data icon to a reverse icon; one can also embed "pipes" within each other. Pipes is a good example of integrating an abstract programming interface with internet data. But to add functions beyond what Pipes offers, one must build an external web service and call it from within Pipes, which can be a nuisance if a large amount of additional functionality is needed. Another problem with Pipes, as with all mash-up builders, is API interoperability.
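A plain-code analogue of that fetch-then-reverse pipeline might look like this; the feed URL is a placeholder, and in Pipes the same pipeline would be assembled by dragging a Fetch Feed icon into a Reverse icon.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_feed(url):
    """Fetch an RSS feed and return its item titles in document order."""
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    return [item.findtext("title") for item in tree.iter("item")]

def reverse(items):
    return list(reversed(items))

# Pipes-style composition: embed one "pipe" in another.
# print(reverse(fetch_feed("https://example.com/feed.rss")))
```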

A significant current problem in moving towards an internet of connected services revolves around standards. To retrieve photos from Flickr, or to use any other web service, one must learn the application interface and write (or use preexisting) code that operates within the domain of that interface (in many cases one must also sign up for an account with the service). There are no standards (or no standards commonly used throughout the web) that one can follow when writing an API and that one can refer to when interfacing with an arbitrary API; APIs are standard on a per-application basis. In operating systems development it was quickly realized that not having specifications for APIs (a standardized API for APIs, or meta-API) was extremely inefficient, and a set of standards known as POSIX was developed in response. We need a set of POSIX-like open standards for the internet. For APIs to be completely non-interoperable until specialized code is written on a per-API basis is highly unproductive.

Without API standards the development of an intentional web, in the sense of a network that organizes information based on the determined goals of the other members of that network, is still possible. The difficulty is that the programs which mine goals (intention) from web information would either need to be primarily self-contained or link together other services that themselves use proprietary APIs. Because large-scale adoption of new technologies by internet developers can be slow and is normally not done without justifiable cause, an effective approach may be to build an intentional web by linking other services together, but linking them through an API interpreter that converts arbitrary APIs into an open standard. Those who wish to use the intentional features of the web, or any web service, can use the open standards. Those who wish to develop services integrated into the intentional web, or accessible by other services using the open standards, can either write their API in conformance with the standards or write an interpreter that translates their API into the open standards. Additionally, anyone who wants a conforming version of an existing API can write an interpreter and thereby make that API available to all users. Preferably, there would be a discovery agent that builds an interpretation of arbitrary APIs into the open standards and documents those APIs both in open-standards format and in their original format. After processing an API, the discovery agent would monitor the API for changes and update the interpretation and documentation as necessary to maintain current functionality and add newly introduced functionality.
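As a sketch of what an API-interpreter layer might look like, the adapters below translate two imagined proprietary clients into one shared interface. Every name here (the OpenPhotoSource standard, the adapters, the client methods they call) is illustrative only, not an existing specification.

```python
from typing import Protocol, List, Dict

class OpenPhotoSource(Protocol):
    """The hypothetical open standard every photo service is translated into."""
    def search(self, query: str) -> List[Dict]: ...

class FlickrAdapter:
    """Interprets a Flickr-specific client as the open standard."""
    def __init__(self, client):
        self.client = client                            # proprietary client object
    def search(self, query: str) -> List[Dict]:
        raw = self.client.photos_search(tags=query)     # proprietary call (illustrative)
        return [{"title": p["title"], "url": p["url"]} for p in raw]

class OtherServiceAdapter:
    """A second service behind the same open interface."""
    def __init__(self, client):
        self.client = client
    def search(self, query: str) -> List[Dict]:
        raw = self.client.find_images(query)            # different proprietary call
        return [{"title": r.name, "url": r.link} for r in raw]

def gather_photos(sources: List[OpenPhotoSource], query: str) -> List[Dict]:
    """Code written against the standard works with any interpreted API."""
    results = []
    for source in sources:
        results.extend(source.search(query))
    return results
```

The point of the pattern is that gather_photos never changes as new services are interpreted into the standard; only new adapters are added.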

There are certainly more hurdles before a complete intentional web is developed but, without either open standards or automated API interpreters, only hubs of services capable of communicating with each other will develop. These will likely be controlled by Google, Microsoft, Amazon, and other big industry companies. With systems made interoperable based on open standards we would be able to connect one service to another without hassle. As all standards must, meta-API standards must have flexibility and extensibility built in as a core component. Ideally, these standards would emerge naturally from the existing APIs on the web and gracefully bend to fit changes occurring in the web over time. An eventual change in the web will be a movement beyond an intentional web. Systems should be developed to accommodate radical alteration of the fundamental structure that defines them.