Several recent conversations on P2P websites and mailing lists have left me with the impression that there’s a lot of confusion and disagreement about the meanings of various terms common in discussions of system architectures. In particular, the following terms seem to be the source of much contention:
- Centralization
- Distribution
- Peer to Peer
Intuitive definitions for these concepts abound…and often differ. Most people’s “mental maps” of this conceptual space represent nothing so much as a map of the Balkans or some other similarly warred-over territory. Thus, my intent is not so much to preserve intuitive meanings as to provide meanings that define a consistent and regular “coordinate space” that aids people in concisely describing various architectures. The result is therefore guaranteed to seem “counterintuitive” at first, but I believe the improved descriptive value will make up for that.
Centralization

The sorts of systems we’re talking about generally exist to provide access to some kind of resources – files, blocks, streams, channels, objects, etc. Centralization is all about who controls resources. If you need to access a resource, who grants or denies that access? Who enforces rules for synchronization or serialization of accesses? If the resource is data, where is the authoritative version from which others are copied? The answers to these sorts of questions define the system’s level of centralization:
- In a fully centralized system, all control of all resources resides in a single server node.
- In a weakly decentralized system, control might be partitioned between a small set of servers, according to topological location, location within an abstract namespace, type of access, etc. Many forms of replication or clustering between nodes that, as a set, provide service to a much larger set of nodes might also count as weakly decentralized.
- A strongly decentralized system is one in which control exists in many places (“many” being relative to the total number of nodes in the system). Most often, control over a resource resides with the node that initially provided the resource – e.g. the node that inserted a file – though other schemes are possible.
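The three levels can be sketched in code. This is a toy illustration, not any real system’s API; all the function and variable names are mine, and the partitioning scheme in the middle case is just one arbitrary choice among those listed above.

```python
def controller_fully_centralized(resource, server):
    # Fully centralized: one server node controls every resource.
    return server

def controller_weakly_decentralized(resource, servers):
    # Weakly decentralized: control is partitioned across a small,
    # fixed set of servers -- here by a deterministic hash of the
    # resource's name (summing its bytes) into the server list.
    return servers[sum(resource.encode()) % len(servers)]

def controller_strongly_decentralized(resource, owners):
    # Strongly decentralized: control resides with whichever node
    # introduced the resource, recorded in a map at insertion time.
    return owners[resource]
```

The point of the sketch is only that the same question – “which node controls this resource?” – has answers of very different shapes at each level.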
Decentralization has both static and dynamic forms, depending on the “fluidity” of ownership. If ownership of a particular resource is assigned once when the object is created/introduced and never changes thereafter, that’s static decentralization. At the other extreme, a highly dynamic form of decentralization might involve ownership “moving toward” the most frequent accessors.
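The dynamic extreme can be made concrete with a small sketch. The class and its migration policy are hypothetical – a real system would want hysteresis rather than migrating on any lead in the access counts – but it shows ownership “moving toward” the most frequent accessor:

```python
from collections import Counter

class Resource:
    """Toy model of dynamically decentralized ownership: the owner
    is reassigned to whichever node accesses the resource most."""

    def __init__(self, owner):
        self.owner = owner          # assigned once at creation
        self.accesses = Counter()   # per-node access counts

    def access(self, node):
        self.accesses[node] += 1
        # If some node has overtaken the current owner in access
        # frequency, ownership migrates toward it.
        top, _ = self.accesses.most_common(1)[0]
        if top != self.owner:
            self.owner = top
```

Static decentralization is the degenerate case where the `access` method never touches `self.owner`.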
Distribution

Distribution of resources is a matter of their presence or location rather than control. How many copies of (or handles to) a resource might exist throughout the system at one time, and how do those relate to the original resource on its home node? For data, this obviously refers to caching, but in a more general sense it can refer to any kind of delegation. Again, the distinctions are best illustrated by examples:
- In a non-distributed system, no delegation occurs at all. All access to a resource directly involves the resource’s home node.
- In a weakly distributed system, some limited form of delegation is allowed. For example, a client might be allowed to cache data but only on its own behalf; caching at proxies or other intermediaries might be expressly forbidden. Alternatively, delegation might only occur for very specific types of operations, for limited durations, or under particular circumstances.
- Strongly distributed systems allow very high levels of delegation. Data, for example, might be cached everywhere, at all levels of the system, with many nodes simultaneously accessing it without any home node’s direct knowledge.
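Modeling delegation as caching, the three levels above might be sketched as follows. The function names and data shapes are hypothetical; the “strongly distributed” case stands in for any multi-tier scheme where intermediaries hold copies without the home node’s involvement.

```python
def read_non_distributed(key, home):
    # Non-distributed: every access goes straight to the home node.
    return home[key]

def read_weakly_distributed(key, home, local_cache):
    # Weakly distributed: a client may cache, but only on its own
    # behalf; no intermediary ever holds a copy.
    if key not in local_cache:
        local_cache[key] = home[key]
    return local_cache[key]

def read_strongly_distributed(key, home, caches):
    # Strongly distributed: any tier along the path may hold a copy;
    # the home node is consulted only when no intermediary has one,
    # and every tier is populated on the way back.
    for cache in caches:
        if key in cache:
            return cache[key]
    value = home[key]
    for cache in caches:
        cache[key] = value
    return value
```

Note that in the strongly distributed case, a second read can be satisfied entirely by an intermediary – the home node never learns the access happened.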
As with decentralization, distribution occurs in both static and dynamic forms. In this case the issue is one of topology. If the network topology is fixed and delegation only occurs along some spanning tree – as in a multi-tiered client/server system – that would be a very static form of distribution. A dynamically distributed system, by contrast, would allow delegation along all paths in the network, and might even allow the network to be reconfigured to bring requestors and providers into greater topological proximity.
Peer to Peer
With all this in mind, I should be able to say that I’m working on a system that’s “statically, weakly decentralized” and “dynamically, strongly distributed” and have the statement make some sense. But how does P2P fit into this context? Let’s consider our two criteria in turn:
- Peers are equals, neither set above the other. Therefore, a P2P system must be strongly decentralized. However, the decentralization might be either dynamic or static, with static decentralization actually seeming to be the most common choice.
- With regard to distribution, there’s a little surprise in store. “Peer to Peer” implies a direct connection between a provider and a consumer of a resource, which precludes proxy/intermediary relationships and thus implies in turn that the system is weakly distributed at best. Again, the distribution may be either dynamic or static, and in this case the dynamic alternative seems most popular.
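The two criteria above reduce to a simple predicate on the coordinate space. The level names below are my own shorthand, not standard terminology:

```python
DECENTRALIZATION = ["full_central", "weak", "strong"]
DISTRIBUTION = ["none", "weak", "strong"]

def is_p2p(decentralization, distribution):
    # P2P per the criteria above: peers are equals, so the system
    # must be strongly decentralized; and "peer to peer" implies a
    # direct provider-consumer connection, so it can be at most
    # weakly distributed (no proxies or intermediaries).
    return decentralization == "strong" and distribution in ("none", "weak")
```

The asymmetry is the interesting part: P2P pins one axis at its maximum and caps the other below it.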
According to this set of criteria, at least one well-known “P2P” system that relies heavily on proxies and caches is not truly P2P. That is certainly not meant as a negative comment; such systems – including my own – might not be P2P, but they’re something else just as paradigm-shattering as P2P. The point here is that P2P is not the same as “ultimately decentralized, ultimately distributed, ultimately cool”. There are other types of systems that go beyond P2P in some ways even if nobody has come up with a catchy acronym to describe them yet. IMO the day can’t come soon enough when people stop thinking fuzzily about P2P and start thinking clearly about levels/types of decentralization and distribution instead.