The following are things I specifically want in the Querki architecture:
- High Concept: Querki will take the core idea of ProWiki -- the notion that pages don't just have metadata, but are composed of queryable properties -- and carry it to the next level. Basically, I want to treat ProWiki as having been a prototype, and now move on to a system that is worth trying to promulgate widely.
- Database-driven: The key idea here is that properties cut across pages, and are highly queryable. I think I've pushed that as far as we can practically do with flat files in ProWiki -- it's a nice architecture, but fundamentally doesn't scale well. It's time to move to a MySQL basis, so that we can get better power with reasonable efficiency.
- Decent space efficiency: A complication of doing this with a database is that we need to pay very close attention to the space equation. We may still want to be storing diffs inside the database if possible, with only the current tip stored as fulltext, but we'll see about that. We should also be careful about the use of indexes -- we don't want to be indexing fulltext values. (I suspect that we will need to distinguish between indexable and non-indexable kinds of properties. Enums and links get indexed; fulltext doesn't.)
- Better UI: A fatal flaw of ProWiki is that it just isn't very pretty. I think it would speed adoption if I instead layer on top of a wiki technology that has a better look-and-feel.
- Better query language: The ProWiki query language is a bit ad hoc. Think it through more carefully, and make sure that we have something that is semantically rich enough. This may well imply that we need something that isn't simply a regexp -- we might need a real parser. The implementation of this language must guard against both SQL-injection attacks and DoS attacks; specifically, it should guard against dangerous recursion, and should cap result sizes.
- Easy to Use: ProWiki, being based on UseMod, is rather bare-bones. Querki should have a better UI as a specific goal. In particular, it should have a better property editor (probably with a field-based UI so that you can't screw up the syntax) and a relatively WYSIWYG query editor. The query editor will probably be built on top of a true query language, but should transform those queries into a more comprehensible form. Ideally, it will have a good WYSIWYG editor for properties; it's conceivable that it might even produce and use richtext output, but I'm pretty suspicious of this.
- Hidden Power: I call this out specifically because "Power" is in tension with "Easy to Use". In general, we should follow the model of having everything default sensibly, so that if you use the system naively, it just works. All the power features should be easily available by pulling back layers, but not in the face of the average user.
- Deep namespaces: It should be possible to define wholly separated namespaces, that are opaque to each other. That is, I should be able to run all of my wikis on a single DB and a single source process.
- Good security model: Ideally, Querki should have a true fine-grained security model, where each page and each property can be specified as having specific ACLs. More plausibly, this should happen on a namespace basis, where a user either has or doesn't have edit access to that namespace. It should also have at least three distinct security levels: read-only; page/property edit ability; and full administrative access. Admin access might be namespace-by-namespace, or across the board. (The most important admin access capability would be the ability to create new namespaces, as well as to boot users.) Indeed, an even stricter security level might be desirable: no access, meaning that no one can even view this page unless they have the appropriate access rights.
- True users: In order to support the security model, it should have a first-class notion of what a user is. It should also have a true concept of user sessions, and should probably support both browser-session and permanent login cookies.
- Arbitrary Text Links: Wikis that only permit pages whose names are camelcased bug the hell out of me. Querki must have double-bracket syntax, and should probably permit any text inside of there. If the page names are going to be pure DB fields, there is no reason not to allow arbitrary page names. Indeed, the only reason I see to keep the autolinking of camelcase is that so many people expect it of a wiki. We should make it a switch that can be turned off -- I personally just find it annoying, and rarely use it when I don't have to.
- Image/Data Import: A weakness of ProWiki, FlexWiki and a lot of other simple wikis is that they make it much too hard to import images and other data files into the wiki. This must be much easier. It should be nearly trivial to add an image to a page, and it should be easy to import and manage data files that are referenced in the pages. (The lack of this is one of the biggest nuisances of using FlexWiki at work, and I would use it in my LARP work if I had it.)
- Arbitrarily Definable Page Rendering: We should have any number of different modes for rendering pages. We should have the display template idea as in ProWiki, where pages render based on their inheritance, but we need to go further than that. In particular, we need to be able to support multiple rendering "modes". We should be able to render a page chromeless, for printing. We should even be able to export a query to Word or PDF, for really good display, including sensible page breaks. It is probably appropriate that the underlying technology for this be implemented at the native level, but the page level should be able to define the details of the look and feel of a particular class within a given mode.
- Pluggable: More a nice-to-have goal than a concrete requirement is making the system as pluggable and extensible as possible, a la TWiki. This is best accomplished by sticking to rigorous APIs throughout, so that modules can be dropped in and replaced as necessary. It would be particularly useful to be able to plug in different browser languages and different UI modules. A good stretch goal would be rendering differently depending on the browser, ranging from full-blown AJAX for IE and Firefox down to super-simplified for cellphone access.
- Applications: Ideally, one should be able to build reasonably powerful apps on top of Querki easily. This is only a medium-priority goal, but possibly one worth getting right upfront. I'm thinking that "eating our own dogfood" may be an excellent way to test the system. For example, if the dev process is story-based, we might build a simple story-management system on top of Querki as soon as possible, and use that for development. (And bug-tracking, etc.) The sky is the limit here -- the only question is how complex an app we want to allow/encourage. We'll learn a lot about what makes sense in the Querki environment by using it.
- RDF Output: Quite possibly the killer app for Querki, the thing that ordinary wikis simply can't do, is output the data in a structured way. It should be possible to define a namespace or a part of a namespace in a way that exports well to RDF, and to make the system queryable for RDF data. There are some interesting implications here: I believe we'll need a way to map from our prototype-based schemas to the more structured RDF schemas. I suspect that the right way to handle this is with mapping objects -- the RDF query would talk to a mapping object that prepares a query into the property database. But this needs further thought and design.
- RSS Output: A feature common to most modern wikis, which should be kept in mind, is the ability to export the Recent Changes list as an RSS feed. Enough people find that useful that it should be counted as a requirement, albeit a lower-priority one. Ideally, it should show precise diffs -- not just that the page has changed, but what has changed. Even more ideally, one might be able to subscribe to changes by specific users: that would make co-ordination among LARP teams really easy.
- Comment threads: We should have a more semantically-rich concept of comment threads than the usual "add text to the bottom of the page" approach. In particular, we want to be able to display the page with and without comments, as desired, and would like to have a first-class notion of the threading of the comments. We need to be able to add comment threads "in-line" -- that is, you should be able to comment explicitly on a specific aspect of the page, not just on the page as a whole, and it should display in some sort of sensible way. (Think of Word's comment mechanism as one possible approach.) This demands a really nice UI for comment creation and editing. Also, keep in mind that we need the comments to remain hooked to the specific position in the text, even after the text is edited; this implies that a token gets stuck in to indicate where the comment hooks. Comments should become mostly-read-only once posted, but it should be possible to work around that and edit the comment later if desired. (An observation from LJ is that you only occasionally want to edit a comment, but sometimes you really wish you could.) More on this subject in Querki Comment Threads.
- Ad Hoc Queries: Not a hard requirement, but a nice-to-have. Given that we've got this powerful system, and a good query language, the obvious next step is allowing true ad hoc queries. Provide some sort of interface that allows the user to build a query (using the same query editor that is built into the edit window) as a one-shot. This doesn't get stored in the database -- it just gets rendered immediately and then discarded. Think about this from a workflow POV: a good common workflow would be to iterate an ad hoc query until you like it, and then create an object out of it if you think it might be useful. Make that workflow as easy as possible.
- Good CSS Control: It should be reasonably straightforward to apply a CSS page to at least a namespace, possibly even a page. vortexofchaos points out that this is how he does his LARP writing these days, with CSS giving it an appropriate look-and-feel.
- Extractable Pages: It should be possible to bundle up a namespace and move it to another site. More ambitious, and more useful -- it should be possible to extract the schema from a namespace, and move just that to another site. That would allow me to reuse my LARP Schema.
- Shared Pages: Even better -- it should be possible for a set of namespaces to share a common schema. I'd like to have a single LARP Schema, and reuse that in many namespaces that represent individual games.
- Easy to Install: If this thing's really going to take off, it has to be easier to set up than systems like TWiki. We'll need to put serious work into making a good installer. Moreover, this is needed early, so as to draw people into the project. Think hard about how to get developers up and running and involved quickly and easily -- make sure there aren't stupid barriers to entry into the project. (Thanks to Darker for this point.)
- Hard to attack: An eternal battle, but one to begin planning for early. What will the attack vectors be? It should be designed to be wikispam-resistant out of the box. How does it deal with DDoS? If we're using a language like Ruby, we need to be very conscious of code-injection attacks, and we need to be absolutely paranoid about SQL-injection attacks. (See [this page] for lots of ways attacks are constructed.) Sad though it is to say, we need to make the same assumptions here as we would for any other enterprise-grade code: every user is a potential enemy, and we must treat them with a decent measure of suspicion. Go through the [OWASP Site], and especially their [Top Ten], and make sure we aren't showing any of the specified vulnerabilities.
- Spam Resistant: What's involved in this? ACLs go a long way, but what would help in a public scenario? Banning IP addresses? Being able to automatically roll back all changes made by a particular user or IP address? What else?
- HTML and Javascript Sanitization: What would help with this? On the one hand, we want to allow a lot of power. On the other, we need to protect against subversion by malicious hackers who would put traps on pages. Think about this one carefully. Obviously, it's much more important in public-editable wikis than private ones, so it's a second-order requirement. But the UseMod concept of allowing only a specific list of tags is a very sensible one.
- Distributable: A long-term objective, but one to make sure we don't prevent -- it should be possible to run the system with multiple front-end servers talking to the back-end DB. The implication is that the front end should have no critical state. Even in-memory caching should be viewed with great suspicion.
- Version Diffing: A minor but useful feature to consider. It's essential to be able to view previous revisions of a page -- every wiki can do that. But it would be even better to be able to see exactly what changed between versions. When I'm working with someone else on a wiki, it can be challenging to see exactly what they did to a page, to keep track of it and respond. So being able to easily take two revisions (most often, the tip and the previous revision) and see the changes would be really nice. And if we want merge capability for other reasons, it probably isn't too hard.
- Spellchecking: While I personally rarely use spell-checkers (I find them annoying), many people really, really like them. So we should be prepared to at least have a spell-check plugin eventually, preferably with dictionaries that can be installed at the system level and chosen by the user. Think about whether specific pages get associated with specific dictionaries, or if it's purely decided based on the user.
- High-performance: It's worth spelling out specifically -- this thing needs to be fast. This could make a major difference in the choice of system. Rumor has it that Rails is strictly single-threaded; if so, that could count it right out. Is this a Ruby limitation, or just Rails? If Ruby doesn't work, we'll have to consider something else, maybe even something esoteric like [Scala] and [lift].
- CRUD Handlers: The Rolls Ethereal are an interesting use case. Among other things, they indicate that we probably will need workflow-like capabilities, which means the ability for an application to intercept Create/Update/Delete operations and do appropriate stuff. So for instance, the Rolls application would detect that an entry has been created, send out confirmation email, and hold the entry until it gets confirmed. It might even expire the entry if it doesn't get confirmed in reasonable time. This probably happens at the platform-language level, not in the Querki language itself, but that isn't fully clear.
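
As a rough illustration of the guards called for under "Better query language" above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration only: the props table, the tuple-based query form, and the MAX_DEPTH and MAX_RESULTS limits are hypothetical and not part of any settled Querki design. It shows three ideas: user-supplied text reaches SQL only through placeholders, query nesting is capped, and result sizes are capped.

```python
import sqlite3

MAX_DEPTH = 8      # hypothetical cap on how deeply queries may nest
MAX_RESULTS = 100  # hypothetical cap on the number of pages returned


class QueryLimitError(Exception):
    """Raised when a query exceeds one of the safety limits."""


def make_db():
    """Build a throwaway in-memory property table for this example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE props (page TEXT, name TEXT, value TEXT)")
    return db


def evaluate(db, query, depth=0):
    """Return the set of page names matching a nested query tuple.

    Query forms (all hypothetical):
      ("has", name, value)  pages whose property `name` equals `value`
      ("and", q1, q2, ...)  intersection of the sub-queries
      ("or",  q1, q2, ...)  union of the sub-queries
    """
    if depth > MAX_DEPTH:
        raise QueryLimitError("query nested too deeply")

    op = query[0]
    if op == "has":
        _, name, value = query
        # Placeholders keep user text out of the SQL string itself.
        # Fetch one row past the cap so that overflow can be detected.
        rows = db.execute(
            "SELECT DISTINCT page FROM props"
            " WHERE name = ? AND value = ? LIMIT ?",
            (name, value, MAX_RESULTS + 1),
        ).fetchall()
        result = {row[0] for row in rows}
    elif op in ("and", "or"):
        parts = [evaluate(db, sub, depth + 1) for sub in query[1:]]
        if not parts:
            return set()
        combine = set.intersection if op == "and" else set.union
        result = combine(*parts)
    else:
        raise ValueError(f"unknown operator: {op!r}")

    if len(result) > MAX_RESULTS:
        raise QueryLimitError("too many results")
    return result
```

A query such as ("and", ("has", "kind", "character"), ("has", "game", "tabula")) would return only the pages carrying both properties. Whether an over-large result should raise an error, as here, or be silently truncated is an open design choice.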