From: Eivind Eklund
To: rubygems-developers@rubyforge.org
Cc:
Bcc:
Subject: Re: [Rubygems-developers] Suggestions: categories and querying
Reply-To:
In-Reply-To:

On Fri, Sep 17, 2004 at 09:30:30AM -0400, Chad Fowler wrote:
>
> On Sep 17, 2004, at 6:01 AM, Eivind Eklund wrote:
>
> >There are two places that could do this well at the moment: RAA (if
> >somebody adopted doing the librarian work for it), and RPA (which has
> >Mauricio as its librarian already). I think RubyGems' best bet is to
> >NOT add categorization at all at this time, but instead cooperate
> >closely with one of the above, and help them generate really good
> >categorization, and when good categories are available, start helping
> >authors find categories for their software.
> >
> >Anything else is doomed to chaos and a false sense of being helpful.
>
> Thanks for the long and obviously well thought out response, Eivind. I
> can't say I completely agree with you, but I _do_ agree that RubyGems
> should not add any kind of categorization right now (or possibly ever).
> I also believe that rpa-base should not add categorization. I think
> it's in the scope of something at the RPA level, but should be
> completely left out of the _packages_ themselves.

I agree with keeping them out of the packages. They're at a level
higher up.

> I would be open to adding keywords to gems, but I would want to think
> it through a lot more. Keywords may be single-level hierarchies, but
> being single-level (and therefore not _really_ hierarchies), they don't
> carry with them the same commitment to a structure that may or may not
> be right. They can be used to help someone find a library or
> application without forcing a rigid classification system.

I'm not sure they can be used to help people find something. I'm
afraid that people will THINK that they can be used to help find
something, and therefore will add them and avoid thinking about the
hard problems associated with getting a good solution.

> Finally, I'm not convinced that a hierarchy is the way to go at all. I
> would even go so far as to say that hierarchical classification for
> this kind of computer-based purpose is obsolete.

This does not match my experience. I find the organization of a
physical library much better than computer-based searches based on
keywords. It is just expensive to maintain.

> And, as you've pointed out, they are almost unusable for
> self-organizing system/communities.

Again, I respectfully disagree. In my opinion, they're expensive to
maintain and give a high payoff. As I said: I think the lack of them
for software is possibly THE primary flaw of software development
today.

I hope you'll allow me another mini-essay - you're striking a lot of
issues close to my heart with the areas you tackle, so I've got a lot
to say :-)

All of human activity - really, all of life - is a self-organizing
system. The activity of the human part of this system is based on
perceived expense and what benefits the individual gets from it. In
larger contexts, the organization comes from the activity of a number
of individuals. This activity is directed by the interaction between
the individuals and the world, including each other.

Suffering from abstraction asphyxiation yet? Thought so - I'll try to
get a little more down to the nitty-gritty. Then we'll go up again and
look at the forces involved, how these real-world examples make things
work, and try to construct an example of how this could work for a
RubyGems library (or RPA).

Remember: It's always an interaction between a culture and a
technology - because the culture shapes the technology, and the
technology shapes the culture.

Two examples of fairly self-organizing hierarchical taxonomies, made
using network technology and a self-replicating culture: Wikipedia and
the Open Directory Project.
The latter has constructed over 460,000 categories and categorized
many millions of sites by volunteer feedback; the former has, in just a
few years, built the largest encyclopedia in the history of mankind,
where hierarchical *and* crosscutting organization is visible all over
the place.

One thing that is clearly visible in both these projects is that they
have a strong interaction between technology and culture, and that the
technology has been designed with the explicit goal of shaping the
culture - of making some behaviour rewarding, and other behaviour
non-rewarding.

And they've both made tools that make *collaboration* work nicely - not
having every user "sit on his own hilltop, use the tools, and spread
his data to the world", but letting users that want to help fix up
where things can be improved do so easily. They also foster a sense of
"doing something for the world" by doing such fixups, and the ability
to do a group of such fixups at the same time, getting into a state of
fixing, fixing, fixing - wow - the world is noticeably better than it
was just ten minutes ago!

This is also something that has been there since the inception of both
projects. They've tried to keep things good all the way, and have
built their infrastructure for it. Clearly the most successful of them
(Wikipedia) has also built the infrastructure to foster a sense of
community, and to make it possible for the members to communicate among
themselves about the work.

The infrastructure (at least for Wikipedia) is also made so that while
it is extremely easy to do damage, it is also very easy to fix up, and
the community can keep track of that and fix it as necessary.

I think it is possible to make the same happen with RubyGems and RPA.
We just need to make the infrastructure that makes it EASY for people
to help, and make it non-rewarding to damage the dataset.
Wikipedia does this by making it easy for people to see what changes
happen, and keeping history so it is easy to revert vandalism. So:
Vandalism really makes little difference, and disappears quickly.

You also need to motivate people to contribute. There are a few
different aspects to this.

First of all, it is making the right things easy and the wrong things
harder. This is done in Wikipedia etc. by the ease of entering things
and the number of ways people can help fix, but I think this property
will be missing from any system where every free software author
assigns the categories (or keywords, or whatever you call them) to his
software locally. (I'll describe a system that I think would actually
work for RubyGems below.)

Second, it is increasing the reward for doing the right thing. It
shouldn't just be easy to do the right thing - it should feel good.
One of the ways to do this is to do a so-called "step up" to a larger
goal that the person feels more about. For Wikipedia, the step up goal
is to "Spread education to everyone". Another way is to give people
positive feedback on good behaviour. An example of this is Ward's
signature and his "Thanks for your careful attention to detail!" on the
c2.com Wiki. (This also slots the submitters into a role when they
submit stuff - a very effective technique for manipulation, as people
don't want to let that positive role down.)

Third, make doing stuff into a habit - because people then do it a lot
and get good at it. Wiki, Wikipedia, and the Open Directory all do
this - because people can work on more than just their own stuff.

Now, putting all of this together into a working design for how to get
RubyGems properly categorized:

* Set up a collection site for RubyGems. You want people to upload
  their gems there, so all gems are available in a central location for
  categorization. (They don't have to be available for general
  download, but they must be available for inspection by labellers.)
* Make an interface where the authors are profusely thanked, and told
  about how this helps the entire Ruby community, and that this will
  hopefully make all of the world a better place. Also indicate where
  the author can help categorize his own and other packages.
* Make each category include a description of what should go into the
  category, in addition to the category name, and extra keywords that
  the category should also show up for.
* Make the category assignment system so that you FIRST search for
  categories by keywords (+ to enforce a keyword, normally OR the
  keywords to make sure that people get ALL the possibly relevant
  categories). AFTER you have searched, you can choose "Add Category"
  at the BOTTOM of the form. And there is a new search form, with your
  entered keywords, just above the place where you press to add a
  category.
* Only allow adding categories from the next higher level in the
  hierarchy, where you'll see all the already existing subcategories.
* Make the "Add Category" go through a separate confirm page before
  getting to the information entering page for categories. On this
  page, explain how important using the existing categories is, and
  that adding a new category is a fairly big deal - but also the right
  thing to do if it is the right thing to do. Add a search box with
  the keywords here, too. And say "Thank you for your attention to
  detail. This categorization system is made to help Ruby users find
  the software they need, and by maintaining its quality, you make the
  world better for everyone (and hopefully help make Ruby a viable
  language for all your own use, too, by getting more people to help.)"
  Or something like that.
* Make the category addition page require a list of search keywords
  that can match the category (any that are not in the name of the
  category already), and a large box with "Description". Disallow
  adding categories with too short a description.
* When the user is through adding a category, allow him to search for
  other software that should ALSO be added to the category, to make
  sure that the category is good.
* Have a separate page with Recent Changes, which includes lists of new
  categories and what software packages have been added to what
  categories. This allows separate review.
* Make any view of a package show various levels of detail of the
  package, including source code and change frequency, in order to
  determine how to categorize the package.
* Make it easy to merge categories, and to remove (and restore, with
  contents) categories.

I think the above (along with a manifesto describing how important
categorizing is) should make distributed volunteers create good
categorization.

Eivind.
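
To make the "search first" step in the list above a bit more concrete,
here is a minimal Ruby sketch of the keyword search: plain keywords are
ORed, and a keyword prefixed with + must match. Everything named here
(the Category struct, search_categories, SAMPLE_CATEGORIES) is
hypothetical and only for illustration; none of it exists in RubyGems
or RPA. It also assumes that once any + keyword is given, the plain
keywords no longer filter the result, which is one possible reading of
the rule.

```ruby
# Hypothetical category record: a name, a description of what belongs
# in the category, and extra keywords it should also show up for.
Category = Struct.new(:name, :description, :keywords) do
  # Lowercased words this category matches: name words plus keywords.
  def terms
    (name.split + keywords).map(&:downcase)
  end
end

# Return the categories matching a query string.
# "+word" must match; plain words are ORed together.
# Assumption: when any "+word" is present, plain words do not filter.
def search_categories(categories, query)
  words = query.downcase.split
  required = words.select { |w| w.start_with?("+") }.map { |w| w[1..-1] }
  optional = words.reject { |w| w.start_with?("+") }

  categories.select do |cat|
    terms = cat.terms
    next false unless required.all? { |w| terms.include?(w) }
    !required.empty? || optional.any? { |w| terms.include?(w) }
  end
end

# Made-up sample data, just to exercise the search.
SAMPLE_CATEGORIES = [
  Category.new("XML Parsing", "Libraries that read XML documents.",
               %w[xml parser sax dom]),
  Category.new("Web Frameworks", "Frameworks for building web applications.",
               %w[web http mvc]),
  Category.new("HTML Parsing", "Libraries that read HTML documents.",
               %w[html parser scraping])
]

puts search_categories(SAMPLE_CATEGORIES, "xml web").map(&:name)
puts search_categories(SAMPLE_CATEGORIES, "+parser +html").map(&:name)
```

ORing by default errs on the side of showing too many categories, which
is the point: the submitter should see every plausible existing
category before reaching for "Add Category".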