unofficial planet python

May 11, 2008

Sean McGrath

Simon Willison's Weblog

Byteflow Blog Engine

Byteflow Blog Engine. This looks like the most full-featured of the Django blog engines by a pretty big margin, including OpenID client and server support. A product of the growing Russian/Ukrainian Django community.

May 11, 2008 07:41 PM

Doug Hellmann

PyMOTW: heapq

The heapq module implements a min-heap sort algorithm suitable for use with Python's lists.

Module: heapq
Purpose: In-place heap sort algorithm
Python Version: New in 2.3 with additions in 2.5

Description:

A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This feature makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.

A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python's heapq module implements a min-heap.
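To make the index arithmetic concrete, here is a small sketch that checks the min-heap property directly from the 2*N+1 and 2*N+2 rule. The helper name is_min_heap is made up for this illustration and is not part of heapq.

```python
import heapq

def is_min_heap(items):
    """Return True if every parent is <= each of its children."""
    for n in range(len(items)):
        # children of position n live at 2*n+1 and 2*n+2
        for child in (2 * n + 1, 2 * n + 2):
            if child < len(items) and items[n] > items[child]:
                return False
    return True

data = [19, 9, 4, 10, 11, 8, 2]
before = is_min_heap(data)   # 19 sits above 9, so not a heap yet
heapq.heapify(data)
after = is_min_heap(data)
print('before: %s, after: %s' % (before, after))
```

The smallest item always ends up at index 0, which is why heappop() can find it without searching.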

Creating a Heap:

There are 2 basic ways to create a heap, heappush() and heapify().

Using heappush(), the heap sort order of the elements is maintained as new items are added from a data source.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print 'random :', data
print

for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)



$ python heapq_heappush.py
random : [19, 9, 4, 10, 11, 8, 2]

add 19:

19
------------------------------------

add 9:

9
19
------------------------------------

add 4:

4
19 9
------------------------------------

add 10:

4
10 9
19
------------------------------------

add 11:

4
10 9
19 11
------------------------------------

add 8:

4
10 8
19 11 9
------------------------------------

add 2:

2
10 4
19 11 9 8
------------------------------------



If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)



$ python heapq_heapify.py
random : [19, 9, 4, 10, 11, 8, 2]
heapified :

2
9 4
10 11 8 19
------------------------------------



Accessing Contents of a Heap:

Once the heap is organized correctly, use heappop() to remove the element with the lowest value. In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
print

inorder = []
while data:
    smallest = heapq.heappop(data)
    print 'pop %3d:' % smallest
    show_tree(data)
    inorder.append(smallest)
print 'inorder :', inorder



$ python heapq_heappop.py
random : [19, 9, 4, 10, 11, 8, 2]
heapified :

2
9 4
10 11 8 19
------------------------------------


pop 2:

4
9 8
10 11 19
------------------------------------

pop 4:

8
9 19
10 11
------------------------------------

pop 8:

9
10 19
11
------------------------------------

pop 9:

10
11 19
------------------------------------

pop 10:

11
19
------------------------------------

pop 11:

19
------------------------------------

pop 19:

------------------------------------

inorder : [2, 4, 8, 9, 10, 11, 19]


To remove existing elements and replace them with new values in a single operation, use heapreplace().

import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print 'start:'
show_tree(data)

for n in [0, 7, 13, 9, 5]:
    smallest = heapq.heapreplace(data, n)
    print 'replace %2d with %2d:' % (smallest, n)
    show_tree(data)



This technique lets you maintain a fixed size heap, such as a queue of jobs ordered by priority.


$ python heapq_heapreplace.py
start:

2
9 4
10 11 8 19
------------------------------------

replace 2 with 0:

0
9 4
10 11 8 19
------------------------------------

replace 0 with 7:

4
9 7
10 11 8 19
------------------------------------

replace 4 with 13:

7
9 8
10 11 13 19
------------------------------------

replace 7 with 9:

8
9 9
10 11 13 19
------------------------------------

replace 8 with 5:

5
9 9
10 11 13 19
------------------------------------
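One possible use of the fixed size heap mentioned above is keeping only the N largest values seen in a stream. This is a rough sketch rather than anything from the sample code; the function name largest_n is invented here.

```python
import heapq

def largest_n(stream, n):
    """Keep a heap of at most n items; return them largest first."""
    if n <= 0:
        return []
    heap = []
    for value in stream:
        if len(heap) < n:
            # still filling the heap up to its fixed size
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # heap[0] is the smallest kept value; swap it out in one step
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)

top3 = largest_n([19, 9, 4, 10, 11, 8, 2], 3)
print(top3)
```

Because the heap never grows past n items, memory use stays constant no matter how long the input is.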



Data Extremes:

heapq also includes 2 functions to examine an iterable to find a range of the largest or smallest values it contains. nlargest() and nsmallest() are really only efficient for relatively small values of n > 1, but can still come in handy in a few cases.

import heapq
from heapq_heapdata import data

print 'all :', data
print '3 largest :', heapq.nlargest(3, data)
print 'from sort :', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]



$ python heapq_extremes.py
all : [19, 9, 4, 10, 11, 8, 2]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [2, 4, 8]
from sort : [2, 4, 8]
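Both functions also accept a key argument (as far as I recall, one of the 2.5 additions), which helps when the items are records rather than plain numbers. The job list below is made-up data for illustration.

```python
import heapq

# (description, priority) pairs; a lower number means more urgent
jobs = [('write docs', 3), ('fix bug', 1), ('refactor', 5), ('release', 2)]

most_urgent = heapq.nsmallest(2, jobs, key=lambda job: job[1])
least_urgent = heapq.nlargest(1, jobs, key=lambda job: job[1])
print(most_urgent)
print(least_urgent)
```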


References:

heapq Theory
Wikipedia - Heap Data Structure
Python Module of the Week Home
Download Sample Code



by Doug Hellmann (noreply@blogger.com) at May 11, 2008 05:10 PM

Tales of a Programming Hobo - Christopher Armstrong

Intelligent Hinting

Aaron A. Reed recently announced an open beta for his Intelligent Hinting extension for Inform 7. This is an amazing extension that intelligently figures out how to solve puzzles in Inform 7-based games with high-level puzzle annotations in your I7 project.

You have to define "puzzles" and "tasks" in your own game, at implementation-time, and the extension provides a >SUGGEST command which indicates the next action to be taken to solve the current puzzle. It's surprisingly smart: if you've defined that a cloak must be placed on a particular hook, it will automatically figure out how to move the player to find the cloak, pick it up, and move the player to the hook. Not only that, it even knows how to completely automatically find keys for locked doors that are between the player and either the cloak or the hook.

Not only is this a good feature for end-users, it also offers very important benefits to implementors of IF: It makes it trivial to automatically test if your work is winnable, and it makes it similarly trivial to generate a walkthrough to publish with your game automatically.

Inform 7 has a rich and descriptive world model, and it's great to see tools that are starting to really take advantage of it in very useful ways.

by Christopher Armstrong (noreply@blogger.com) at May 11, 2008 04:36 PM

Arc Riley

subversion vs git

I stayed up late last night working to install a shared git repository for the Pyrex replacement project. First of all, copious thanks to johnw and Ilari on #git for their help on this.

I'm very much an enthusiast of Trac; it's an excellent engine for project websites and has all the essential tools, built in and via plugins, that any project should need. OK, so there are a few things a project could need that aren't already written, but new plugins are easy enough (it's all Python, after all).

As such, I like to leave user administration to Trac, and it maintains a .htusers and .access file local to each project for that purpose. Likewise, for SVN, we use WebDAV on Apache, whereas the ssh+svn method would require adding those users to the local system or PAM hacking.

Thus, the most direct path would seem to be to install GIT via WebDAV and run it from there. As Ilari pointed out, GIT's DAV has many problems compared to SVN, from lockups to a failed push sometimes corrupting the shared repository. I tried anyway, with hours of help from joshw, and at 5am I had to throw in the towel. I'm sure it's possible, but not with my current lack of GIT knowledge.

The other available solutions could work, but first I'd want to write some new Trac plugins for accepting developer ssh keys and using these on the backend to govern access or figure out the PAM configuration for .htusers to have restricted SSH access for GIT pushes.

There are certainly advantages to GIT, but server setup difficulty and added complications for Windows developers lead me to stick with subversion servers with git-svn client-side for now.

by Arc Riley (noreply@blogger.com) at May 11, 2008 04:15 PM

Doug Hellmann

Working with IMAP and iCalendar

How can you access group calendar information if your Exchange-like mail and calendaring server does not provide iCalendar feeds, and you do not, or cannot, use Outlook? Use Python to extract the calendar data and generate your own feed, of course! This article discusses a surprisingly simple program to perform what seems like a complex series of operations: scanning IMAP folders, extracting iCalendar attachments, and merging the contained events together into a single calendar.

Read more

This article was originally published by Python Magazine in October of 2007.

by Doug Hellmann (noreply@blogger.com) at May 11, 2008 03:23 PM

James Tauber

Metrics Provide An Inner Product

Another post for the Poincaré Project.

We've already seen that a one-form is a linear function from a vector to a (for our purposes) real number. On a manifold, one-forms correspond to stack-type vectors being applied to arrow-type vectors by counting how many "stacks" the arrow passes through.

In the previous post Metrics As Mappings Between Arrows and Stacks, we saw that a metric is an extra bit of structure that describes how to map between arrow-type vectors and stack-type vectors.

So, in summary:

  • a metric tells you how to go from an arrow-type vector to a stack-type vector
  • a stack-type vector can be applied to another arrow-type vector to get a real number

These two facts can be combined to let you take two arrow-type vectors and get a real number out of them.

This has parallels with currying in functional programming.

Recall that if a function "add" takes two integers and returns an integer, it can be viewed as a function that takes one integer and returns a function that takes one integer and returns an integer.

add :: Int -> Int -> Int

Now, a one-form is a function that takes a vector and returns a real. In other words:

Vector -> Real

So it is easy to see that if you curry a real-valued function that takes two vectors you get:

Vector -> Vector -> Real

In other words, a function taking two vectors to a real is equivalent to a function from a vector to a one-form.

So if you have a metric that can convert between vectors and one-forms (or, in the context of a manifold, between arrows and stacks) then you also have a function from two vectors to a real.

Such a function is called an inner product or dot product. Often the notion of an inner product is defined first, before one-forms are introduced (if at all). In fact, some texts will define a metric to be an inner product. It is best for our purposes, though, to think of the metric's fundamental purpose as being converting between arrows and stacks (and back again) and the inner product as being an extra concept we get for free.
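The currying argument can be sketched in Python. This is only an illustration under a loud assumption: the metric here is the ordinary Euclidean one on tuples of numbers, which the posts in this series do not require. The names metric, one_form and inner_product are chosen for this example.

```python
def metric(v):
    """Map an arrow-type vector to a stack-type vector (a one-form).

    Assumption: a flat Euclidean metric, so the one-form just
    multiplies components pairwise and sums them.
    """
    def one_form(w):
        # a one-form takes a vector and returns a real number
        return sum(vi * wi for vi, wi in zip(v, w))
    return one_form

def inner_product(v, w):
    """Uncurried view: two vectors in, one real number out."""
    return metric(v)(w)

stack = metric((1, 2, 3))        # Vector -> (Vector -> Real)
value = stack((4, 5, 6))         # apply the one-form to a second vector
print(value)
```

Calling metric(v) gives the one-form and applying it to w gives the real number, which is the same thing inner_product(v, w) computes in a single step.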

May 11, 2008 02:55 PM

Sean McGrath

Prime time desk/lap computing with Ubuntu

Stories like this one from David Megginson are becoming increasingly common.

Be in no doubt : Ubuntu is an extremely capable and robust system that can give Windows/OS-X systems a run for their money in most laptop/desktop environments.

I'm just back from XTech 2008 and looking around the rooms at the conference, I suspect straight up Windows laptops represented < 50% of the machinery on show. In years to come, I suspect the late Oozies will be remembered as an inflection point of considerable importance in terms of market sentiment in the end-user operating systems world.

I plugged my Ubuntu laptop into the overhead projector stuff and it worked. No fiddling with resolutions, no bios brain surgery. The Mac folks at the conference had similar success levels. I remember a time when Windows was the safest bet for device compatibility. No longer methinks.

by Sean (noreply@blogger.com) at May 11, 2008 07:55 AM

Paulo Nuin

Obtaining overrepresented motifs in DNA sequences, part 2

We move on in our search for overrepresented motifs in DNA sequences. I was preparing this entry when the comments to part 1 arrived. Because of that we will modify our previous code to include some suggestions and then we will change it a little bit to output the actual values we want. As Titus pointed out in his comment, when we merge the sequences we generate errors, because we are artificially including motifs that are not supposed to be there. But in the end we will see that we are not after the actual number of times a motif appears in the sequence and what matters is the motif quorum.

The main focus of the part one was to show the decrease in code length from C++ to Python, introduce generator functions and yield and also show the nice permutation function. Turns out by using the approach of the previous entry there is a big drawback in code execution, which is around 60 times slower than C++. So we need to find another approach, and Mike showed us how (with some small modifications)

#!/usr/bin/env python

from collections import defaultdict
from random import choice
import sys
import fasta

seqs = fasta.get_seqs(open(sys.argv[1]).readlines())
length = int(sys.argv[2])

#for a missing key, the dict entry is initialized to zero
counts = defaultdict(int)

#count the length-element subsequences in each sequence
for i in seqs:
    #the + 1 includes the final window at the end of the sequence
    for n in range(len(i.sequence) - length + 1):
        counts[i.sequence[n : n + length]] += 1

#counts.keys() will then return the nucleotide sequences
#that were actually seen in the input sequences

#print out the sequences that occur more than once
for motif in counts:
    if counts[motif] >= 2:
        print motif, counts[motif]

This time the counting is done with a defaultdict initialized with integers (all zeroes). Instead of generating all possible combinations (which was cool and fast, but in the end made our script slower), a window with the input length is slid along each sequence. The key of the default dictionary is the motif and the value is the actual count, incremented as we determine each motif.

Checking this code's performance using Linux's time, we get 0.2 seconds, an improvement of 5 fold on the C++ code and 300 fold on our previous Python code (thanks Mike and everyone else for the suggestions).

A couple of other differences here are that the output is not ordered and only the seen motifs are printed in the end. We will see next time how to get the quorums.

by Paulo Nuin at May 11, 2008 02:56 AM

May 10, 2008

Arc Riley

a radical redirection

We've been at a deadlock for a few weeks now, our previous coding pace has turned into a lot of contemplation and side discussions about the greater issues with the PySoy codebase.

I think we have a roadmap forward, or enough of one to post these in a blog entry:

First, PySoy 1.0 will not be released until after Python 3.0 this Fall, and it would be pointless to continue developing with 2.x in target. Python 3.0 also has a large number of useful changes for our project. Our target platform should therefore be 3.0 (and its alphas/betas for now).

Second, with the release of Pyrex 0.9.7 Greg has, once again, introduced a major language change (the "for" statement) without even soliciting input from the community. This has sealed the deal for us, even if our immediate issues are fixed we cannot continue to subject ourselves to a language that changes in incompatible ways between 2 micro versions (0.9.6 -> 0.9.8, when the old syntax will not be supported).

We earlier started a Cython variant to support PySoy's codebase, but given recent developments in their community, the timelines of our projects, and their choice to stick with Python 2.x for now, we're moving forward on a Pyrex replacement written from scratch and targeting Python 3.0.

We're going to use a similar language style to Pyrex, Greg certainly had some good ideas, and will keep language porting in mind, but there will be changes. We will also not be using any of his code including his Plex module, Pyrex and Cython's lexical analyzer which is the core of those packages.

PySoy Beta3, earlier aimed for release in "early Spring", has been indefinitely postponed as we work on that. The subversion repository is of course open, and development can and will continue, but we cannot release until the new build system is complete.

by Arc Riley (noreply@blogger.com) at May 10, 2008 11:53 PM

Jesse Noller

Abby is walking!

In addition to the Python sprint day work I am doing (as well as pymag stuff) I've been editing the obligatory "zomg baby is walking" video, which is below.

It's funny - she's been close, and doing short spurts, but last night it was like her walking switch just "came on".

by jesse at May 10, 2008 06:51 PM

Ivan Krstic

James Tauber

Introducing Pinax

In the post Reusable Django Apps and Introducing Tabula Rasa I mentioned my project to create an out-of-the-box Django-based website with everything but the domain-specific functionality.

At the time I was calling it Tabula Rasa but now I've settled on the Greek word Pinax, proposed by Orestis Markou.

So far it's just my new django-email-confirmation app tied together with password change and reset, login/logout, with the beginnings of a tab-style UI. There's a ton more I want to refactor out of my existing websites to put into it as well as adding support for OpenID and the stuff I'm starting to do for django-friends.

Even if one doesn't use Pinax as the starting point of a website, I'm hoping it will prove very useful for another goal, namely a "host" project to develop and try out reusable apps.

The initial code is available at http://code.google.com/p/django-hotclub/ under /trunk/projects/pinax and there is a running instance for you to try out at:

http://pinax.hotcluboffrance.com

May 10, 2008 03:22 PM

Spyced

IDE update

Last night the Utah Python User Group held an editor/IDE smackdown. I'm not going to write an exhaustive summary, but here are some highlights:
  • ViM's OmniComplete is actually pretty decent. Calltip support in the GUI is also good. (GUI? ViM? Yeah, weird.)
  • Emacs completion, from Rope, is also good. Emacs's refusal to make any concession to GUIs though keeps things clunky. Not that it isn't great that Everything Works over plain ssh; that's fine, but going through classic Emacs buffers for docstrings or completion means everything takes more keystrokes than it should while being less useful than having that information Always On.
  • Rope also gives Emacs refactoring support that works surprisingly well.
  • PyDev still sees a big win from the Eclipse platform. Specifically, even though Subclipse and Subversive are a bit weak compared to the gold standard (that would be TortoiseSVN), they are much better than what you get with Komodo or Wing. Now that I am on OS X (no Tortoise) this is a bigger issue for me than it used to be.
  • PyDev Extensions has refactoring support now, too.
  • Komodo has limited support for completion inside django templates. Which is impressive, since the commands allowed in django templates aren't really Python, which is to say that you can't just use the same completion support that you use for normal Python code.
  • Mako template support with completion, anyone?
  • The latest versions of Komodo and Wing both integrate unittest support. Wing also supports doctest out of the box. Meaning, you click a button, your tests run, you get a pretty summary with click-to-go-to-the-source-of-the-error support. This might get me to finally upgrade to Wing 3. It's not that "python test.py" is so hard, so much as I do it so often that even a little more convenience adds up.
I was surprised how well ViM and Emacs do with Python now. ViM's modern inline interface for code completion and Emacs's refactoring support are particularly nice. The IDEs still win on the I part (Integration), in particular debugging and (for Eclipse at least) svn support.

Update: Ryan McGuire blogged about his Emacs presentation in more detail.

by Jonathan Ellis (noreply@blogger.com) at May 10, 2008 10:22 AM

Paulo Nuin

Obtaining overrepresented motifs in DNA sequences, part I

Changing gears now, leaving behind Pfam alignments. I decided to start a new series of posts based on the conversion of some small C++ programs I developed in the past. These small programs (I call them modules because they were part of a larger application) were used to count motifs, short nucleotide words up to 10-12 base pairs, and then calculate statistical overrepresentation of these words by comparing a foreground set of DNA sequences against a background set.

We will start comparing the different approaches of the C++ and the Python codes and point out advantages and disadvantages of doing it in one language or the other. First thing we need to do is to count the motifs in all sequences from our foreground and background sets. For the project I was working on, the ideal word length was 10 nucleotides. Basically our C++ approach to increase speed was to transform the character DNA sequences into numbers and then, while sliding a window with the desired word length, hash the base-four numbers into base 10 and increment a vector position, previously initialized with 0. For four nucleotides and a word size of 10 there are 1,048,576 permutations possible, from AAAAAAAAAA to TTTTTTTTTT. Initially the C++ program would do

for(j = 0; j < seqsize; j++)
{
    seqfile.get(base);
    if(base != '\n')
    {
        switch(base)
        {
            case 'A':
                bseq.push_back(0);
                break;
            case 'C':
                bseq.push_back(1);
                break;
            case 'G':
                bseq.push_back(2);
                break;
            case 'T':
                bseq.push_back(3);
                break;
        }
    }
}

reading all sequences and pushing a figure for each nucleotide in a vector, and then sliding a window on this vector and hashing the base-four number

int hashSeq(vector<short> subseq)
{
    int w, i, hashvalue = 0, power;
    w = subseq.size() - 1;
    for(i = 0; i < subseq.size(); i++)
    {
            power = 0;
            hashvalue += subseq[i] * pow((double)4,(double)w);
            w--;
    }
    return hashvalue;
}

if(binseq[i].size() > motifwidth)
{
    for(j = 0; j < binseq[i].size()-motifwidth+1; j++)
    {
        sub.assign(binseq[i].begin() + j, binseq[i].begin() + j+motifwidth);
        hashed = hashSeq(sub);
        nmercount[hashed]++;
        sub.clear();
    }
}
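For readers who do not speak C++, the same base-four hashing idea can be sketched in Python. This is a rough translation for illustration, not the original module: the names CODES, hash_motif and count_motifs are invented here, and the sketch counts every window including one that spans the whole sequence.

```python
# nucleotide -> base-four digit, mirroring the switch statement above
CODES = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def hash_motif(motif):
    """Read the motif as a base-four number and return it as an int."""
    value = 0
    for base in motif:
        value = value * 4 + CODES[base]
    return value

def count_motifs(sequence, width):
    """Slide a window along the sequence, incrementing one slot per motif."""
    counts = [0] * (4 ** width)   # every slot starts at 0
    for start in range(len(sequence) - width + 1):
        counts[hash_motif(sequence[start:start + width])] += 1
    return counts

print(hash_motif('AAAAAAAAAA'))   # the first of the 4**10 slots
print(hash_motif('TTTTTTTTTT'))   # the last slot
pair_counts = count_motifs('ACGTACGT', 2)
```

With a word size of 10 the list has 4**10 = 1,048,576 slots, matching the number of possible motifs given above.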

The whole C++ code has about 400 lines, including all the possible output formatting and printing. Timed with time, the C++ executable takes a little bit less than 2 seconds to read, count and output different files.

For Python, we will use a different approach and gain a lot in code simplicity. As we want to count the number of times size-10 words appear in all sequences, we first need to generate all possible permutations (with replacement) of four nucleotides. This can be easily accomplished by using generator functions. Regular functions run until completion and then return a value. For instance, a function that calculates the factorial of 10 will return the last value only, after multiplying 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2. A generator function runs until a value is available to return, yielding it and then suspending its operation until called again. The yielding part was emphasized because yield is the command used by Python to return the value and suspend the function until further notice. In the factorial function, a generator would return the intermediary factorial values up to 10.
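The factorial case can be written out as a tiny generator. The function name factorials is just a label for this sketch.

```python
def factorials(n):
    """Yield 1!, 2!, ..., n! one at a time."""
    result = 1
    for k in range(1, n + 1):
        result *= k
        # hand back the intermediate value and pause here until asked again
        yield result

first_five = list(factorials(5))
print(first_five)

last = None
for last in factorials(10):
    pass   # each pass resumes the generator where it left off
print(last)
```

A regular function would only give the final value, while the generator exposes every intermediate product along the way.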

To generate all 1 million plus permutations of 4 nucleotides we need a function similar to the one below (modified from here)

def permutations(items, n):
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in permutations(items, n - 1):
                yield str(items[i]) + str(base)

Basically, what this generator function does is combine all four nucleotides into words of size 10. This is a recursive function, where the result of the function depends on the n-1 value calculated by the function until n equals 0. The first for loops over the items that we want to permute (the nucleotides) and the second for recursively calls permutations, starting with the initial n passed (10), until we reach 0. Debugging this function we will see that i is constant for each iteration of the second loop and only n changes from 10 to 0, while one by one nucleotides are joined to form a motif. It starts with AAAAAAAAAA, then AAAAAAAAAC, then AAAAAAAAAG, until it gets to a poly-T.
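As a side note, on Python 2.6 or later the standard library's itertools.product yields the same words in the same order, so it can serve as a cross-check on the recursive generator. The sketch below uses words of size 2 to keep the output small; the helper name words is made up for this example.

```python
from itertools import product

def words(items, n):
    """Yield every length-n word over items, in the order AA, AC, AG, ..."""
    for combo in product(items, repeat=n):
        yield ''.join(combo)

dimers = list(words('ACGT', 2))
print(len(dimers))
print(dimers[:5])
```

product() is lazy as well, so the million-plus 10-mers are still produced one at a time instead of being built as one big list.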

Our final code would look like the one below

import fasta
import sys

def permutations(items, n):
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in permutations(items, n - 1):
                yield str(items[i]) + str(base)

seqs = fasta.get_seqs(open(sys.argv[1]).readlines())
length = sys.argv[2]

nucleotides = ['A', 'C', 'G', 'T']

merged_seqs = ''
for i in seqs:
    merged_seqs += i.sequence

for i in permutations(nucleotides, int(length)):
    # count() returns an int, so convert it before joining the strings
    print i + '\t' + str(merged_seqs.count(i))

where we read the input sequence(s), merge them into one long string and, as we generate all possible combinations, count the number of times they appear. Running on the same input file used with the C++ executable, this code is 60 times slower, taking on average one full minute to count motifs in 8 500 bp DNA sequences. The slowest section is the count, as the generation of all possible combinations is straightforward. We lose some speed, but gain a lot in code simplicity and clarity. Next we will modify this code to output different counts needed for the statistical analysis.

by Paulo Nuin at May 10, 2008 02:46 AM

May 09, 2008

Voidspace

ironpythoninaction.com: New Chapters and Sourcecode Available

Three new chapters have been added to the Manning Early Access Program for IronPython in Action. There are now eleven out of fifteen chapters available. ... [130 words]

May 09, 2008 09:23 PM

Making It Stick (Patrick Logan)

Objectively

"Rubinius switched from C to C++ to implement it's core VM"

For the life of me I cannot understand why projects use C++ rather than Objective-C. Hmm.

Catching up on comments to this post...

Yeah, I can be too brief sometimes. Here's the essence of what I like about ObjC vs. C++. ObjC attempts to keep the Obj and the C distinct, while C++ attempts to combine them. As a result the Obj in ObjC is very much like the Obj in Smalltalk. And the C in ObjC is very much like the C in ANSI C.

The Obj in C++ is significantly more complicated than the Obj in Smalltalk or in ObjC. The C in C++ is also significantly more complicated, to the point where I don't think it can be called "C". People will talk about the expressiveness of C++ and how much it has evolved over the years. I still very much prefer the simplicity of ObjC.

I am also surprised the ObjC has portability issues. With the GNU implementation?

And I am surprised about the Ruby kernel issue as well. I also thought this would be so small to warrant just C or even better, a subset of Ruby that compiles easily into C. This is what Squeak uses for its kernel. Gambit Scheme does something along the same lines, allowing a very C-ish dialect of Scheme that translates directly.

by Patrick Logan (noreply@blogger.com) at May 09, 2008 08:24 PM

IronPython Url's

Standalone Silverlight Applications with Moonlight

One advantage that Adobe AIR applications have over Silverlight is that they don't have to be hosted in the browser. (An advantage that Silverlight has over AIR is that it can be programmed in Python.)

The Linux support for Silverlight 2 (the interesting version) is not there yet. Linux support is coming through the 'Microsoft-blessed' Mono Moonlight project. They are still working on Silverlight 1.0, but Miguel de Icaza says that they will soon switch to Silverlight 2 development.

Something that Moonlight can already be used for that you can't do with Silverlight is to create applications that use the Moonlight UI, but have full access to the Mono stack and aren't limited to the browser. These are called Moonlight Desklets, and they can be programmed with IronPython.

Let's hope that something similar for Silverlight shows up soon...

by Fuzzyman (noreply@blogger.com) at May 09, 2008 07:14 PM

Accessing IronPython Objects from Javascript in the Winforms WebBrowser Control

Srivatsn has a blog entry showing how to expose IronPython objects to Javascript in the Windows Forms WebBrowser control.
This could be very useful combined with this technique for creating standalone desktop Silverlight applications.

by Fuzzyman (noreply@blogger.com) at May 09, 2008 06:48 PM

Titus Brown

pygr gets some summer love

(pygr is a neat bioinformatics framework in Python.)

After some commenters on my last post seemed happy to hear that pygr was the focus of some summer work, I realized I had only discussed the pygr summer work in a post to the biology-in-python list.

Whoops.

So, here's the scoop: not only is pygr the focus of Rachel McCreary's Google Summer of Code project, but Jenny Qian will be using pygr to build an ENSEMBL interface, also as part of the Google Summer of Code.

That's not all!

In addition to Rachel and Jenny (under the sterling mentorship of Chris Lee, Robert Kirkpatrick, Namshin Kim, and myself) I have two MSU students working with me over the summer, Alex Nolley and Marie Buckner. They'll both be working with pygr-related things, although like Jenny their efforts may end up being more on ways to use pygr than on pygr's code itself.

I also have a grad student or two that may drop in on pygr, if only to use it for something research-y.

So all in all, pygr will get a lot of love this summer. Hopefully we can polish the code and documentation and tutorials to the point where the learning curve is as minimal as it can get, and this fabulous package will become readily available to many others...

Why am I personally putting so much effort into pygr? Well, I've been using it more and more over the last few months, and (somewhat like scipy) it's transformed my work by turning annoyingly difficult data organization problems into trivial Python transformations. I can literally throw together a custom genome browser in a matter of hours -- I've implemented two or three already, for different projects -- and it has enabled several new research programs. pygr seems to be one of those rare packages (kind of like Python itself) that is not only functional and effective but presents a unified and coherent intellectual interface. pygr is the only good middleware layer I've seen for sequence intertwingling in bioinformatics. It's not that mature yet, but it has serious promise, and I'm hoping to get in on the ground floor, so to speak :).

cheers,

--titus

May 09, 2008 06:03 PM

Jesse Noller

Python 2.6a3 and 3.0a5 released

Barry sent the email out last night that both Python 2.6a3 and 3.0a5 are released - these are the final alphas for both. I'd go and grab em while they're still hot off the presses... Provided you're not already sync'ing from svn/bzr/mercurial/wtf.

by jesse at May 09, 2008 05:28 PM

Base-Art

Elisa on Windows and new resources for contributors

So our Windows developer strike force came up with a windows version of Elisa, Windows XP and Vista are supported. Check out the Elisa download page to find the alpha version of the installer. There are some issues, this is an alpha, you're warned :)

Alessandro and Olivier cooked 2 tutorials showing off how to develop new features for Elisa, by example. The API of the upcoming 0.5 branch of Elisa has also been published, it might evolve a bit but it's already a good starting point for motivated contributors out there :)

by Philippe Normand at May 09, 2008 03:36 PM

Making It Stick (Patrick Logan)

Isn't That What The Internets Are For?

Joe Wilcox watches Microsoft and wonders...

"Mesh is the only thing that really makes sense out of a Yahoo
acquisition to me. Yahoo has rich content services—and they're
everywhere. If Microsoft could plug Mesh into that infrastructure,
fast, and flip the switch "Wow!" Imagine, for example, Mesh making
Flickr photos instantly available to all your PCs, cell phones and
TVs. Software plus hardware plus services."

But isn't that what the internets are for?

http://www.microsoft-watch.com/content/web_services_browser/yahoo_between_a_rock_and_a_hard_place.html?kc=MWRSS02129TX1K0000535

by Patrick Logan (noreply@blogger.com) at May 09, 2008 12:22 PM

Andrew Channels Dexter Pinion - Andy Todd

Opening a file in Python

I’m sure I read this somewhere recently, but my scratchy memory and command of Google can’t bring it back to me.

Is there a Python idiom for accepting either a file name or a file object as a function parameter?

The closest I can get is this;

def my_function(file_name_or_object):
    try:
        # A file name (string) opens normally
        file = open(file_name_or_object)
    except TypeError:
        # Not a name, so assume it is already a file object
        file = file_name_or_object
    return file

Any improvements on this are more than welcome.
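One possible answer, offered as a sketch rather than an established idiom: test for a read() method instead of catching TypeError, and remember whether the function opened the file itself so that it only closes what it opened. The names as_file and first_line are hypothetical and exist only for illustration.

```python
def as_file(file_name_or_object, mode="r"):
    """Return (file_object, opened_here) for a file name or a file-like object.

    Hypothetical helper: anything with a read() method is treated as an
    already-open file; everything else is passed to open().
    """
    if hasattr(file_name_or_object, "read"):
        return file_name_or_object, False
    return open(file_name_or_object, mode), True


def first_line(file_name_or_object):
    """Read the first line, closing the file only if we opened it."""
    handle, opened_here = as_file(file_name_or_object)
    try:
        return handle.readline()
    finally:
        if opened_here:
            handle.close()
```

The caller's file object is left open on purpose, since whoever opened it is presumably still using it.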

by Andy Todd at May 09, 2008 11:01 AM

Noah Gift

Google App Engine Application Request: Python User Group Website

If anyone was interested in a great Google App Engine project, I would love to see a community blog/speaker registration tool. Jeff Rush mentioned something like this a couple of PyCons ago, but now there is the technology available for free with...

by Noah Gift at May 09, 2008 10:57 AM

Fabio Zadrozny

Bug in pydev package explorer (1.3.16)

Ok, a serious bug was found in the pydev package explorer... in version 1.3.16, when a project has the project root in the pythonpath, its children won't appear. For users that have a source folder within the project, this doesn't seem to happen.

This problem has just been fixed and a new release should be out pretty soon (in 1.3.17)

by Fabio Zadrozny (noreply@blogger.com) at May 09, 2008 01:09 AM

Deadly Bloody Serious about Python - Garth Kidd

Default arguments in Python: two easy blunders

I’m glad I stumbled across Patrick Altman’s tweet about a “default bug in Django“. I’d never have guessed you can pass a callable to a field’s default= argument, otherwise. That’s quite a powerful idiom, and I think I’ll use it a lot. To balance the karma, I’d like to post a quick reminder to everyone else [...]

by garth at May 09, 2008 01:05 AM

Grig Gheorghiu

May 08, 2008

Andrew Channels Dexter Pinion - Andy Todd

Trouble Getting a Date

I’m having trouble with dates. This can be summed up in a couple of high level issues;

1. Date support in relational databases is insane, or at the best inconsistent.

As far as I can tell the ANSI SQL-92 standard defines date, time, interval and timestamp data types. Which doesn’t help when SQL Server only implements something called ‘datetime’ - at least I think so, have you tried accessing any sort of manual for a Microsoft product online? Blimey, I thought billg had embraced this web thing years ago. Oracle has the ‘date’ data type (which is actually a time stamp) and MySQL, well they’ve gone and outdone everyone by implementing DATETIME, DATE, TIMESTAMP, TIME, and YEAR.

2. The Python DB-API does not cope with date data type ambiguity well.

When it comes to the date question the Python DB-API states (and I quote) ” … may use mx.DateTime”, which if you ask me isn’t much of a standard. This needs to change so that all DB-API modules return consistent datetime objects, not such a big issue as datetime has been part of the standard library since, what, Python 2.3?

Sadly even if we fix this it won’t work with Sqlite as it doesn’t consistently support data typing. In my experiments regardless of what sort of date you insert into the database you get a unicode string back. Don’t believe me? Try this in Python 2.5;

>>> from sqlite3 import dbapi2
>>> db = dbapi2.connect('test_db')
>>> cursor = db.cursor()
>>> cursor.execute('create table date_test (id integer not null primary key autoincrement, sample_date DATE NOT NULL)')
>>> stmt = "INSERT INTO date_test (sample_date) VALUES (?)"
>>> cursor.execute(stmt, (1234, ))
>>> import datetime
>>> cursor.execute(stmt, (datetime.date(2008, 3, 10), ))
>>> cursor.execute(stmt, ('My name is Earl', ))
>>> db.commit()
>>> cursor.execute("SELECT * FROM date_test")
>>> results = cursor.fetchall()
>>> for item in results:
...     print item[1], type(item[1])
1234 <type 'int'>
2008-03-10 <type 'unicode'>
My name is Earl <type 'unicode'>
>>>

But note that it is fine for integers.
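For what it's worth, the sqlite3 module can be asked to convert on the way out by passing detect_types to connect() and registering a converter for the declared column type. The following is a minimal sketch under that assumption, written for current Python versions; it reuses the date_test table from above, and the adapter and converter lambdas are illustrative choices rather than anything the module requires.

```python
import datetime
import sqlite3

# Store dates as ISO 8601 text and turn them back into date objects on read.
# Both directions are registered explicitly instead of relying on defaults.
sqlite3.register_adapter(datetime.date, lambda d: d.isoformat())
sqlite3.register_converter(
    "date",
    lambda raw: datetime.date(*[int(part) for part in raw.decode("ascii").split("-")]),
)

# PARSE_DECLTYPES makes sqlite3 look at the declared column type ("DATE")
# and run the matching converter on every value read from that column.
db = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cursor = db.cursor()
cursor.execute(
    "create table date_test ("
    "id integer not null primary key autoincrement, "
    "sample_date DATE NOT NULL)"
)
cursor.execute(
    "INSERT INTO date_test (sample_date) VALUES (?)",
    (datetime.date(2008, 3, 10),),
)
db.commit()

cursor.execute("SELECT sample_date FROM date_test")
sample_date = cursor.fetchone()[0]  # a datetime.date, not a string
```

This only helps when every value in the column really is a date. The converter in this sketch would raise on rows like 1234 or 'My name is Earl', so it does not fix the underlying lack of type enforcement.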

3. The people writing the Python standard library modules are on crack.

Outside of the database world and within the batteries included Python standard library some modules use datetime, others time and there are even uses of calendar.

O.K. I’ll accept that maybe the module authors aren’t on full strength crack, because the time module just exposes underlying posix functions. But the people who wrote those were on something strong and hallucinogenic. I table the following function signatures from section 14.2 of the Python Library Reference 2.5 as an example;

strftime(format[, t ])
strptime(string[, format ])

This has bitten me twice in the last twenty four hours and frankly I’m not happy.
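To make the inconsistency concrete, here is a small round trip with the time module. The format string and the sample date are arbitrary choices for illustration.

```python
import time

fmt = "%Y-%m-%d"

# strptime: the string to parse comes first, the format second.
parsed = time.strptime("2008-05-08", fmt)

# strftime: the format comes first, the time tuple second.
text = time.strftime(fmt, parsed)
```

Swapping the two arguments to strptime tends to fail with a ValueError about the data not matching the format, which is easy to misread as a bad format string.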

I appreciate that there are historical reasons for having inconsistent function signatures but can someone please fix this in Python 3.0. All we need is a single module that can access the underlying system clock and then convert between a number of different representations of that and other epoch driven dates. How hard can it be? As far as I can tell this is not part of the proposed standard library re-organisation. I think it should be.

by Andy Todd at May 08, 2008 10:04 PM

Shannon -jj Behrens

Joel on Software: Never Rewrite from Scratch

I was thinking of Joel on Software's famous post Things You Should Never Do, Part I where he says, "[Netscape] did it by making the single worst strategic mistake that any software company can make: They decided to rewrite the code from scratch."

Since Joel is from Microsoft, I was pondering what would have happened if the Microsoft NT developers had taken that advice and based NT on DOS. Perhaps it's illustrative to compare the quality of Windows ME vs. Windows 2000 and XP.

by Shannon -jj Behrens (noreply@blogger.com) at May 08, 2008 08:12 PM

S.Lott

Standard Software Defects - Java Edition


Here are some software defects so typical, that I've collected a handy short list with acronyms. I've also got a specific technique for remediating those awful Everything In Main programs.

May 08, 2008 12:58 PM

Ivan Krstic

Harvard Law goes Open Access

In February, Harvard’s Faculty of Arts and Sciences (FAS) unanimously approved an Open Access resolution, committing to make all its research available through a public repository. It was the first US college to do so.

Yesterday, Harvard Law School unanimously voted to become the first US law school with the same commitment.

Over a scant few years, Harvard Law pulled together Larry Lessig and Jonathan Zittrain, and recently recruited both Yochai Benkler and Cass Sunstein. These are, along with folks like John Palfrey, the finest legal thinkers of their generation. I am incredibly hopeful about the kind of cyberlaw activism and trendsetting we’ll see with these minds all sharing an affiliation.

by Ivan Krstić at May 08, 2008 10:29 AM

Simon Wittber

Apache + SSL + PSK on Ubuntu

This howto describes the process of using Apache and SSL with trusted clients, via Pre Shared Keys. Unlike the usual way of using SSL, this setup requires the server _and_ the client to have valid certificates. This means you need to create a client certificate and deliver it securely to the client.

1. Enable SSL.
sudo a2enmod ssl

2. Generate a private key without a passphrase,
openssl genrsa -out server.key 1024

or with a passphrase.
openssl genrsa -des3 -out server.key 1024

3. Create a certificate signing request.
openssl req -new -key server.key -out server.csr

4. Sign it yourself.
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

5. Copy your new certificate and keys to the appropriate places.
sudo cp server.crt /etc/ssl/certs
sudo cp server.key /etc/ssl/private

6. Edit your apache site configuration, add these lines into a VirtualHost section.
SSLEngine on
SSLCertificateFile /etc/ssl/certs/server.crt
SSLCertificateKeyFile /etc/ssl/private/server.key
SSLVerifyClient require
SSLVerifyDepth 1
SSLCACertificateFile /etc/ssl/certs/server.crt

7. Create a certificate which you can give to a client, or a group of clients.
openssl pkcs12 -export -out client_cert.pfx -in server.crt -inkey server.key \
-name 'Certificate Name'

8. Make sure the client gets the client_cert.pfx file, which they install into their browser.

9. To use client_cert.pfx with Python's httplib or httplib2, it needs to be split into a key file and a certificate file.
openssl pkcs12 -clcerts -nokeys -in client_cert.pfx -out client_cert.pem
openssl pkcs12 -nocerts -in client_cert.pfx -out client_key.pem

10. Strip the pass phrase from client_key.pem so Python does not prompt for a pass phrase!
openssl rsa -in client_key.pem -out unsecured_client_key.pem
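As a rough sketch of the Python side, assuming a current Python where the ssl module provides SSLContext (this is not what was available when the steps above were written): load the client certificate and the stripped key into a context and hand that to HTTPSConnection. The function name make_connection and its ca_file parameter are hypothetical. With the self-signed setup above, ca_file would presumably be server.crt.

```python
import ssl

try:
    from http.client import HTTPSConnection  # Python 3
except ImportError:
    from httplib import HTTPSConnection  # Python 2.7.9+


def make_connection(host, cert_file, key_file, ca_file):
    """Build an HTTPS connection that presents a client certificate.

    Hypothetical helper; no network traffic happens until a request is made.
    """
    # Trust only the given CA file when verifying the server.
    context = ssl.create_default_context(cafile=ca_file)
    # Present the client certificate and its passphrase-free key.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return HTTPSConnection(host, context=context)
```

A call might look like make_connection("www.example.com", "client_cert.pem", "unsecured_client_key.pem", "server.crt"). The default context also checks the host name, so the name in the server certificate has to match the host being contacted.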

by Simon Wittber (noreply@blogger.com) at May 08, 2008 03:52 AM

Blue Sky On Mars

Paver 0.7: Better than distutils, better docs and much more

I’m delighted to release Paver 0.7. If you missed my original announcement, the short story is that Paver is a new build, distribution and deployment scripting tool geared toward Python projects. My original announcement and the new foreword to the docs explain the motivation.

Ben Bangert and others pointed out a giant documentation bug in 0.4: there was a fair bit of reference doc but no doc that said “here’s how you get started with Paver”. Now there is: Paver’s Getting Started Guide.

Paver 0.7 is a big step up from 0.4 (hence the version number bump). I implemented one of the two major features I had planned for 1.0: distutils/setuptools integration. It’s really cool. Have you ever wanted to just slightly change how “sdist” or “upload” or “develop” worked? Now you can, just by writing a function in your pavement.py file. And don’t worry, you don’t need to duplicate anything between setup.py and pavement.py. It all just moves into pavement.py and Paver can even generate a setup.py file for you, since most people are used to the common “python setup.py install” command.

I’ve gone even farther than that with making it easy to use Paver and not annoy users that don’t yet have Paver. Paver can create a small zip file of Paver’s core bits so that “python setup.py install” will work just fine even for users who don’t have Paver installed. Paver can also create a virtualenv bootstrap script for you, so that users don’t necessarily need to install your package on their systems in order to use it.

Paver’s got new documentation tools that work great with Sphinx. It’s now easy to mark sections of sample code files and then include those sections in your documentation, using the built-in version of Ned Batchelder’s Cog.

And I’m definitely eating my own dogfood. Paver is built using Paver itself and the source distribution includes the paver-minilib so that setup.py install should work fine (let me know if it doesn’t!) The new Getting Started Guide uses the new documentation tools.

There are even more changes than these, and you can look at the changelog for the full list. Note that if you’re using Paver 0.4, there are a couple of trivial breaking changes.

by Kevin Dangoor at May 08, 2008 02:55 AM

Fabio Zadrozny

Pydev 1.3.16

Yeap, it's just been released. Most of the work on this release was on bug-fixes, so, it should be safe to upgrade without major concerns.

The launching facility had some changes, mostly regarding Ctrl+F11 when it's set to launch the current editor (it just didn't work the way it was supposed to), but that's only valid for new launch configurations. So, if you do want to use it instead of having it launch the previously launched app, existing launches should be deleted. Just note that you should delete only a few launches at a time (around 10-15): for some reason Eclipse takes a lot of time to delete lots of launches at once (I tried doing it here and it halted for about 5 minutes until I decided to kill it and delete in small steps).

by Fabio Zadrozny (noreply@blogger.com) at May 08, 2008 01:45 AM

May 07, 2008

Titus Brown

Dear Lazyweb: JavaScript "imagemaps" and/or image subselection?

Dear Lazyweb, help!

I'm embarking on a number of summer projects in my new lab at MSU, and several of them focus on using pygr to do cool genomic stuff. In particular, I'm planning to build a personal genome annotation system that will let people run their own full genome Web sites and annotate the genomes with private information such as Solexa data, cDNA/EST projects, ChIP-seq, cis-regulatory reporter constructs, ncRNA predictions, etc. etc. (If you're interested in this sort of thing, get in touch -- it will, of course, be open source and open development, albeit in Python :)

As I've been thinking more about how to do the display side of things, I've been running headfirst into a serious lack of knowledge. I would like to make an interface that looks somewhat like your standard genome browser/GMOD/UCSC interface, such as this UCSC view of the chicken genome. I already have the basics of that view working; for example, see this simple example and a group-feature example. But I'd like to add more - a LOT more -- interactivity.

Ideally I'd like to be able to draw simple objects (squares, rectangles, lines) on some sort of canvas and then use JavaScript and AJAX to pop up windows and display bits of information. But I don't really know this space of functionality very well.

So I'm turning to the lazyweb.

Are JavaScript+image maps the right way to go (for example, this, this, and this)? Do they work well with multiple browsers? Or are there good JS libraries for drawing images on the fly in the browser? Is SVG a good thing to look at? Were you stuck with this task, what would you use?

The most important things for this project are, in order of importance:

  • basic functionality (JS image maps seem fine for this)
  • cross-browser functionality
  • selection (e.g. GMOD RubberBandSelection)
  • flexibility: reordering and redrawing of images.

Your thoughts are much appreciated! Please drop me a line or comment, whichever is most convenient. I'll summarize the options.

thanks,

--titus

p.s. I'm perfectly fine with "Google this, dumby!" I just don't have much in the way of google keyword knowledge in this area...

May 07, 2008 10:03 PM

Pycon

Groovie

Pylons on JVM's (and other VMs)

Phil Jenvey has been making some great progress getting all the components of Pylons running on Jython, and posted a good write-up of the remaining work being done. It’s interesting to note that one of the big issues will affect any web framework on Jython, not just Pylons. That is, the reload time when used in development to restart the server.

While I don’t plan on deploying Pylons apps in WAR files anytime soon, it’s nice to see Jython emerging as a candidate for deployment.

by ben at May 07, 2008 06:40 PM

Ted Leung on the Air

CommunityOne

Live or semi liveblogging conferences has been getting more and more difficult for me to do. The combination of meetings, networking/parties, and photographs means that it takes longer to assemble the requisite material. Here’s a bit on CommunityOne, which took place on Monday.

Many people (mostly Sun folks) have been asking me if this is my first JavaOne. My answer is, “it’s not, but it is my first one in ten years”. It’s been quite some time since I’ve been to a conference run by a big company like Sun (as opposed to an O’Reilly or open-source community conference). Even though the basics are the same, I definitely feel a kind of culture shock. I was asked to be on a panel during the general session, first thing in the morning, in order to get miked up and to run through the flow. Production values are much higher than I am used to. I keep thinking of CommunityOne as a small event, but in reality it is huge. I am told that registration was around 5000 people, which is twice the size of OSCON, which is the largest conference that I’ve been to in the last 4 or 5 years. Some pictures might help with the scale and production values:

CommunityOne 2008

The panel was on community models, although the content was closer to the edge where companies and open source communities meet/collaborate/fight. I think that I had two or three chances to speak, including the final set of remarks before the close of the panel. I have some more thoughts on that topic, but they are deserving of their own post, so that will be showing up after JavaOne is over.

Probably my favorite thing that happened at CommunityOne was the demonstration of ZFS’s reliability in the face of hardware failures. Sun Fellow Jim Hughes has demonstrated this a few times at Sun Tech days, and I’ve been meaning to write about that. I got to meet Jim before the keynote, and I had a very good seat to observe the hardware failure.

CommunityOne 2008

Jim usually destroys 2 of the drives in the ZFS pool, and it looked like Rich Green (EVP of Software) was going to get to smash the other one, until Jeff Bonwick, the inventor of ZFS, showed up to do the honors himself.

CommunityOne 2008

Smashing things makes for cool demos - you can watch the video replay if you like. I’ve been paying more attention to ZFS ever since Theo Schlossnagle sat with me and a few other people in a bar at ApacheCon in Atlanta last year. We were talking about the voracious storage needs of photographers, and Theo was really singing the praises of ZFS. There were some important things that happened to ZFS for OpenSolaris 2008.05 (which was launched at CommunityOne). The most important is that you can now boot off of a ZFS volume. I hope (but don’t know for sure) that the work that made this possible will make it possible for Macs to boot off of a ZFS volume. My photo storage is getting all fragmented, and I could really put ZFS to good use. I suppose that I could build a ZFS storage appliance based on OpenStorage, but at the moment that is more work than I want to do.

I spent much of the rest of CommunityOne at the Redmonk unconference. I was drafted for an impromptu discussion on dynamic and other programming languages, which included a drop in from David Pollak, developer of the very cool lift framework for Scala, and organizer of the Scala liftoff which is happening on Saturday, right after JavaOne. There was also a very active session on Twitter - probably the biggest of the unconference. Jim Edwards from Twitter came along to participate in that one.

CommunityOne 2008

I have a bunch more photos from CommunityOne. At the rate that things are going, I will probably just do a single post on JavaOne. There are plenty of other people doing liveblogging, for those who need a bigger information flow.

Update: corrected Jim Edwards’ name. Thanks to @monkchips

by Ted Leung at May 07, 2008 04:30 PM

Terry Peppers

Small Victories

Yesterday I successfully used map() and lambda without having to look @ the documentation (and yet I link to the documentation)!

Something like:

some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]
urls = map(lambda x: some_url + x, things)
for u in urls:
    print u

UPDATE: Sorry based on the comment by ‘baoilleach’ I feel compelled to update the code above using map() and lambda with a list comprehension as suggested.

some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]
urls = [some_url + t for t in things]
for u in urls:
    print u

And don’t ask why I wasn’t using urlparse, I swear I have a good reason.
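For readers who do want the urlparse route, here is a minimal sketch of the same loop using urljoin. The try/except import is only there so the snippet runs on both old and new Python versions.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]

# urljoin resolves each item relative to the base URL.
urls = [urljoin(some_url, t) for t in things]
```

One difference from plain concatenation is that urljoin treats the base like a browser would: without the trailing slash on a path, the last path segment is replaced rather than extended, which may or may not be what you want.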

Another little thing that I think I was too dense to get was my misconception that lambda could only take one argument. I don’t know where I picked that up; it could be similar to someone thinking that tuples could only have two items (I’m just kidding Pam) - two-ples, ya know?

Simple lambda influenced by what Mr. I says these days:

>>> mine = lambda n: n.capitalize() + " is mine!"
>>> mine("book")
'Book is mine!'
>>> mine("shirt")
'Shirt is mine!'

But what about more than a single argument? Easy:

>>> huh = lambda what, who: what.capitalize() + ", " + who.capitalize() + "?"
>>> huh("cow", "daddy")
'Cow, Daddy?'
>>> huh("cat", "mommy")
'Cat, Mommy?'
>>>

I don’t use lambdas very often; most of the time if I’m going to write a lambda I just write a function. Anyway, small victory for me.

by terryp at May 07, 2008 01:30 PM

Armin Ronacher

Jinja2 Documentation Online

I have now uploaded the documentation for Jinja2 to the website for those of you who are eager and want to play with it :-) On jinja.pocoo.org you can now choose between Jinja1 and Jinja2.

The new docs are powered by Sphinx and Jinja2 with a custom templating bridge.

Read the documentation.

by Armin Ronacher at May 07, 2008 12:08 PM

Lawrence Oluyede

Twitter page for PyCon Italy

I opened a Twitter account for the PyCon Italy conference. I will try to keep updated as soon as things come up and the conference starts on Friday.

http://twitter.com/pyconit

by Lawrence at May 07, 2008 12:05 PM

Armin Ronacher

Simple batch function for Python

Often I have an iterable I want to group, for example a list of integers that I want to process two at a time. Here’s a pretty nice idiom I found in the documentation, translated to itertools:

from itertools import izip, repeat

def batch(iterable, n):
    return izip(*repeat(iter(iterable), n))

Use it like that:

>>> for key, value in batch([1, 2, 3, 4], 2):
...  print key, value
... 
1 2
3 4
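One thing to keep in mind with this idiom is that zip stops at the shortest input, so items that do not fill a complete batch are silently dropped. The sketch below restates the idiom in a form that runs on both old and new Python versions and adds a padded variant. batch_padded is a hypothetical name, and padding with a fill value is just one way to handle the remainder.

```python
try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest  # Python 3


def batch(iterable, n):
    """Group items n at a time; leftovers that do not fill a batch are dropped."""
    # The same iterator object appears n times, so zip pulls n
    # consecutive items from it for every tuple it builds.
    return zip(*[iter(iterable)] * n)


def batch_padded(iterable, n, fillvalue=None):
    """Group items n at a time, padding the last batch with fillvalue."""
    return zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)
```

With five items and n=2, batch yields two pairs and loses the fifth item, while batch_padded yields a third pair whose second element is the fill value.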

by Armin Ronacher at May 07, 2008 10:44 AM

Tim Golden

London Python Meetup May 2008

Thanks as always to Simon Brunning for organising another meetup at Thoughtworks, who generously funded the beer & pizza as well. Whether by planning or force of circumstances I’m not sure, but we started with half-a-dozen lightning talks before moving on to the main speaker of the evening. Simon made it clear up-front that overrunning [...]

by tim at May 07, 2008 10:31 AM

Sylvain Hellegouarch

Redmine vs Trac

Following my last post regarding a move from subversion to mercurial, I received many interesting comments. One key aspect that those comments showed was that no matter which DVCS I would use, it would be critical to me that it could be interfaced with Trac, since that's the tool I'm using to manage my projects. Jim Jones hinted that I could also see the problem the other way around and decide to change to a different software management tool that would be better at handling the DVCS I'd choose. He led me to discover Redmine.

I wasn't very motivated by the idea of migrating from Trac to a different tool. Many reasons to that:

1. Trac has answered most of my needs until now.
2. It's well spread and has an active community.
3. I'm damn lazy when it comes to such mundane tasks.

Nonetheless Jim had made me curious, so I did take a look at Redmine's features, and I wasn't disappointed. It basically supports what Trac offers, with some more interesting built-in features like Gantt charts, multiple projects, forums, DVCS support, etc. Of course most of these features could be integrated into Trac easily thanks to the community (although I can't tell whether or not multiple projects in one Trac instance is feasible).

That being said, not everything is perfect in this world, and while discussing this topic on the #kamaelia IRC channel, Matt Hammond, one of the long-time Kamaelia project developers, linked me to a note from John Goerzen indicating that social considerations were sometimes as important as technical ones.

I guess you understand that I'm still struggling on which decision to make. Nevertheless redmine looks like a great product and if you're not using any software management tool yet I'm pretty sure you want to give a close look at it.

by Sylvain Hellegouarch (nospam@example.com) at May 07, 2008 08:29 AM

Kun Xi

PyAWS 0.3.0 released

After 6 months, PyAWS 0.3.0 is finally released. You can check out the tarball here.

I almost abandoned this project as I found the XSLT approach more appealing: it is ideal for AJAX applications and easy to integrate via simplejson on the server side. Furthermore, I joined Microsoft, moved to Canada, and had less spare time for less interesting hobby work. The last straw was the unexpected complexity of the BIG FAT refactoring.

Then, recently, I got an email from a PyAWS user reporting a bug: an unexpected result from the ListLookup operation. It is so good to hear from users that this library still benefits somebody in the world. So I picked it up, completed the refactoring and released it today. The library is still in active development, the code style stinks, the documentation sucks and, most of all, testing is lacking. Let me explain that last point a little.

I am a big fan of TDD personally, and we have respected testing teams to help build our products at MSFT as well. However, the complexity of PyAWS is far beyond my capacity: there are tens of operations and twenty-odd response groups, and response groups may be combined, which makes it extremely difficult to cover all the paths. To make it worse, AWS is dynamic: there is no guarantee that consecutive queries will return the same result. I may consider automation to facilitate the unit tests. If you have better ideas, please leave a comment here.

by bookstack at May 07, 2008 06:18 AM

Groovie

Most bizarre Git service and other stupid Rails powered "businesses"

I can’t help but get totally baffled when I see a business model like this.

Yes, that’s right, you can pay for the privilege of keeping a copy of your distributed version control system (DVCS) private repositories on someone else’s machines. You also get to pay depending on how many people you want to allow to collaborate on it.

Nevermind that one of the entire points of a DVCS is that you do NOT need a central repository. Does anyone actually work at a “Large Company” (as the page indicates) that would be stupid enough to pay $100/month so they can put all their proprietary and very personal code repositories on a third party web service?

So what are you paying for? Well, to start with, they have awesome integration with Lighthouse, since we all know there’s no decent free open-source issue tracking system… cough trac cough roundup cough. Oh wait, since there’s absolutely no simple web-based issue tracking systems, let’s have another slick business model to get people to pay for a stripped down Trac (but this time with a really pretty UI)!

What do these sites have in common? Rails, “look ma, I can copy-paste the business plan too” pricing models, and some good graphic designers at the helm. There also seems to be an interesting amount of promotion between these sites, as well as a nice blog post from the Rails creator himself promoting GitHub. I’m sure no one who has read this rant should be surprised though.

I only hope that no one starts to believe that a DVCS actually requires these “please pay” copies of their DVCS repo.

by ben at May 07, 2008 03:18 AM

Ned Batchelder

So that happened... (Digg, Slashdot, and WebFaction)

Last Thursday, I posted the animated CSS Homer, and it was a big hit. Friday morning, it was popular on Digg (over 3000 diggs). The resulting Digg effect was enough for my hosting provider to shut off my site.

I was a cheapskate when I bought my hosting plan from TotalChoice Hosting, looking only for low cost. Their reaction seemed aggravatingly uninformed. The support guy kept referring to the traffic spike as "an attack". I tried to explain that it was in fact a success, and that they had failed to help me deal with that success. I could understand needing to protect their widely shared servers, but at least they could speak knowledgeably about the event.

He also called it a DDOS, which it was, but only if it stands for Distributed Desirability Of Stuff.

Further angering me was the fact that my email was unavailable, since they simply shut off my entire account. Also, there was a misconfiguration in the 403 page they were serving, so the traffic logs showed every request resulting in another request for a non-existent 403.shtml page. TotalChoice will be the first to point out that they are not the right service for a high-traffic site, but they should at least be conversant in the language of their newly disappointed customers, and know how to correctly shut off accounts.

Saturday morning, the traffic had subsided and the site was reactivated, and I figured I could spend some time researching options for a new provider. Slicehost seemed good if I wanted to go the VPS route, though sysadmin is not my interest or forte, so I was leery of taking on all the responsibility for the machine, however virtual it was.

WebFaction seemed the best choice of the shared providers, with supported Django, and many Django sites hosted.

I was away for the weekend, so I wasn't actively working on the problem. My site was up, I could now plan my next move.

At least, until I got slashdotted. Now the site was really shut down, and TotalChoice wasn't too pleased. The only way back online was with a new provider. WebFaction got the gig, because I don't need complete control over a machine. A shared account with shell access and supported Django would be great. I looked in their forum for Digg effect issues, and saw intelligent conversation. I had dropped them a line outlining my situation, and they made clear that they had dealt with it before and would work with me if such good fortune arose again, but that they would shut down sites if it was the only way to protect the shared servers. In a way, that last caveat reassured me. If they had made a blanket claim that their servers were Digg-proof, it would have smelled of naive or dishonest admins.

Monday I signed up, switched over my domain's nameservers, re-uploaded my site, and I was back online. After getting TotalChoice to reactivate my old site, I transferred the blog comments, and now everything should be back as good as new.

It would have been nice to survive the Digg and Slashdotting. Maybe with WebFaction I will next time. I've got a new appreciation for slimming down the server needs of my blog. The avatars in comments are something to think about: the Homer post has 70 comments, meaning each page load also generates 70 image requests. One possibility is to offload the image to another service.

The irony in all this is that although I started with TotalChoice because of how inexpensive they were, I'm not paying much more for the WebFaction account.

May 07, 2008 12:23 AM

May 06, 2008

Ian Bicking

The GPL and Principles

By the time I finished writing my last article on licensing I had mostly convinced myself that the GPL isn’t a practical license for most projects. That is, outcomes when using the GPL aren’t likely to be any better than outcomes using a permissive license, except for certain kinds of projects, mostly projects involving big faceless companies, and I’d just as soon avoid such projects anyway.

My own thinking on this has changed over the years in part because of a greater sense of humility about what I produce. I’m really not that worried about people stealing my work because I don’t think that theft would be of much value. But also because I realize that the value in software is not so much in the code as in the process. The process is what is valuable, particularly for open source, and licensing doesn’t really address issues of process.

As an example, if I’m uncomfortable with how some member of an open source community is using the code, or the community, I will be much more effective by dealing with that head-on, talking with that member, or even confronting them if it’s really necessary. If you give someone an unwelcoming attitude, they’ll probably go away. The license doesn’t need to be your gatekeeper. It’s not a particularly effective gatekeeper anyway.

Another change is perhaps a more reasonable valuation of code. There was a time when people wanted to protect their intellectual property. Even some non-software company might have gotten the idea that it should own the code it contracts someone else to write, under a proprietary license, so they could sell that software later. That anyone would care to buy it was always an illusion, but the illusion is a little more obvious these days.

One value of the GPL that I do want to acknowledge is its expression of values. It makes this explicit:

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

But the GPL does more than just its text: adopting the GPL is a statement of principle on the part of the original authors, of the people who adopt the project, and of the people who later help maintain the project. It is a statement that freedom is valued and that it is valued in a universal sense, not just in a personal or isolated sense.

This is implicit, not explicit, in the choice of license, but despite that I see this pattern in projects. Projects that choose the GPL are more likely to engender a spirit of openness and sharing. Not of the core project itself: both GPL and permissively licensed projects accomplish this just fine so long as they are properly maintained, and their success is far more related to how the project is managed than the licensing. But I see the difference in the software that grows up around the project: extensions, complementary projects, documentation.

Maybe this is because of licensing. The license filters the community, and the people who are left in a GPL project are all at least open to sharing. But more than that, I think it puts people in the right state of mind to share. The project feels more principled, the participation is based less on pragmatism and more on optimism. And there are always people coming into open source who haven’t really figured out why or what they want to get out of it. Presenting them with the principles of Free Software influences their decision. (This issue has caused some debate about terminology.)

With all that said, you don’t need the GPL to present the principles of a project. It’s certainly the easiest way to do so. The GPL is shorthand for a rich set of principles and ideals. But it’s shorthand for people who are already in the know. The ideas need to be reiterated and explained and reconsidered to stay relevant. I think a project might do more good with an explicit statement of principles. With that in place the licensing might not matter so much.

by Ian Bicking at May 06, 2008 06:08 PM

Malthe Borch

Meet Dobbin

If you've been with Zope since at least 2004, chances are you've heard about Ape, the adaptable persistence engine, which allows you to create a ZODB mount point that uses an SQL database as storage [1]. If a schema is provided, Ape stores attributes in native column types; otherwise the attributes are stored using the Python pickle format.

Enter Dobbin, an adaptable persistence engine which does away with the complexity of Ape by relying on SQLAlchemy for relational storage, and zope.schema for schema declaration [2]. The codebase is slim, and developer documentation is provided as doctests.

Tables are created on-the-fly with a 1:1 correspondence to interfaces with no inheritance (minimal interfaces). As such, each object is modelled as a join between the interfaces it implements. This approach allows using the database as a catalog in a way that's fully integrated with zope.schema. As an example, listing folder contents is a matter of acquiring the joined mapper of ILocation and IDCDescriptiveProperties, and doing a select by _parent_.
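
As a rough illustration of the table-per-interface idea (this is not Dobbin's API or its actual schema), here is a minimal sketch using plain SQL through Python's standard sqlite3 module. The table names, column names, sample rows and the folder_contents helper are all assumptions made up for the example; Dobbin itself builds its tables and mappers through SQLAlchemy and zope.schema.

```python
import sqlite3

# Hypothetical schema: one table per minimal interface, sharing a primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ilocation (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,   -- stands in for the object's parent
    name TEXT
);
CREATE TABLE idcdescriptiveproperties (
    id INTEGER PRIMARY KEY REFERENCES ilocation(id),
    title TEXT,
    description TEXT
);
""")

# Made-up sample data: a root folder (id 1) with two children,
# plus one object nested a level deeper.
conn.executemany(
    "INSERT INTO ilocation (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "news"), (3, 1, "about"), (4, 2, "archive")],
)
conn.executemany(
    "INSERT INTO idcdescriptiveproperties (id, title, description) VALUES (?, ?, ?)",
    [(1, "Root", ""), (2, "News", "Latest items"),
     (3, "About us", "Who we are"), (4, "Archive", "Old news")],
)


def folder_contents(connection, parent_id):
    """List (name, title) for every object whose parent is parent_id.

    An object is the join of the rows it has in each interface table.
    """
    rows = connection.execute(
        """
        SELECT loc.name, dc.title
        FROM ilocation AS loc
        JOIN idcdescriptiveproperties AS dc ON dc.id = loc.id
        WHERE loc.parent_id = ?
        ORDER BY loc.name
        """,
        (parent_id,),
    )
    return rows.fetchall()


contents = folder_contents(conn, 1)
```

Here the folder listing is a single indexed query against the database, which is the sense in which the relational store doubles as a catalog.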

[1] Florent Guillaume on object-relational mapping in Zope
[2] Project page for z3c.dobbin

by malthe (noreply@blogger.com) at May 06, 2008 03:32 PM

David Goodger

Unicode misinformation

It's great that Google is moving to Unicode 5.1 and that UTF-8 is so popular, but I wish they'd get their terms straight!

May 06, 2008 03:30 PM

Small Values of Cool - Simon Brunning

Noah Gift

Greedy Coin Google App Engine Application With Source

Here is a Google App Engine application I wrote for an upcoming PyAtl talk and an upcoming O'Reilly online article: http://greedycoin.appspot.com/ Quick notes: Really liking the datastore API. I also liked the Django templates even though I have touched them...

by Noah Gift at May 06, 2008 04:28 AM

Ian Bicking

Governance

It occurred to me… Django is something like a dictatorship… or maybe an oligarchy. At first it seems like Pylons is the same… but no. Pylons is clearly feudal. I lord over Paste, WebOb, FormEncode. Mike Bayer lords over Mako and SQLAlchemy. Ben lords over Routes, Beaker, and Pylons.

I suppose in all cases there is a certain amount of democracy, because there are no serfs, and any individual is free to travel to any kingdom they like. Well, at least among the open source kingdoms. Without citizenship, and with no exclusiveness of ownership, with even property having largely disappeared, I suppose it’s inevitable that traditional metaphors of control and governance don’t really make sense.

by Ian Bicking at May 06, 2008 04:08 AM

Eric Florenzano