Byteflow Blog Engine. This looks like the most full-featured of the Django blog engines by a pretty big margin, including OpenID client and server support. A product of the growing Russian/Ukrainian Django community.
heapq implements a min-heap sort algorithm suitable for use with Python's lists. There are two basic ways to create a heap: heappush() and heapify(). With heappush(), the heap sort order of the elements is maintained as new items are added from a data source.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print 'random :', data
for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)
$ python heapq_heappush.py
random : [19, 9, 4, 10, 11, 8, 2]
add 19:
19
------------------------------------
add 9:
9
19
------------------------------------
add 4:
4
19 9
------------------------------------
add 10:
4
10 9
19
------------------------------------
add 11:
4
10 9
19 11
------------------------------------
add 8:
4
10 8
19 11 9
------------------------------------
add 2:
2
10 4
19 11 9 8
------------------------------------
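The listings import show_tree from heapq_showtree and data from heapq_heapdata, helper modules the post does not show. To run the examples on their own, a minimal hypothetical stand-in for both could look like this. tree_rows is our own name, and the centred layout is an assumption that will not match the output above character for character.

```python
# Hypothetical stand-ins for the unshown heapq_heapdata and heapq_showtree modules.
data = [19, 9, 4, 10, 11, 8, 2]

def tree_rows(heap):
    """Split a heap list into tree levels: level k holds up to 2**k items."""
    rows = []
    start, width = 0, 1
    while start < len(heap):
        rows.append(heap[start:start + width])
        start += width
        width *= 2
    return rows

def show_tree(heap, total_width=36):
    """Print each level of the heap centred on a fixed-width line."""
    for row in tree_rows(heap):
        line = ' '.join(str(n) for n in row)
        print(line.center(total_width))
    print('-' * total_width)

show_tree([2, 9, 4, 10, 11, 8, 19])
```

Slicing by powers of two works because heapq stores the tree in breadth-first order, with the children of index k at 2*k+1 and 2*k+2.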
If the data is already in a list, use heapify() to rearrange its items in place.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
$ python heapq_heapify.py
random : [19, 9, 4, 10, 11, 8, 2]
heapified :
2
9 4
10 11 8 19
------------------------------------
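heappush() and heapify() produced different layouts from the same seven numbers. Both are valid, because heapq only guarantees that heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. A small checker (is_heap is our own helper, not part of heapq) makes that visible:

```python
import heapq

def is_heap(heap):
    """Return True if every parent is <= each of its children."""
    for parent in range(len(heap)):
        for child in (2 * parent + 1, 2 * parent + 2):
            if child < len(heap) and heap[child] < heap[parent]:
                return False
    return True

# Rebuild both heaps from the same data used in the listings above
pushed = []
for n in [19, 9, 4, 10, 11, 8, 2]:
    heapq.heappush(pushed, n)

heapified = [19, 9, 4, 10, 11, 8, 2]
heapq.heapify(heapified)

print(pushed)
print(heapified)
print(is_heap(pushed) and is_heap(heapified))
```

Only heap[0] is pinned down as the minimum. The rest of the order depends on how the heap was built.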
Use heappop() to remove the element with the lowest value. In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
inorder = []
while data:
    smallest = heapq.heappop(data)
    print 'pop %3d:' % smallest
    show_tree(data)
    inorder.append(smallest)
print 'inorder :', inorder
$ python heapq_heappop.py
random : [19, 9, 4, 10, 11, 8, 2]
heapified :
2
9 4
10 11 8 19
------------------------------------
pop 2:
4
9 8
10 11 19
------------------------------------
pop 4:
8
9 19
10 11
------------------------------------
pop 8:
9
10 19
11
------------------------------------
pop 9:
10
11 19
------------------------------------
pop 10:
11
19
------------------------------------
pop 11:
19
------------------------------------
pop 19:
------------------------------------
inorder : [2, 4, 8, 9, 10, 11, 19]
To remove the smallest element and add a new value in a single operation, use heapreplace().
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print 'start:'
show_tree(data)
for n in [0, 7, 13, 9, 5]:
    smallest = heapq.heapreplace(data, n)
    print 'replace %2d with %2d:' % (smallest, n)
    show_tree(data)
$ python heapq_heapreplace.py
start:
2
9 4
10 11 8 19
------------------------------------
replace 2 with 0:
0
9 4
10 11 8 19
------------------------------------
replace 0 with 7:
4
9 7
10 11 8 19
------------------------------------
replace 4 with 13:
7
9 8
10 11 13 19
------------------------------------
replace 7 with 9:
8
9 9
10 11 13 19
------------------------------------
replace 8 with 5:
5
9 9
10 11 13 19
------------------------------------
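heapreplace() pops before it pushes, so the heap never grows. That makes it a natural fit for a fixed-size heap. As a hypothetical illustration (top_k is our own function, not part of heapq), the k largest values seen so far can be kept in a size-k min-heap, with heap[0] being the smallest of them:

```python
import heapq

def top_k(iterable, k):
    """Return the k largest values, largest first, using a size-k min-heap."""
    heap = []
    for value in iterable:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Drop the smallest of the current top k and add the new value
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)

print(top_k([19, 9, 4, 10, 11, 8, 2], 3))
```

With the sample data this gives the same answer as heapq.nlargest(3, data) in the next listing.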
heapq also includes two functions that examine an iterable to find a range of the largest or smallest values it contains. nlargest() and nsmallest() are only really efficient for relatively small values of n > 1, but can still come in handy in a few cases.
import heapq
from heapq_heapdata import data
print 'all :', data
print '3 largest :', heapq.nlargest(3, data)
print 'from sort :', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]
$ python heapq_extremes.py
all : [19, 9, 4, 10, 11, 8, 2]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [2, 4, 8]
from sort : [2, 4, 8]
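Since Python 2.5, nlargest() and nsmallest() also accept a key function, which helps when the items are records rather than plain numbers. A short sketch with made-up (name, priority) job tuples:

```python
import heapq

# Hypothetical records: (job name, priority)
jobs = [('backup', 3), ('email', 1), ('report', 5), ('cleanup', 2)]

# key picks the value to compare; the whole records are returned
print(heapq.nlargest(2, jobs, key=lambda job: job[1]))
print(heapq.nsmallest(1, jobs, key=lambda job: job[1]))
```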
by Doug Hellmann (noreply@blogger.com) at May 11, 2008 05:10 PM
by Christopher Armstrong (noreply@blogger.com) at May 11, 2008 04:36 PM
Thanks to johnw and Ilari on #git for their help on this. As Ilari pointed out, Git's DAV support has many problems compared with SVN, from lockups to a failed push sometimes corrupting the shared repository. I tried anyway, with hours of help from joshw, and at 5am I had to throw in the towel. I'm sure it's possible, but not with my current lack of Git knowledge.
git-svn client-side for now.
by Doug Hellmann (noreply@blogger.com) at May 11, 2008 03:23 PM
Another post for the Poincaré Project.
We've already seen that a one-form is a linear function from a vector to a (for our purposes) real number. On a manifold, one-forms correspond to stack-type vectors being applied to arrow-type vectors by counting how many "stacks" the arrow passes through.
In the previous post Metrics As Mappings Between Arrows and Stacks, we saw that a metric is an extra bit of structure that describes how to map between arrow-type vectors and stack-type vectors.
So, in summary:
These two facts can be combined to let you take two arrow-type vectors and get a real number out of them.
This has parallels with currying in functional programming.
Recall that if a function "add" takes two integers and returns an integer, it can be viewed as a function that takes one integer and returns a function that takes one integer and returns an integer.
add :: Int -> Int -> Int
Now, a one-form is a function that takes a vector and returns a real. In other words:
Vector -> Real
So it is easy to see that if you curry a real-valued function that takes two vectors you get:
Vector -> Vector -> Real
In other words, a function taking two vectors to a real is equivalent to a function from a vector to a one-form.
So if you have a metric that can convert between vectors and one-forms (or, in the context of a manifold, between arrows and stacks) then you also have a function from two vectors to a real.
Such a function is called an inner product or dot product. Often the notion of an inner product is defined first, before one-forms are introduced (if at all). In fact, some texts will define a metric to be an inner product. It is best for our purposes, though, to think of the metric's fundamental purpose as being converting between arrows and stacks (and back again) and the inner product as being an extra concept we get for free.
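The currying argument can be made concrete in a few lines of Python. This is a toy sketch that assumes a 2-D space with a constant metric given as a matrix g of components. make_metric, lower and inner_product are hypothetical names. lower turns an arrow into a stack (a one-form, represented as a function), and applying that one-form to a second arrow yields the real number.

```python
def make_metric(g):
    """Return lower: Vector -> (Vector -> Real), built from metric components g."""
    def lower(u):
        # The resulting one-form eats a vector v and returns sum g[i][j] u[i] v[j]
        def one_form(v):
            return sum(g[i][j] * u[i] * v[j]
                       for i in range(len(u)) for j in range(len(v)))
        return one_form
    return lower

def inner_product(g, u, v):
    """Uncurried form: the same map, taking both vectors at once."""
    return make_metric(g)(u)(v)

g = [[1.0, 0.0], [0.0, 2.0]]   # assumed example metric
alpha = make_metric(g)([1.0, 3.0])   # a one-form (stack) from an arrow
print(alpha([2.0, 1.0]))             # 1*1*2 + 2*3*1 = 8.0
```

With the identity matrix for g, inner_product reduces to the ordinary dot product.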
We move on in our search for overrepresented motifs in DNA sequences. I was preparing this entry when the comments on part 1 arrived, so we will modify our previous code to include some of the suggestions and then change it a little bit to output the actual values we want. As Titus pointed out in his comment, merging the sequences generates errors, because we artificially include motifs that are not supposed to be there. In the end, though, we will see that we are not after the actual number of times a motif appears in the sequence; what matters is the motif quorum.
The main focus of part one was to show the decrease in code length from C++ to Python, introduce generator functions and yield, and show the nice permutation function. It turns out that the approach of the previous entry has a big drawback in execution speed: it is around 60 times slower than C++. So we need another approach, and Mike showed us how (with some small modifications):
#!/usr/bin/env python
from collections import defaultdict
import sys
import fasta
seqs = fasta.get_seqs(open(sys.argv[1]).readlines())
length = int(sys.argv[2])
#for a missing key, the dict entry is initialized to zero
counts = defaultdict(int)
#slide a window of size length along each sequence and count every subsequence
for i in seqs:
    for n in range(len(i.sequence) - length + 1):
        counts[i.sequence[n : n + length]] += 1
#counts.keys() will then return the nucleotide sequences
#that were actually in seqs
#print out the sequences that occur more than once
for motif in counts:
    if counts[motif] >= 2:
        print motif, counts[motif]
This time the counting is done with a defaultdict of integers, so every missing key starts at zero. Instead of generating all possible combinations (which was cool and fast, but in the end made our script slower), a window of the input length is slid along each sequence; each motif is a key of the dictionary, and its value is the count, incremented every time the motif is seen.
Checking this code's performance with the Linux time command, we get 0.2 seconds, a 5-fold improvement over the C++ code and a 300-fold improvement over our previous Python code (thanks, Mike and everyone else, for the suggestions).
A couple of other differences: the output is not ordered, and only the motifs actually seen are printed. We will see next time how to get the quorums.
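To try the sliding-window idea without the author's fasta module, here is a minimal sketch that works on plain strings (count_kmers is a hypothetical name). It also sorts the motifs before printing, which addresses the unordered output. Note the + 1 in the range: a sequence of length L holds L - length + 1 windows.

```python
from collections import defaultdict

def count_kmers(sequences, length):
    """Count every length-long window in each sequence."""
    counts = defaultdict(int)
    for seq in sequences:
        for n in range(len(seq) - length + 1):
            counts[seq[n:n + length]] += 1
    return counts

counts = count_kmers(['ACGTACGT', 'TTACGA'], 4)
# Print the motifs seen at least twice, in alphabetical order
for motif in sorted(counts):
    if counts[motif] >= 2:
        print(motif + '\t' + str(counts[motif]))
```

Because each sequence is scanned separately, no artificial motifs are created at the junctions between sequences, unlike the merged-string version.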
In addition to the Python sprint day work I am doing (as well as pymag stuff) I've been editing the obligatory "zomg baby is walking" video, which is below.
It's funny - she's been close, and doing short spurts, but last night it was like her walking switch just "came on".


In the post Reusable Django Apps and Introducing Tabula Rasa I mentioned my project to create an out-of-the-box Django-based website with everything but the domain-specific functionality.
At the time I was calling it Tabula Rasa but now I've settled on the Greek word Pinax, proposed by Orestis Markou.
So far it's just my new django-email-confirmation app tied together with password change and reset, login/logout, with the beginnings of a tab-style UI. There's a ton more I want to refactor out of my existing websites to put into it as well as adding support for OpenID and the stuff I'm starting to do for django-friends.
Even if one doesn't use Pinax as the starting point of a website, I'm hoping it will prove very useful for another goal, namely a "host" project to develop and try out reusable apps.
The initial code is available at http://code.google.com/p/django-hotclub/ under /trunk/projects/pinax and there is a running instance for you to try out at:
http://pinax.hotcluboffrance.com
Update: Ryan McGuire blogged about his Emacs presentation in more detail.
by Jonathan Ellis (noreply@blogger.com) at May 10, 2008 10:22 AM
Changing gears now, leaving behind Pfam alignments. I decided to start a new series of posts based on the conversion of some small C++ programs I developed in the past. These small programs (I call them modules because they were part of a larger application) were used to count motifs, short nucleotide words up to 10-12 base pairs, and then calculate statistical overrepresentation of these words by comparing a foreground set of DNA sequences against a background set.
We will start by comparing the different approaches of the C++ and Python code and point out the advantages and disadvantages of doing it in one language or the other. The first thing we need to do is count the motifs in all sequences from our foreground and background sets. For the project I was working on, the ideal word length was 10 nucleotides. To increase speed, our C++ approach was to transform the DNA character sequences into numbers and then, while sliding a window of the desired word length, hash each base-four number into a base-10 integer and increment that position of a vector previously initialized with 0. For four nucleotides and a word size of 10 there are 1,048,576 (4^10) possible permutations, from AAAAAAAAAA to TTTTTTTTTT. First, the C++ program would do:
// Encode each nucleotide as a base-4 digit, skipping newlines
for(j = 0; j < seqsize; j++)
{
    seqfile.get(base);
    if(base != '\n')
    {
        switch(base)
        {
            case 'A':
                bseq.push_back(0);
                break;
            case 'C':
                bseq.push_back(1);
                break;
            case 'G':
                bseq.push_back(2);
                break;
            case 'T':
                bseq.push_back(3);
                break;
        }
    }
}
reading all sequences and pushing a number for each nucleotide onto a vector, then sliding a window along this vector and hashing the base-four number:
// Read a window of base-4 digits (A=0, C=1, G=2, T=3) as an integer index
int hashSeq(vector<short> subseq)
{
    int w, hashvalue = 0;
    w = subseq.size() - 1;    // exponent of the leftmost digit
    for(size_t i = 0; i < subseq.size(); i++)
    {
        hashvalue += subseq[i] * pow((double)4, (double)w);
        w--;
    }
    return hashvalue;
}
if(binseq[i].size() > motifwidth)
{
    for(j = 0; j < binseq[i].size()-motifwidth+1; j++)
    {
        sub.assign(binseq[i].begin() + j, binseq[i].begin() + j+motifwidth);
        hashed = hashSeq(sub);
        nmercount[hashed]++;
        sub.clear();
    }
}
The whole C++ code has about 400 lines, including all the possible output formatting and printing. Timed with the time command, the C++ executable takes a little less than 2 seconds to read, count and write out the different files.
For Python, we will use a different approach and gain a lot in code simplicity. As we want to count the number of times size-10 words appear in all sequences, we first need to generate all possible permutations (with replacement) of the four nucleotides. This is easily accomplished with generator functions. Regular functions run until completion and then return a value. For instance, a function that calculates the factorial of 10 will return only the final value, after multiplying 10*9*8*7*6*5*4*3*2. A generator function runs until a value is available, yields it, and then suspends its operation until called again. The word yield matters here because yield is the statement Python uses to return a value and suspend the function until further notice. Written as a generator, the factorial function would return each intermediate factorial value up to 10.
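For comparison, the C++ hashing scheme translates into a few lines of Python. This sketch uses Horner's rule (multiply by 4, add the next digit), which gives the same index as summing digit * 4^w. hash_kmer and count_all are hypothetical names, and unlike the C++ loop, which silently drops unknown characters, this version assumes the input contains only A, C, G and T.

```python
# Base-4 digit for each nucleotide, matching the C++ switch statement
DIGIT = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def hash_kmer(kmer):
    """Read a k-mer as a base-4 number, e.g. 'ACGT' -> 0*64 + 1*16 + 2*4 + 3."""
    value = 0
    for base in kmer:
        value = value * 4 + DIGIT[base]
    return value

def count_all(seq, width):
    """One counter slot per possible word, like the C++ nmercount vector."""
    counts = [0] * (4 ** width)
    for j in range(len(seq) - width + 1):
        counts[hash_kmer(seq[j:j + width])] += 1
    return counts

print(hash_kmer('ACGT'))                       # 27
print(hash_kmer('TTTTTTTTTT') == 4 ** 10 - 1)  # last of the 1,048,576 slots
```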
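The factorial contrast from the paragraph above, as a short sketch (factorial and factorials are our own illustrative names). The regular function hands back a single value at the end; the generator yields each intermediate product and pauses after every yield.

```python
def factorial(n):
    """Regular function: returns n! only after the loop completes."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def factorials(n):
    """Generator: yields 1!, 2!, ..., n!, suspending after each yield."""
    result = 1
    for k in range(1, n + 1):
        result *= k
        yield result

print(factorial(10))          # 3628800
print(list(factorials(5)))    # [1, 2, 6, 24, 120]
```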
To generate all 1 million plus permutations of 4 nucleotides we need a function similar to the one below (modified from here)
def permutations(items, n):
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in permutations(items, n - 1):
                yield str(items[i]) + str(base)
Basically, this generator function combines the four nucleotides into words of size 10. It is recursive: the result for n depends on the result for n - 1, all the way down to n equal to 0. The first for loops over the items we want to permute (the nucleotides), and the second for recursively calls permutations, starting with the initial n passed (10) and going down to 0. Stepping through this function in a debugger, we see that i is constant for each iteration of the second loop and only n changes from 10 to 0, while nucleotides are joined one by one to form a motif. It starts with AAAAAAAAAA, then AAAAAAAAAC, then AAAAAAAAAG, until it gets to a poly-T.
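The order described above is easy to check on a smaller word size. This sketch runs the same recursive generator with n = 2 and compares it with itertools.product, a standard-library function (available from Python 2.6 on) that produces the same words in the same lexicographic order. The comparison is our addition, not part of the original post.

```python
import itertools

def permutations(items, n):
    # Same recursive generator as in the post
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in permutations(items, n - 1):
                yield str(items[i]) + str(base)

nucleotides = ['A', 'C', 'G', 'T']
words = list(permutations(nucleotides, 2))
print(words[:4])    # ['AA', 'AC', 'AG', 'AT']

same = [''.join(p) for p in itertools.product(nucleotides, repeat=2)]
print(words == same)
```

For n = 10 the generator produces 4^10 = 1,048,576 words one at a time, so they never all sit in memory unless wrapped in list().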
Our final code would look like the one below
import fasta
import sys
def permutations(items, n):
    if n == 0:
        yield ''
    else:
        for i in range(len(items)):
            for base in permutations(items, n - 1):
                yield str(items[i]) + str(base)
seqs = fasta.get_seqs(open(sys.argv[1]).readlines())
length = int(sys.argv[2])
nucleotides = ['A', 'C', 'G', 'T']
merged_seqs = ''
for i in seqs:
    merged_seqs += i.sequence
for i in permutations(nucleotides, length):
    # str.count() returns the number of non-overlapping occurrences
    print i + '\t' + str(merged_seqs.count(i))
where we read the input sequence(s), merge them into one long string and, as we generate all possible combinations, count the number of times each one appears. Running on the same input file used for the C++ executable, this code is 60 times slower, taking on average a full minute to count motifs in 8 500 bp DNA sequences. The slowest part is the counting, as the generation of all possible combinations is straightforward. We lose some speed, but gain a lot in code simplicity and clarity. Next we will modify this code to output the different counts needed for the statistical analysis.
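One subtlety in the listing above: str.count() counts non-overlapping occurrences, so repetitive motifs are undercounted relative to a sliding window. A minimal sketch of an overlapping count using str.find() (count_overlapping is a hypothetical helper name):

```python
def count_overlapping(text, motif):
    """Count every position where motif starts, allowing overlaps."""
    count = 0
    start = text.find(motif)
    while start != -1:
        count += 1
        start = text.find(motif, start + 1)
    return count

print('AAAAA'.count('AA'))               # 2: non-overlapping matches only
print(count_overlapping('AAAAA', 'AA'))  # 4: one per window position
```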
"Rubinius switched from C to C++ to implement it's core VM"
For the life of me I cannot understand why projects use C++ rather than Objective-C. Hmm.
Catching up on comments to this post...
Yeah, I can be too brief sometimes. Here's the essence of what I like about ObjC vs. C++. ObjC attempts to keep the Obj and the C distinct, while C++ attempts to combine them. As a result the Obj in ObjC is very much like the Obj in Smalltalk. And the C in ObjC is very much like the C in ANSI C.
The Obj in C++ is significantly more complicated than the Obj in Smalltalk or in ObjC. The C in C++ is also significantly more complicated, to the point where I don't think it can be called "C". People will talk about the expressiveness of C++ and how much it has evolved over the years. I still very much prefer the simplicity of ObjC.
I am also surprised that ObjC has portability issues. With the GNU implementation?
And I am surprised about the Ruby kernel issue as well. I also thought this would be small enough to warrant just C or, even better, a subset of Ruby that compiles easily into C. This is what Squeak uses for its kernel. Gambit Scheme does something along the same lines, allowing a very C-ish dialect of Scheme that translates directly.
by Patrick Logan (noreply@blogger.com) at May 09, 2008 08:24 PM
(pygr is a neat bioinformatics framework in Python.)
After some commenters on my last post seemed happy to hear that pygr was the focus of some summer work, I realized I had only discussed the pygr summer work in a post to the biology-in-python list.
Whoops.
So, here's the scoop: not only is pygr the focus of Rachel McCreary's Google Summer of Code project, but Jenny Qian will be using pygr to build an ENSEMBL interface, also as part of the Google Summer of Code.
That's not all!
In addition to Rachel and Jenny (under the sterling mentorship of Chris Lee, Robert Kirkpatrick, Namshin Kim, and myself) I have two MSU students working with me over the summer, Alex Nolley and Marie Buckner. They'll both be working with pygr-related things, although like Jenny their efforts may end up being more on ways to use pygr than on pygr's code itself.
I also have a grad student or two that may drop in on pygr, if only to use it for something research-y.
So all in all, pygr will get a lot of love this summer. Hopefully we can polish the code and documentation and tutorials to the point where the learning curve is as minimal as it can get, and this fabulous package will become readily available to many others...
Why am I personally putting so much effort into pygr? Well, I've been using it more and more over the last few months, and (somewhat like scipy) it's transformed my work by turning annoyingly difficult data organization problems into trivial Python transformations. I can literally throw together a custom genome browser in a matter of hours -- I've implemented two or three already, for different projects -- and it has enabled several new research programs. pygr seems to be one of those rare packages (kind of like Python itself) that is not only functional and effective but presents a unified and coherent intellectual interface. pygr is the only good middleware layer I've seen for sequence intertwingling in bioinformatics. It's not that mature yet, but it has serious promise, and I'm hoping to get in on the ground floor, so to speak :).
cheers,
--titus
Barry sent the email out last night that both Python 2.6a3 and 3.0a5 are released - these are the final alphas for both. I'd go and grab em while they're still hot off the presses... Provided you're not already sync'ing from svn/bzr/mercurial/wtf.
So our Windows developer strike force came up with a Windows version of Elisa; Windows XP and Vista are supported. Check out the Elisa download page to find the alpha version of the installer. There are some issues, this is an alpha, you're warned :)
Alessandro and Olivier cooked up two tutorials showing off how to develop new features for Elisa, by example. The API of the upcoming 0.5 branch of Elisa has also been published; it might evolve a bit, but it's already a good starting point for motivated contributors out there :)
"Mesh is the only thing that really makes sense out of a Yahoo
acquisition to me. Yahoo has rich content services—and they're
everywhere. If Microsoft could plug Mesh into that infrastructure,
fast, and flip the switch "Wow!" Imagine, for example, Mesh making
Flickr photos instantly available to all your PCs, cell phones and
TVs. Software plus hardware plus services."
But isn't that what the internets are for?
by Patrick Logan (noreply@blogger.com) at May 09, 2008 12:22 PM
I’m sure I read this somewhere recently, but my scratchy memory and command of Google can’t bring it back to me.
Is there a Python idiom for accepting either a file name or a file object as a function parameter?
The closest I can get is this;
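def my_function(file_name_or_object):
    try:
        # Succeeds when given a path
        file = open(file_name_or_object)
    except TypeError:
        # open() rejects non-path arguments; assume it is already a file object
        file = file_name_or_object
    return file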
Any improvements on this are more than welcome.
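One common alternative is duck typing: if the argument has a read() method, treat it as a file object; otherwise treat it as a path. A minimal sketch (read_text is a hypothetical name) that also closes the file only when the function opened it itself:

```python
def read_text(file_name_or_object):
    """Accept either a path or an object with a read() method."""
    if hasattr(file_name_or_object, 'read'):
        # Already file-like: use it, and leave closing it to the caller
        return file_name_or_object.read()
    # Otherwise treat it as a path; we opened it, so the with block closes it
    with open(file_name_or_object) as f:
        return f.read()
```

Checking for read() instead of catching TypeError also accepts in-memory objects such as io.StringIO, and it avoids masking a TypeError raised for some unrelated reason.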
by Fabio Zadrozny (noreply@blogger.com) at May 09, 2008 01:09 AM
by Grig Gheorghiu (noreply@blogger.com) at May 09, 2008 12:17 AM
I’m having trouble with dates. This can be summed up in a couple of high level issues;
1. Date support in relational databases is insane, or at the best inconsistent.
As far as I can tell the ANSI SQL-92 standard defines date, time, interval and timestamp data types. Which doesn’t help when SQL Server only implements something called ‘datetime’ - at least I think so, have you tried accessing any sort of manual for a Microsoft product online? Blimey, I thought billg had embraced this web thing years ago. Oracle has the ‘date’ data type (which is actually a time stamp) and MySQL, well they’ve gone and outdone everyone by implementing DATETIME, DATE, TIMESTAMP, TIME, and YEAR.
2. The Python DB-API does not cope with date data type ambiguity well.
When it comes to the date question the Python DB-API states (and I quote) “… may use mx.DateTime”, which if you ask me isn’t much of a standard. This needs to change so that all DB-API modules return consistent datetime objects, not such a big issue as datetime has been part of the standard library since, what, Python 2.3?
Sadly even if we fix this it won’t work with Sqlite as it doesn’t consistently support data typing. In my experiments regardless of what sort of date you insert into the database you get a unicode string back. Don’t believe me? Try this in Python 2.5;
>>> from sqlite3 import dbapi2
>>> db = dbapi2.connect('test_db')
>>> cursor = db.cursor()
>>> cursor.execute('create table date_test (id integer not null primary key autoincrement, sample_date DATE NOT NULL)')
>>> stmt = "INSERT INTO date_test (sample_date) VALUES (?)"
>>> cursor.execute(stmt, (1234, ))
>>> import datetime
>>> cursor.execute(stmt, (datetime.date(2008, 3, 10), ))
>>> cursor.execute(stmt, ('My name is Earl', ))
>>> db.commit()
>>> cursor.execute("SELECT * FROM date_test")
>>> results = cursor.fetchall()
>>> for item in results:
... print item[1], type(item[1])
1234 <type 'int'>
2008-03-10 <type 'unicode'>
My name is Earl <type 'unicode'>
>>>
But note that it is fine for integers.
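The standard sqlite3 module does offer a partial workaround: connect with detect_types=sqlite3.PARSE_DECLTYPES and register an adapter/converter pair for the declared DATE type, so a datetime.date goes in and comes back out. This is a minimal sketch. adapt_date and convert_date are our own names, and it assumes the column only ever holds ISO-formatted dates (a converter like this would fail on the 1234 or 'My name is Earl' rows above). sqlite3 also ships default date converters, but those are deprecated as of Python 3.12, so the pair is registered explicitly here.

```python
import datetime
import sqlite3

def adapt_date(d):
    """Store dates as ISO strings, e.g. '2008-03-10'."""
    return d.isoformat()

def convert_date(raw):
    """Converters receive the raw stored value as bytes."""
    return datetime.datetime.strptime(raw.decode('ascii'), '%Y-%m-%d').date()

# Registration is global to the sqlite3 module
sqlite3.register_adapter(datetime.date, adapt_date)
sqlite3.register_converter('DATE', convert_date)

db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cursor = db.cursor()
cursor.execute('CREATE TABLE date_test (id INTEGER PRIMARY KEY AUTOINCREMENT, '
               'sample_date DATE NOT NULL)')
cursor.execute('INSERT INTO date_test (sample_date) VALUES (?)',
               (datetime.date(2008, 3, 10),))
db.commit()
cursor.execute('SELECT sample_date FROM date_test')
value = cursor.fetchone()[0]
print(repr(value))
```

SQLite itself still does not enforce the type. The conversion happens in the Python driver, based on the declared column type.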
3. The people writing the Python standard library modules are on crack.
Outside of the database world and within the batteries included Python standard library some modules use datetime, others time and there are even uses of calendar.
O.K. I’ll accept that maybe the module authors aren’t on full strength crack, because the time module just exposes underlying posix functions. But the people who wrote those were on something strong and hallucinogenic. I table the following function signatures from section 14.2 of the Python Library Reference 2.5 as an example;
strftime(format[, t])
strptime(string[, format])
This has bitten me twice in the last twenty four hours and frankly I’m not happy.
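The trap in two lines: time.strptime takes the string first, while time.strftime takes the format first. datetime.datetime.strptime (Python 2.5+) keeps the string-first order, and on datetime objects strftime becomes a method, so only the format is passed. A short sketch with a made-up timestamp:

```python
import datetime
import time

stamp = '2008-05-08 20:12'
fmt = '%Y-%m-%d %H:%M'

parsed = time.strptime(stamp, fmt)     # string first, then format
text = time.strftime(fmt, parsed)      # format first, then time tuple

dt = datetime.datetime.strptime(stamp, fmt)   # string first, like time.strptime
print(text)
print(dt.strftime(fmt))                       # method: only the format
```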
I appreciate that there are historical reasons for having inconsistent function signatures but can someone please fix this in Python 3.0. All we need is a single module that can access the underlying system clock and then convert between a number of different representations of that and other epoch driven dates. How hard can it be? As far as I can tell this is not part of the proposed standard library re-organisation. I think it should be.
by Shannon -jj Behrens (noreply@blogger.com) at May 08, 2008 08:12 PM

In February, Harvard’s Faculty of Arts and Sciences (FAS) unanimously approved an Open Access resolution, committing to make all its research available through a public repository. It was the first US college to do so.
Yesterday, Harvard Law School unanimously voted to become the first US law school with the same commitment.
Over a scant few years, Harvard Law pulled together Larry Lessig and Jonathan Zittrain, and recently recruited both Yochai Benkler and Cass Sunstein. These are, along with folks like John Palfrey, the finest legal thinkers of their generation. I am incredibly hopeful about the kind of cyberlaw activism and trendsetting we’ll see with these minds all sharing an affiliation.
sudo a2enmod ssl
openssl genrsa -out server.key 1024
openssl genrsa -des3 -out server.key 1024
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
sudo cp server.crt /etc/ssl/certs
sudo cp server.key /etc/ssl/private
SSLEngine on
SSLCertificateFile /etc/ssl/certs/server.crt
SSLCertificateKeyFile /etc/ssl/private/server.key
SSLVerifyClient require
SSLVerifyDepth 1
SSLCACertificateFile /etc/ssl/certs/server.crt
openssl pkcs12 -export -out client_cert.pfx -in server.crt -inkey server.key \
-name 'Certificate Name'
openssl pkcs12 -clcerts -nokeys -in client_cert.pfx -out client_cert.pem
openssl pkcs12 -nocerts -in client_cert.pfx -out client_key.pem
openssl rsa -in client_key.pem -out unsecured_client_key.pem
by Simon Wittber (noreply@blogger.com) at May 08, 2008 03:52 AM
I’m delighted to release Paver 0.7. If you missed my original announcement, the short story is that Paver is a new build, distribution and deployment scripting tool geared toward Python projects. My original announcement and the new foreword to the docs explain the motivation.
Ben Bangert and others pointed out a giant documentation bug in 0.4: there was a fair bit of reference doc but no doc that said “here’s how you get started with Paver”. Now there is: Paver’s Getting Started Guide.
Paver 0.7 is a big step up from 0.4 (hence the version number bump). I implemented one of the two major features I had planned for 1.0: distutils/setuptools integration. It’s really cool. Have you ever wanted to just slightly change how “sdist” or “upload” or “develop” worked? Now you can, just by writing a function in your pavement.py file. And don’t worry, you don’t need to duplicate anything between setup.py and pavement.py. It all just moves into pavement.py and Paver can even generate a setup.py file for you, since most people are used to the common “python setup.py install” command.
I’ve gone even farther than that with making it easy to use Paver and not annoy users that don’t yet have Paver. Paver can create a small zip file of Paver’s core bits so that “python setup.py install” will work just fine even for users who don’t have Paver installed. Paver can also create a virtualenv bootstrap script for you, so that users don’t necessarily need to install your package on their systems in order to use it.
Paver’s got new documentation tools that work great with Sphinx. It’s now easy to mark sections of sample code files and then include those sections in your documentation, using the built-in version of Ned Batchelder’s Cog.
And I’m definitely eating my own dogfood. Paver is built using Paver itself and the source distribution includes the paver-minilib so that setup.py install should work fine (let me know if it doesn’t!) The new Getting Started Guide uses the new documentation tools.
There are even more changes than these, and you can look at the changelog for the full list. Note that if you’re using Paver 0.4, there are a couple of trivial breaking changes.
by Fabio Zadrozny (noreply@blogger.com) at May 08, 2008 01:45 AM
Dear Lazyweb, help!
I'm embarking on a number of summer projects in my new lab at MSU, and several of them focus on using pygr to do cool genomic stuff. In particular, I'm planning to build a personal genome annotation system that will let people run their own full genome Web sites and annotate the genomes with private information such as Solexa data, cDNA/EST projects, ChIP-seq, cis-regulatory reporter constructs, ncRNA predictions, etc. etc. (If you're interested in this sort of thing, get in touch -- it will, of course, be open source and open development, albeit in Python :)
As I've been thinking more about how to do the display side of things, I've been running headfirst into a serious lack of knowledge. I would like to make an interface that looks somewhat like your standard genome browser/GMOD/UCSC interface, such as this UCSC view of the chicken genome. I already have the basics of that view working; for example, see this simple example and a group-feature example. But I'd like to add more -- a LOT more -- interactivity.
Ideally I'd like to be able to draw simple objects (squares, rectangles, lines) on some sort of canvas and then use JavaScript and AJAX to pop up windows and display bits of information. But I don't really know this space of functionality very well.
So I'm turning to the lazyweb.
Are JavaScript+image maps the right way to go (for example, this, this, and this)? Do they work well with multiple browsers? Or are there good JS libraries for drawing images on the fly in the browser? Is SVG a good thing to look at? Were you stuck with this task, what would you use?
The most important things for this project are, in order of importance:
- basic functionality (JS image maps seem fine for this)
- cross-browser functionality
- selection (e.g. GMOD RubberBandSelection)
- flexibility: reordering and redrawing of images.
Your thoughts are much appreciated! Please drop me a line or comment, whichever is most convenient. I'll summarize the options.
thanks,
--titus
p.s. I'm perfectly fine with "Google this, dumby!" I just don't have much in the way of google keyword knowledge in this area...
by David Goodger (noreply@blogger.com) at May 07, 2008 07:30 PM
Phil Jenvey has been making some great progress getting all the components of Pylons running on Jython, and posted a good write-up of the remaining work being done. It’s interesting to note that one of the big issues will affect any web framework on Jython, not just Pylons. That is, the reload time when used in development to restart the server.
While I don’t plan on deploying Pylons apps in WAR files anytime soon, it's nice to see Jython emerging as a candidate for deployment.
Live or semi-live blogging of conferences has been getting more and more difficult for me to do. The combination of meetings, networking/parties, and photographs means that it takes longer to assemble the requisite material. Here’s a bit on CommunityOne, which took place on Monday.
Many people (mostly Sun folks) have been asking me if this is my first JavaOne. My answer is, “it’s not, but it is my first one in ten years”. It’s been quite some time since I’ve been to a conference run by a big company like Sun (as opposed to an O’Reilly or open-source community conference). Even though the basics are the same, I definitely feel a kind of culture shock. I was asked to be on a panel during the general session, first thing in the morning, in order to get miked up and to run through the flow. Production values are much higher than I am used to. I keep thinking of CommunityOne as a small event, but in reality it is huge. I am told that registration was around 5000 people, which is twice the size of OSCON, which is the largest conference that I’ve been to in the last 4 or 5 years. Some pictures might help with the scale and production values:
The panel was on community models, although the content was closer to the edge where companies and open source communities meet/collaborate/fight. I think that I had two or three chances to speak, including the final set of remarks before the close of the panel. I have some more thoughts on that topic, but they are deserving of their own post, so that will be showing up after JavaOne is over.
Probably my favorite thing that happened at CommunityOne was the demonstration of ZFS’s reliability in the face of hardware failures. Sun Fellow Jim Hughes has demonstrated this a few times at Sun Tech days, and I’ve been meaning to write about that. I got to meet Jim before the keynote, and I had a very good seat to observe the hardware failure.
Jim usually destroys 2 of the drives in the ZFS pool, and it looked like Rich Green (EVP of Software) was going to get to smash the other one, until Jeff Bonwick, the inventor of ZFS, showed up to do the honors himself.
Smashing things makes for cool demos; you can watch the video replay if you like. I’ve been paying more attention to ZFS ever since Theo Schlossnagle sat with me and a few other people in a bar at ApacheCon in Atlanta last year. We were talking about the voracious storage needs of photographers, and Theo was really singing the praises of ZFS. Several important things happened to ZFS for OpenSolaris 2008.05 (which was launched at CommunityOne). The most important is that you can now boot off of a ZFS volume. I hope (but don’t know for sure) that the work that made this possible will make it possible for Macs to boot off of a ZFS volume. My photo storage is getting all fragmented, and I could really put ZFS to good use. I suppose that I could build a ZFS storage appliance based on OpenStorage, but at the moment that is more work than I want to do.
I spent much of the rest of CommunityOne at the Redmonk unconference. I was drafted for an impromptu discussion on dynamic and other programming languages, which included a drop in from David Pollak, developer of the very cool lift framework for Scala, and organizer of the Scala liftoff which is happening on Saturday, right after JavaOne. There was also a very active session on Twitter, probably the biggest of the unconference. Jim Edwards from Twitter came along to participate in that one.

I have a bunch more photos from CommunityOne. At the rate that things are going, I will probably just do a single post on JavaOne. There are plenty of other people doing liveblogging, for those who need a bigger information flow.
Update: corrected Jim Edwards’ name. Thanks to @monkchips
Yesterday I successfully used map() and lambda without having to look @ the documentation (and yet I link to the documentation)!
Something like:
some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]
urls = map(lambda x: some_url + x, things)
for u in urls:
    print u
UPDATE: Sorry, based on the comment by ‘baoilleach’ I feel compelled to update the code above, replacing map() and lambda with a list comprehension as suggested.
some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]
urls = [some_url + t for t in things]
for u in urls:
    print u
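One thing worth knowing if you run this under Python 3 (the snippets above are Python 2): there, map() returns a lazy iterator rather than a list, so the two versions are only equivalent once the iterator is materialized. A small sketch in Python 3 syntax, reusing the names from the example above:

```python
some_url = "http://www.foo.com/"
things = ["foo", "bar", "baz"]

# map() with a lambda: in Python 3 this is a lazy iterator, not a list
urls_map = map(lambda x: some_url + x, things)

# The list comprehension builds the full list immediately
urls_comp = [some_url + t for t in things]

# Materializing the iterator shows both produce the same URLs
assert list(urls_map) == urls_comp

for u in urls_comp:
    print(u)
```

Because the iterator is consumed by list(), iterating over urls_map a second time would yield nothing; the list comprehension has no such surprise.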
And don’t ask why I wasn’t using urlparse, I swear I have a good reason.
Another little thing that I think I was too dense to get was my misconception that lambda could only take one argument. I don’t know where I picked that up; it could be similar to someone thinking that tuples could only have two items (I’m just kidding Pam) - two-ples, ya know?
Simple lambda influenced by what Mr. I says these days:
>>> mine = lambda n: n.capitalize() + " is mine!"
>>> mine("book")
'Book is mine!'
>>> mine("shirt")
'Shirt is mine!'
But what about more than a single argument? Easy:
>>> huh = lambda what, who: what.capitalize() + ", " + who.capitalize() + "?"
>>> huh("cow", "daddy")
'Cow, Daddy?'
>>> huh("cat", "mommy")
'Cat, Mommy?'
>>>
I don’t use lambdas very often; most of the time, if I’m going to write a lambda, I just write a function. Anyway, small victory for me.
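Since a lambda is just an anonymous function whose body is a single expression, the two-argument huh example can be written as an ordinary def with identical behavior. Lambdas also accept default values and *args, just like def. A sketch in Python 3 syntax; the names huh_def, greet and join_all are made up for illustration:

```python
# The lambda from the example: two arguments, one expression
huh = lambda what, who: what.capitalize() + ", " + who.capitalize() + "?"

# The same function written with def; the lambda's expression becomes the return value
def huh_def(what, who):
    return what.capitalize() + ", " + who.capitalize() + "?"

# Lambdas can take default arguments...
greet = lambda name, greeting="Hello": greeting + ", " + name + "!"

# ...and a variable number of positional arguments
join_all = lambda *words: " ".join(w.capitalize() for w in words)

print(huh("cow", "daddy"))
print(huh_def("cow", "daddy"))
print(greet("Pam"))
print(join_all("cow", "daddy"))
```

The def version is easier to document and debug (it has a real name in tracebacks), which is one reason to prefer it once a lambda stops being trivial.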
I have now uploaded the documentation for Jinja2 to the website for those of you who are eager and want to play with it :-) On jinja.pocoo.org you can now choose between Jinja1 and Jinja2.
The new docs are powered by Sphinx and Jinja2 with a custom templating bridge.
Read the documentation.
I opened a Twitter account for the PyCon Italy conference. I will try to keep it updated as things come up and once the conference starts on Friday.
Often I have an iterable I want to group, for example a list of integers that I want to process two at a time. Here’s a pretty nice idiom I found in the documentation, translated to itertools:
from itertools import izip, repeat
def batch(iterable, n):
    return izip(*repeat(iter(iterable), n))
Use it like this:
>>> for key, value in batch([1, 2, 3, 4], 2):
...     print key, value
...
1 2
3 4
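The trick works because repeat(iter(iterable), n) passes the same iterator object n times, so each tuple pulls n consecutive items from it. In Python 3, izip is gone and the built-in zip is already lazy. One caveat: zip silently drops a trailing partial group, while itertools.zip_longest keeps it by padding with a fill value. A sketch in Python 3 syntax; batch_padded and its fillvalue default are my own additions, not from the documentation:

```python
from itertools import repeat, zip_longest

def batch(iterable, n):
    """Group items into n-tuples; a trailing partial group is dropped."""
    # One shared iterator, repeated n times: zip advances it n steps per tuple
    return zip(*repeat(iter(iterable), n))

def batch_padded(iterable, n, fillvalue=None):
    """Like batch(), but pads the last group with fillvalue instead of dropping it."""
    return zip_longest(*repeat(iter(iterable), n), fillvalue=fillvalue)

for key, value in batch([1, 2, 3, 4], 2):
    print(key, value)

print(list(batch([1, 2, 3, 4, 5], 2)))         # [(1, 2), (3, 4)]
print(list(batch_padded([1, 2, 3, 4, 5], 2)))  # [(1, 2), (3, 4), (5, None)]
```

Which variant to use depends on whether a leftover item is an error, noise, or real data you need to keep.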
by Sylvain Hellegouarch (nospam@example.com) at May 07, 2008 08:29 AM
After six months, PyAWS 0.3.0 has finally been released. You can check out the tarball here.
I almost abandoned this project because I found the XSLT approach more appealing: it is ideal for AJAX applications and easy to integrate via simplejson on the server side. Furthermore, I joined Microsoft, moved to Canada, and had less spare time for hobby projects. The last straw was the unexpected complexity of the BIG FAT refactoring.
Recently, I got an email from a PyAWS user who reported a bug: an unexpected result from the ListLookup operation. It is so good to hear from users that this library still benefits somebody in the world. So I picked it up, completed the refactoring and released it today. The library is still in active development: the code style stinks, the documentation sucks and, most of all, testing is lacking. Let me explain that a little.
I am a big fan of TDD personally, and we have respected testing teams helping to build our products at MSFT as well. However, the complexity of PyAWS is far beyond my capacity: there are tens of operations and dozens of response groups, and response groups may be combined, which makes it extremely difficult to cover all the paths. To make it worse, AWS is dynamic, so there is no guarantee that consecutive queries will return the same result. I may consider automation to facilitate the unit tests. If you have better ideas, please leave a comment here.
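To make the combinatorial problem concrete, here is a rough sketch of how test cases could be generated automatically rather than written by hand. The operation and response group lists below are hypothetical placeholders for illustration, not PyAWS's actual tables, and the code is Python 3. Even with only 3 operations, 4 response groups, and combinations capped at two groups, there are 3 × (4 + 6) = 30 cases:

```python
from itertools import combinations

# Hypothetical lists for illustration only; not taken from PyAWS
OPERATIONS = ["ItemLookup", "ItemSearch", "ListLookup"]
RESPONSE_GROUPS = ["Small", "Medium", "Images", "Offers"]

def response_group_sets(groups, max_size):
    """Yield every non-empty combination of up to max_size response groups."""
    for size in range(1, max_size + 1):
        for combo in combinations(groups, size):
            yield combo

def test_cases(max_size=2):
    """Pair every operation with every response-group combination."""
    return [(op, combo)
            for op in OPERATIONS
            for combo in response_group_sets(RESPONSE_GROUPS, max_size)]

cases = test_cases()
print(len(cases))  # 30
```

Since live AWS responses change between queries, one option (an assumption on my part, not something PyAWS does) would be to record one response per generated case and replay the recordings in the unit tests, so the parser is tested against fixed input.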
I can’t help but get totally baffled when I see a business model like this.
Yes, that’s right, you can pay for the privilege of keeping copies of your private distributed version control system (DVCS) repositories on someone else’s machines. You also get to pay depending on how many people you want to allow to collaborate on them.
Never mind that one of the entire points of a DVCS is that you do NOT need a central repository. Does anyone actually work at a “Large Company” (as the page indicates) that would be stupid enough to pay $100/month so they can put all their proprietary and very personal code repositories on a third party web service?
So what are you paying for? Well, to start with, they have awesome integration with Lighthouse, since we all know there’s no decent free open-source issue tracking system… cough trac cough roundup cough. Oh wait, since there are absolutely no simple web-based issue tracking systems, let’s have another slick business model to get people to pay for a stripped down Trac (but this time with a really pretty UI)!
What do these sites have in common? Rails, “look ma, I can copy-paste the business plan too” pricing models, and some good graphic designers at the helm. There also seems to be an interesting amount of promotion between these sites, as well as a nice blog post from the Rails creator himself promoting GitHub. I’m sure no one who has read this rant should be surprised though.
I only hope that no one starts to believe that a DVCS actually requires these “please pay” copies of their DVCS repo.
Last Thursday, I posted the animated CSS Homer, and it was a big hit. Friday morning, it was popular on Digg (over 3000 diggs). The resulting Digg effect was enough for my hosting provider to shut off my site.
I was a cheapskate when I bought my hosting plan from TotalChoice Hosting, looking only for low cost. Their reaction seemed aggravatingly uninformed. The support guy kept referring to the traffic spike as "an attack". I tried to explain that it was in fact a success, and that they had failed to help me deal with that success. I could understand needing to protect their widely shared servers, but at least they could speak knowledgeably about the event.
He also called it a DDOS, which it was, but only if it stands for Distributed Desirability Of Stuff.
Further angering me was the fact that my email was unavailable, since they simply shut off my entire account. Also, there was a misconfiguration in the 403 page they were serving, so the traffic logs showed every request resulting in another request for a non-existent 403.shtml page. TotalChoice will be the first to point out that they are not the right service for a high-traffic site, but they should at least be conversant in the language of their newly disappointed customers, and know how to correctly shut off accounts.
Saturday morning, the traffic had subsided and the site was reactivated, and I figured I could spend some time researching options for a new provider. Slicehost seemed good if I wanted to go the VPS route, though sysadmin is not my interest or forte, so I was leery of taking on all the responsibility for the machine, however virtual it was.
WebFaction seemed the best choice of the shared providers, with Django support and many Django sites already hosted.
I was away for the weekend, so I wasn't actively working on the problem. My site was up, so I could now plan my next move.
At least, until I got slashdotted. Now the site was really shut down, and TotalChoice wasn't too pleased. The only way back online was with a new provider. WebFaction got the gig, because I don't need complete control over a machine. A shared account with shell access and supported Django would be great. I looked in their forum for Digg effect issues, and saw intelligent conversation. I had dropped them a line outlining my situation, and they made clear that they had dealt with it before and would work with me if such good fortune arose again, but that they would shut down sites if it was the only way to protect the shared servers. In a way, that last caveat reassured me. If they had made a blanket claim that their servers were Digg-proof, it would have smelled of naive or dishonest admins.
Monday I signed up, switched over my domain's nameservers, re-uploaded my site, and I was back online. After getting TotalChoice to reactivate my old site, I transferred the blog comments, and now everything should be back as good as new.
It would have been nice to survive the Digg and Slashdotting. Maybe with WebFaction I will next time. I've got a new appreciation for slimming down the server needs of my blog. The avatars in comments are something to think about: the Homer post has 70 comments, meaning each page load also generates 70 image requests. One possibility is to offload the images to another service.
The irony in all this is that although I started with TotalChoice because of how inexpensive they were, I'm not paying much more for the WebFaction account.
By the time I finished writing my last article on licensing, I had mostly convinced myself that the GPL isn’t a practical license for most projects. That is, outcomes when using the GPL aren’t likely to be any better than outcomes using a permissive license, except for certain kinds of projects, mostly those involving big faceless companies, and I’d just as soon avoid such projects anyway.
My own thinking on this has changed over the years in part because of a greater sense of humility about what I produce. I’m really not that worried about people stealing my work because I don’t think that theft would be of much value. But also because I realize that the value in software is not so much in the code as in the process. The process is what is valuable, particularly for open source, and licensing doesn’t really address issues of process.
As an example, if I’m uncomfortable with how some member of an open source community is using the code, or the community, I will be much more effective by dealing with that head-on, talking with that member, or even confronting them if it’s really necessary. If you give someone an unwelcoming attitude, they’ll probably go away. The license doesn’t need to be your gatekeeper. It’s not a particularly effective gatekeeper anyway.
Another change is perhaps a more reasonable valuation of code. There was a time when people wanted to protect their intellectual property. Even some non-software company might have gotten the idea that it should own the code it contracts someone else to write, under a proprietary license, so they could sell that software later. That anyone would care to buy it was always an illusion, but the illusion is a little more obvious these days.
One value of the GPL that I do want to acknowledge is its expression of values. It makes this explicit:
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.
But the GPL does more than just its text: adopting the GPL is a statement of principle on the part of the original authors, of the people who adopt the project, and of the people who later help maintain the project. It is a statement that freedom is valued and that it is valued in a universal sense, not just in a personal or isolated sense.
This is implicit, not explicit, in the choice of license, but despite that I see this pattern in projects. Projects that choose the GPL are more likely to engender a spirit of openness and sharing. Not in the core project itself: both GPL and permissively licensed projects accomplish this just fine so long as they are properly maintained, and their success is far more related to how the project is managed than to the licensing. But I see the difference in the software that grows up around the project: extensions, complementary projects, documentation.
Maybe this is because of licensing. The license filters the community, and the people who are left in a GPL project are all at least open to sharing. But more than that, I think it puts people in the right state of mind to share. The project feels more principled, and participation is based less on pragmatism and more on optimism. And there are always people coming into open source who haven’t really figured out why they are there or what they want to get out of it. Presenting them with the principles of Free Software influences their decision. (This issue has caused some debate about terminology.)
With all that said, you don’t need the GPL to present the principles of a project. It’s certainly the easiest way to do so. The GPL is shorthand for a rich set of principles and ideals. But it’s shorthand for people who are already in the know. The ideas need to be reiterated and explained and reconsidered to stay relevant. I think a project might do more good with an explicit statement of principles. With that in place the licensing might not matter so much.
It occurred to me… Django is something like a dictatorship… or maybe an oligarchy. At first it seems like Pylons is the same… but no. Pylons is clearly feudal. I lord over Paste, WebOb, FormEncode. Mike Bayer lords over Mako and SQLAlchemy. Ben lords over Routes, Beaker, and Pylons.
I suppose in all cases there is a certain amount of democracy, because there are no serfs, and any individual is free to travel to any kingdom they like. Well, at least among the open source kingdoms. Without citizenship, and with no exclusiveness of ownership, with even property having largely disappeared, I suppose it’s inevitable that traditional metaphors of control and governance don’t really make sense.