Monday, October 20, 2008

Django and Ensembl - part III

Basic object manipulation. Once the models have validated, we can use the interactive shell (IPython accessed via "python manage.py shell") to play around with data access.

In [1]: from djensembl.djpyensembl.models import CoordSystem

In [2]: CoordSystem.objects.all()

Out[2]: [, , , , ]


5 objects are returned from that call, corresponding to the 5 coord_systems. Just like in the tutorial, that naming system is not very informative. So, I modified the CoordSystem model just like they did by adding the __unicode__ method as such:

def __unicode__(self):
return str(self.coord_system_id) + ' -> ' + self.name


Now I get this:

In [1]: from djensembl.djpyensembl.models import CoordSystem

In [2]: CoordSystem.objects.all()

Out[2]: [ chromosome>, supercontig>, contig>, clone>, chromosome>]


I doubt that's ultimately what I want but it works for now.

Some of the built-in generic behaviors don't work well with this db due to the particular use/misuse/nonuse of the 'id' column. In a Django-centric-developed app, you could issue statements like:

c = CoordSystem.objects.get(pk=1)

But, since the table doesn't have an id column which is the primary key, auto-incremented, this call fails, as does c = CoordSystem.objects.get(id=1), obviously.

What does work is c = CoordSystem.objects.get(coord_system_id=17). I presume this is one of the 'drawback' of using legacy db's where the table structure is not designed with Django's shortcuts in mind. No problemo.

Django and Ensembl - part II

One thing I'll need to resolve going forward is the problem of read-only. The intent of ensembl is to use it 'as is', i.e in read-only mode. When Django starts a project and syncs the apps, it writes the default admin and auth tables into my ensembl db. Do I need these here? Do I need them at all? Can I set them up elsewhere?

If I don't want them or don't want them there, how best to make the db read-only? At the level of the MySQL db itself? Trusting that others won't abuse write privileges (yeah right)? Or in the Django code. I'm leaning towards the first option...

Django and Ensembl

Django 1.0 is out and so far I like it a lot. I worked through the tutorials and other than my html 101 skills, all seems fairly straightforward.

I thought I would write an app to access my local install of one of the ensembl databases to see what Django can offer for legacy dbs. Note that Django has a short tutorial page on legacy dbs. I'm skipping over the part where I point the Django settings to my local db...

What we do is set up the basic project structure as in the tutorial, then added my app (djpyensembl). With no excuse for poor directory/project/app naming I set up:

~/django_ensembl
/djensembl
/__init__.py
/djpyensembl <-- this is my app
/manage.py
/settings.py
/urls.py


Now, to create the models from an existing db, we use the "inspectdb" option for manage.py. Running "python manage.py inspectdb > models.py" will output the model statements as figured out by Django looking at the ensembl tables. Moving the models.py file into the djpyensembl app directory puts the models in the right place.

What happens next caught me off guard. I figured I'd jsut start the Django dev server using "python manage.py runserver 8080". This is what I got:

Validating models...
Unhandled exception in thread started by
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/core/management/commands/runserver.py", line 48, in inner_run
self.validate(display_num_errors=True)
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/core/management/base.py", line 126, in validate
raise CommandError("One or more models did not validate:\n%s" % error_text)
django.core.management.base.CommandError: One or more models did not validate:
djpyensembl.interpro: "id": You can't use "id" as a field name, because each model automatically gets an "id" field if none of the fields have primary_key=True. You need to either remove/rename your "id" field or add primary_key=True to a field.


Turns out the interpro table in ensembl is set up in a way that's not conducive to Django.

mysql> desc interpro;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| interpro_ac | varchar(40) | NO | PRI | | |
| id | varchar(40) | NO | PRI | | |
+-------------+-------------+------+-----+---------+-------+


mysql> select * from interpro limit 1;
+-------------+---------+
| interpro_ac | id |
+-------------+---------+
| IPR000001 | PF00051 |
+-------------+---------+


mysql> SHOW CREATE TABLE interpro;
+----------+---------------------------------------------+
| Table | Create Table |
+----------+---------------------------------------------+
interpro | CREATE TABLE `interpro` (
`interpro_ac` varchar(40) NOT NULL default '',
`id` varchar(40) NOT NULL default '',
UNIQUE KEY `interpro_ac` (`interpro_ac`,`id`),
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
+----------+---------------------------------------------+


There is no real primary key for the table and one of the fields is called 'id'. From what I can tell, this is the only table that uses a plain 'id' name as a column name and also from what I can tell, this seems a poor naming choice. Not knowing much about the nature of the entries in interpro and not planning on using them for this demo, I enter hack mode and immediately assign 'id' as the primary key.

mysql> ALTER TABLE interpro ADD PRIMARY KEY(id);
Query OK, 18220 rows affected (0.75 sec)
Records: 18220 Duplicates: 0 Warnings: 0


As a result of this, I need to update my Interpro model in ~/django_ensembl/djensembl/djpyensembl/models.py (the file created by inspectdb). Before the update, it looked like this:

class Interpro(models.Model):
interpro_ac = models.CharField(unique=True, max_length=120)
id = models.CharField(max_length=120)
class Meta:
db_table = u'interpro'


It now looks like:

class Interpro(models.Model):
interpro_ac = models.CharField(unique=True, max_length=120)
id = models.CharField(primary_key=True, max_length=120)
class Meta:
db_table = u'interpro'


Now when I run "python manage.py runserver 8080", the models will validate.

More on this later...

Monday, October 13, 2008

More on MySQL

In my last job, I witnessed a MySQL database excede the 4GB limit. It was a home-grown lims system keeping track of massive amounts of sequencing data. I don't imagine I'll see that again but if I had to keep an eye on things, it's always handy to remember the sweet little commands. The MySQL forum has a nice summary here. What I like is the use of the double '/ 1024' to get the size display in MB and the reminder that the 'database size' is calculated from multiple values. Here we see that the size is coming from both the table data and the indexes. I've never paid close attention to the size of the indexes, but it's good to remember that.

The actual amount of disk space a db takes up is queried by something like a 'du' in the appropriate directory. This is what threw us for a loop once with a problem in the InnoDB tables. I think it was a bonafide bug in MySQL at the time but the InnoDB tables were taking up a LOT of disk space... I'll have to remember what that was all about. It was a good lesson.

Tuesday, October 7, 2008

A bit of a case

Saw an interesting talk tonight on PyMeta which brought back memories of one of my old coworkers working on natural language processing. It also gave me some insight into how a compiler works. Neither of those points are related to what I wanted to jot down though. In showing us some code, the speaker (Tom Marsell) revealed a case-like statement. It struck me as sweet so I jotted it down...partially. When I got home, here is what I got to work:

def meth(operator, num1, num2):
return {
'+': lambda: num1 + num2,
'-': lambda: num1 - num2
}[operator]()

print meth('+', 2, 2)

I don't think that's what Tom had up there but it inspired me nonetheless. Thanks Tom.

Thursday, October 2, 2008

MySQL and MySQLdb

Yet another post on this stuff, for my own reference...

A common issue with MySQL on the Mac is an error like "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock'". Help files suggest changing your /etc/my.cnf file which doesn't exist on a Mac. What I can do is use a ~/.my.cnf file instead. I can't remember now if I created this file or if it was created for me. I also can't remember if I put the '/var/lib/mysql/mysql.sock' path in for the mysql.sock file or not...

What MySQL did want was for the sock file path to be /tmp/mysql.sock, so I edited the section in my ~/.my.cnf file as such:

[client]
socket = /tmp/mysql.sock

That does the trick. I should note that ~/.my.cnf is the last of five locations MySQL will check for this configuration data. See here for more info.


Once I got that working again, I went to use mysqldb and got this error:

ImportError: No module named mysqldb

I think I had flushed out my mysqldb installs during a purge for a particularly nasty bug hunt. So, I go to reinstall it from a tarball. I use the classic "python setup.py install" and get this:

EnvironmentError: mysql_config not found

I've seen the errors about missing library files that you need to copy over or link to but this one was new... I'm thinking that the reason is that I was originally using my MySQL install directory for development work so it was on my path before. During the bug hunt, I had reset my path and eliminated the directory with the mysql_config file... Sure enough, that was it.

It turns out that I may have also deleted my mysql library files from /usr/local/mysql as that director no longer exists... May need to re-install MySQL...

...the next day after a bit of sleep...

Looks like the /usr/local/mysql wasn't the problem. I did a quick search on my puter for "mysqlclient_r" and lo and behold, there is an entry in the FAQ about this:

Build Errors
------------

ld: fatal: library -lmysqlclient_r: not found

mysqlclient_r is the thread-safe library. It's not available on
all platforms, or all installations, apparently. You'll need to
reconfigure site.cfg (in MySQLdb-1.2.1 and newer) to have
threadsafe = False.

As soon as I set threadsafe to False, it builds.