/var/log/rant: August 2011

2011/08/16

More Fun With SQL

I've mentioned this bit of code recently:

        INSERT INTO accession_analysis (
            accession_id ,  analysis_id ,
            reference_id ,  status ,
            status_text  ,  extra_parameters
            )
        SELECT
            accession_id ,
            ? , ? , ? , ? , ?, ?
        FROM accessions
        WHERE request_id = ?
        ORDER BY accession_id

Well, it's slightly different. Before we were just holding the current state ( waiting, working, success, failure ) and now we're holding text information, too, and you don't just want the current state, you want to be able to look back. So, in addition to accession_analysis , we're adding accession_analysis_status, which will have, so far, a unique id, an id for the AA it connects to, and then the the status information.

And the run now. I have to add a run number to the accession_analysis schema. I can do that, but that's not germane right now.

What is germane is how to store the status into accession_analysis_status at about the same time as it goes into accession_analysis, getting the id from the accession_analysis table. It would be far easier if I was using an iterative approach, but then I'm blasting the DB with many connections instead of just one.

An approach would be to find all the accession_analysis elements without a matching accession_analysis_status, and then inserting them into accession_analysis_status. Something like

INSERT INTO accession_analysis_status ( 
    aa_id , status , status_text 
    ) 
SELECT id , status ,status_text 
FROM accession_analysis 
WHERE there's no aa_id corresponding to the id in accession_analysis

But clearly, I don't know how to express this as SQL yet.

2011/08/15

Command Line Comments

Just read Life on the Command Line, wherein the author explains how he does everything everything everything on the command line these days

I'm largely sympathetic. Honestly. And I'm close. Look at my work Linux box and you'll see a good half-dozen terminal windows open. I'm a programmer whose two primary programming styles are command-line/batch/crontab and web programming, so I am much more comfortable with those styles and their explicit order than the GUI developers.

But I have sort of a guilty secret.

I spend most of my programming time using KomodoEdit.

I have a fairly custom .vimrc that allows me to do cool things in vi, but beyond crontab editing and other things that look for $EDITOR, I only use vi when I want to do lots of specific and repetitive find-replace stuff or need to do small changes, or when I'm using SSH to connect to a system and not using SSHFS to mount it. My heavy lifting for editors is done via KomodoEdit.

And that isn't it. Not by a long shot.

I have a significant amount of music. I used to use Rhythmbox as my media player because it allowed me to use and thus alias a command-line interface. On that system, I would have rhythmbox-client --play ,rhythmbox-client --pause , rhythmbox-client --previous and rhythmbox-client --next aliased to play , pause , prev and next .

I stopped for a few reasons. I now have a Windows machine that I use for testing web dev in different browsers, and, when I play my media, I play via Windows Media Player, in part because playing and queueing media that I have but didn't have in my media library made Rhythmbox unhappy. But, between Amazom, Google, Rdio, Pandora and Spotify (which I love), I hardly listen to that huge library anymore, because either I have much of that library up already or I'm streaming stuff I want but don't have. And those tools don't give me a command-line option.

It gets to what tool is powerful enough to allow me to do what I need to. With music, I need it on and going, and paused on occasion.

Don't get me wrong. I'm with him at least in part about mail sorting. Before web tools and Thunderbird, my preferred mail client was PINE, but there's functional reasons that I can't remember anymore that I moved the one remaining command-line mail account from the nmap-based sorting from mh/nmh to procmail. IIRC, I liked the syntax for mh better but I found that SpamAssassin worked better with procmail, and I desperately needed SpamAssassin. The sorting choices I get from both Gmail and Zimbra are both clunkier than a nice mh or procmail script. But I spend little of my time in email, so that shouldn't make much of a difference.

Anyway, interesting read.

2011/08/10

Fun with SQL

If you want to put one thing into a database, that's easy.

 INSERT INTO table_1 (field_1 , field_2 , field_3 ) VALUES ( 4 , 1 , 2 ) ;

I've been putting lots of things into table_1 that are in table_2, where there are n instances of table_1 in each table_2. This lead me to something like this.

 
my $sql ;
$sql = << 'SQL'
SELECT id FROM table_2 WHERE field_3 = ?
SQL
my $ptr = db_arrayref( $sql , $param->{ field_3 } ) ;

$sql = << 'SQL'
INSERT INTO table_1 ( 
    field_1 , field_2 , field_3 
    ) 
VALUES (
    ? , ? , ?
    )
SQL

map {
   my @vals ;
   push @vals , $param->{ a } ;
   push @vals , $param->{ b } ;
   push @vals , $_ ;
    }  @$ptr ;

That's OK for what it is, but what that ends up meaning is lots of small SQL commands sent to the server. SQL handles the commands well, but the opening and closing of network connections is just sort of sucky, so, that's not optimal. So, if you can just send one command, that's what you want. And in this case, the command is this:

 
INSERT INTO table_1 ( field_1 , field_2 , field_3 ) 
    SELECT 
    1 ,
    2 ,
    field_3 FROM table_2 WHERE field_3 = ?
    ;

Isn't SQL wonderful?

2011/08/01

Failure to Plan is Planning to Fail

I have a database table full of requests. When that table is filled, we also put together a wiki page for that request, because the the data we generate for each request can be fairly free-form. This means that we effectively divorce the state of the DB for a given request and the state of the wiki.

There are certain fields that are on the wiki page and not the database. Specifically, contact information for the person making the request. It made sense to me, at the time. I think it was based upon the data separation issue, or that we actually hold that in a profile, too, but I can't remember right now.

Now, my boss wants to be able to regenerate the wiki page for quick-and-easy refresh. Regenerate the DB stuff into a new wiki page and cut-and-paste the non-DB stuff from the old page to the new page. Easy-peasy. Except, I don't store all the data in the database.

So, now, I'm looking at adding columns to the table to handle said contact information and spidering the contact info off the wiki pages. If I had just put it into the database in the first place...

Cookie Notice