Wednesday, November 6. 2013

Threading an SSH Agent Through Screen

I posted this three years ago, buried in an unrelated post, but several people have asked me about it lately, so here it is in a dedicated post.

You should only have your SSH private key on hosts that are physically in your possession - laptop, desktop, etc. But you usually want to put those hosts to sleep or move them around, which means they can't keep live SSH connections going in a screen session. So you probably run screen on a server somewhere - a VPS, an admin host at work, or if you're like me, a server in your basement.

Now you have a new problem: when you first start the screen session, your SSH agent works fine in screen windows. But when you disconnect and reconnect, all of those screen windows are still looking for the old SSH agent socket, which no longer exists. So SSH connections in a reconnected screen session fail.

Well, there's a fix! The idea is to mirror the auth socket to a well-known name that is stable from one SSH connection to the next.

#! /bin/bash
# hard-link the SSH socket to one with a fixed name on the local
# machine, and set SSH_AUTH_SOCK to point to that fixed name.  Later
# invocations of this script will change the link, but the name will
# remain valid, allowing existing shells to continue to function.
setup_fixed_socket() {
  local old_socket="$SSH_AUTH_SOCK"
  local socket_dir="/tmp/$(uname -n)-$(id -u)"
  local socket_file=$socket_dir/agent

  # set up the directory and permissions
  [ -e "$socket_dir" ] || mkdir -p "$socket_dir"
  chmod 700 "$socket_dir"

  # remove an existing link
  [ -e "$socket_file" ] && rm "$socket_file"

  # hard-link in the new one
  ln "$old_socket" "$socket_file"

  # return the new socket
  echo "$socket_file"
}

# this variable will be exported to every shell opened by this
# invocation of screen -- even subsequent connections to it.  This
# variable may live for days or weeks.
export SSH_AUTH_SOCK=$(setup_fixed_socket)

# finally, fire up screen.  Try reattaching to a running
# session; otherwise start up a new one
screen -R -DD "${@}" || screen
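
To use it, save the script as ~/bin/startscreen (the name the "Locking SSH keys on sleep" post below refers to) and run it in place of plain screen after connecting with agent forwarding. A minimal sketch of a session, where screenhost is a stand-in for your server:

# on the laptop: forward the local agent to the screen host
ssh -A screenhost

# on the screen host: attach to (or start) the session; every window
# sees the fixed socket name, which survives disconnect/reconnect
~/bin/startscreen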

Wednesday, May 29. 2013

Oops, I partitioned my drive..

I did something colossally stupid yesterday. I was at the local hackerspace, hoping to cut some acrylic, and the wifi wasn't working. I was in a hurry and frustrated, so I pulled out a USB stick and tried to erase it. Suffice it to say, the USB stick wasn't at /dev/sda. I wiped out the GPT on my laptop. Its disk is encrypted, so the buffer cache kept things working for a while; then suddenly I had a blinking root prompt and ... nothing.

After the obligatory cold sweat had passed, I quietly packed up and walked out. Here's the story of how I recovered from this, with the help of Jake Watkins (:dividehex).


Continue reading "Oops, I partitioned my drive.."

Monday, April 15. 2013

910 Days at Mozilla

As of today, I've been at Mozilla for 910 days. That's not a magic number, but this seemed like a good day to reflect on my time here.

I've had a chance to do a bunch of exciting things here:

  • Drink from the Mozilla Firehose
  • Manage build slaves in the release engineering environment
  • Build out a configuration management system with Puppet
  • Design systems to build out new hardware platforms and operating systems
  • Organize a move of systems and servers out of one datacenter and into another
  • Build a web cluster
  • Build and maintain MySQL database clusters as an apprentice DBA
  • Learn Ruby and hack on Puppet
  • Build a dynamic hardware provisioning system (Mozpool)

You'll never be bored at Mozilla! There's never a shortage of work to do, with new projects coming all the time. The organization is structured so that it's easy to take on tasks that need doing, whether they're within your skill base or not. There's lots of room to learn, and everyone's happy to teach.

I work with an incredible group of people. Just within IT, we have a huge range of skills and capabilities for a relatively small team. These are people who know how to really solve problems, not paper over them with the half-baked temporary fixes you find elsewhere. As but one example, the datacenter operations team is building out and operating several world-class datacenters at the same time, and still managing to turn around our remote-hands requests in a matter of hours. Our infrastructure team is full of people with deep experience in all aspects of system administration who are always willing to help solve a tricky problem. And on my own team, my co-workers all manage to work miracles far beyond the resources available.

Before Mozilla, I was at Zmanda, working on Amanda -- you know, the open-source backup application you remember from your early days? It's still around! Anyway, I took that job in part because it meant I could be paid to work on open-source software. I took full advantage of that opportunity, but the company was fundamentally a business - organized around sales, support, and the bottom line. My open-source concerns always played second fiddle, if that. Mozilla's different: the Mozilla Manifesto is what we do, and that's understood nowhere better than at the top of the organization. It can be a struggle sometimes to see how the work I do supports the people who support the people who build the products that further the mission, but the connection is there and it's important. That keeps me going.

Here's to another 1000 successful days at Mozilla!

Friday, March 29. 2013

Locking SSH keys on sleep on Linux

I got a new laptop, a ThinkPad X1 Carbon, and I'm running Linux on it. So you're in for a series of posts describing the complex process I had to follow to accomplish simple things. Spoiler alert: 2013 is not the year of Linux on the desktop. It's not looking good for 2014 either.

I'm running Fedora 18. I tried Ubuntu 12.10, but Unity couldn't hold itself together long enough to actually do anything, so I started over with Fedora.

SSH Agent

Gnome runs a nice keychain app that acts like (but is not) OpenSSH's ssh-agent. The one obvious place it differs is that ssh-add -l will list keys even if they are "locked" (passphrase not supplied).

As long as you point the SSH_AUTH_SOCK variable to the right place, the agent works just fine for unlocking keys - it finds any private/public pairs in ~/.ssh, and prompts to unlock them once you issue an SSH command that needs a key. The problem is, it never re-locks the keys.
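
In shells that didn't inherit the variable (inside screen, for instance), the socket can be found under the session's runtime directory. A hedged sketch - the keyring-* path is how gnome-keyring names its directory on my Fedora install, and may differ on other setups:

# point SSH_AUTH_SOCK at gnome-keyring's agent socket
export SSH_AUTH_SOCK=$(ls -d /run/user/$(id -u)/keyring-*/ssh 2>/dev/null | head -n 1)
ssh-add -l   # lists the keyring's keys, locked or not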

Locking

Personally, I use SSH constantly while my laptop is awake, so I don't want an arbitrary timeout. Instead, I'm careful to put it to sleep when I'm away from the keyboard. So I want a way to lock the key on sleep.

It turns out that pm-utils will run scripts in /etc/pm/sleep.d on sleep and wake. It runs them as root, unfortunately. I added the following in 01dustin-ssh-agent.sh:

#!/bin/sh

# drop keys from dustin's SSH agent

. "${PM_FUNCTIONS}"

lock()
{
        su - dustin /home/dustin/bin/ssh-lock
}

case "$1" in
        hibernate|suspend) lock ;;
        *) exit $NA ;;
esac

and then added the following in ~/bin/ssh-lock:

#!/bin/sh

# drop keys from the SSH agent, using the same trick as bin/startscreen to find
# that agent

base="/tmp"
[ -d /run/user ] && base="/run/user/$(id -u)"
socket_dir="$base/$(uname -n)-$(id -u)"
SSH_AUTH_SOCK=$socket_dir/agent ssh-add -D
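
To test the pair without actually suspending, you can invoke the hook by hand the way pm-utils would. A sketch - the PM_FUNCTIONS path is where pm-utils keeps its helper functions on Fedora, and may vary by distro:

# run the sleep hook as root, as if a suspend were starting
sudo env PM_FUNCTIONS=/usr/lib/pm-utils/functions \
    sh /etc/pm/sleep.d/01dustin-ssh-agent.sh suspend

# the keys should now be locked again (though, per the note above,
# gnome-keyring's ssh-add -l will still list them)
ssh-add -l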

See my post on tunneling ssh-agent into a screen session for the reference to bin/startscreen. I'm not sure how best to accomplish this without such a trick. I'll work on that and post again.

Thursday, October 25. 2012

Documentation for MDT's CustomSettings.ini

If you're looking for info on CustomSettings.ini, you're most likely to find questions answered with "try this script". You type it in, and if it works, great; if not, keep looking. It's well-nigh impossible to find actual documentation, and the programming-by-INI-sections design is not exactly intuitive.

It turns out there is some documentation in the help file that ships with MDT, but that's a .CHM file, and Microsoft apparently doesn't post those online.

However, some helpful (Russian?) souls have done so. Behold: Microsoft® Deployment Toolkit 2012 Toolkit Reference.

Monday, September 17. 2012

Google Dependency..

I have a Google-branded Samsung Galaxy Nexus, running Jelly Bean. It's a decent phone, except that its GPS requires a small nuclear reactor to power it.

Today S and I drove back from a weekend trip to New Haven, CT. Google let us down pretty badly.

First, on the way out, I had used the navigation feature (which requires GPS, not just Google Location Service) for most of the day. I have a power adapter in the car, and the phone was plugged in the whole time. Still, after about 6 hours, the phone's battery was at about 10% -- having started at a full charge when we left. I forgot to plug it in where we spent the night, so it was dead in the morning. No problem -- I only need to navigate home, so I'll just plug it in!

And here we reach the first problem. This phone takes 4-6 minutes to start up. Which means either I key in my destination while already on the highway, or sit in the car for 4-6 minutes while my phone starts up. Bear in mind that during most of that startup time, it has a blank screen with no backlight, and sometimes startup crashes, so it's a bit of a psychological game to resist popping out the battery early.

Fine, so it starts up, I start the navigation application, and hit "Go Home". It plots a route, I take off, get on the highway, and the phone dies. At this point I do the math -- if my phone went from 100% to 10% in 6 hours while navigating, then it requires the combined power of both the battery and the adapter to run the navigation app. So I dutifully hand the phone to S to start back up and hit the "Go Home" button again. I immediately turn off the backlight and hope for the best. The navigation voice tells me to head South for 40 miles, and then I hear nothing.

This should have been a red flag, but I was too busy composing this blog entry in my head to do a little more arithmetic: Albany is not South of New Haven. A half-hour later, S checked her iPhone and pointed out that we were heading in the wrong direction.

I pulled over and checked my phone. Surprisingly, it was still on! However, it was navigating to Waters View, NY, which is not where we live. We live at Waters View Circle, Cohoes, NY. Google knows this. It's set in the navigation app. It turns out that Google takes the string you type in for your home address and hits the API equivalent of the "I'm feeling lucky" button, and you're off to the races. Perhaps not the races you were looking for. This is the same fundamental flaw that plagues Google Now. If you're in, say, Springfield, MA, but for whatever reason the location-aware "I'm feeling lucky" feels Springfield, IL is the more relevant search result, you get Springfield, IL's weather. And it helpfully just says "Springfield" on the card, so you don't know anything's wrong until it claims there's torrential rain on a clear day.

At this point we killed my phone and navigated the old-fashioned way - getting directions on the iPhone and following the relevant signs. It turns out we had a great drive along the Taconic State Parkway, which is a far sight more interesting than I-90, so that was OK. And we only lost an hour of drive-time.

Aside from the battery-life issues, which are not surprising from Samsung, but are surprising from a phone with "Google" on the back, there's an important point here: Google's approach to problems is to throw gargantuan amounts of data and CPU at them, and hope the answer's right. That's fine for search, but when it comes down to building a reliable personal device, people need something a little more deterministic. Google is increasingly heading toward personalized computing -- Google Glass being the ultimate expression -- and I think the company has a lot to learn before any of that will be more than an amusingly daft automaton.

Thursday, September 13. 2012

Building a partitioned log table

For a project at Mozilla that involves re-imaging hundreds of mobile devices, we want to gather logs in a database for failure analysis. Mobile devices fail all the time -- not sure if you knew that.

We'll probably end up with 1,000-10,000 log entries per day. We'd like to expire them on a relatively aggressive schedule -- no need for historical analysis at this level. So that means not only a lot of inserts, but a lot of deletes.

We're using MySQL as the database backend, and MySQL doesn't handle deletes well - it just marks each row as deleted, but doesn't reclaim the space, and the dead rows still get scanned in queries until the space is reclaimed. So if you blindly insert and delete in a table, MySQL will eat disk space and get progressively slower.

One fix is to optimize the table periodically. However, this requires a full lock of the table for the duration of the optimize, which can be quite a while. We don't want production tasks backing up while this is going on.

The other option is to partition the table. A partitioned table is basically a set of tables (partitions) with the same columns, organized to look like a single table. There's a partitioning function that determines in which partition a particular row belongs. There are a few advantages. Each partition is a fraction of the size of the whole table, so inserts are quicker (once the appropriate table is determined). The query engine can use "partition pruning" to ignore partitions that could not hold rows relevant to the query. Finally, dropping an entire partition at once is a very simple operation, and doesn't leave any garbage that needs to be optimized away.

For logs, we want to partition by time, in this case with one partition per day. Most of the "get the logs" queries will use a limited time range, invoking partition pruning and allowing a quick response.

The tricky part is, the DB server does not automatically create and destroy partitions. We need to do that. It's pretty straightforward with stored procedures, though. Here's the resulting SQL to create the logs table:

DROP TABLE IF EXISTS logs;
CREATE TABLE logs (
    -- foreign key for the board
    board_id integer not null,
    ts timestamp not null,
    -- short string giving the origin of the message (syslog, api, etc.)
    source varchar(32) not null,
    -- the message itself
    message text not null,
    -- indices
    index board_id_idx (board_id),
    index ts_idx (ts)
);

--
-- automated log partition handling
--

DELIMITER $$

-- Procedure to initialize partitioning on the logs table
DROP PROCEDURE IF EXISTS init_log_partitions $$
CREATE PROCEDURE init_log_partitions(days_past INT, days_future INT)
BEGIN
    DECLARE newpart integer;
    SELECT UNIX_TIMESTAMP(NOW()) INTO newpart;
    SELECT newpart - (newpart % 86400) INTO newpart; -- round down to the previous whole day

    -- add partitions, with a single partition for the beginning of the current day, then
    -- let update_log_partitions take it from there
    SET @sql := CONCAT('ALTER TABLE logs PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) ('
                        , 'PARTITION p'
                        , CAST(newpart as char(16))
                        , ' VALUES LESS THAN ('
                        , CAST(newpart as char(16))
                        , '));');
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    -- do an initial update to get things synchronized
    call update_log_partitions(days_past, days_future);
END $$

-- Procedure to delete old partitions and create new ones around the current date
DROP PROCEDURE IF EXISTS update_log_partitions $$
CREATE PROCEDURE update_log_partitions(days_past INT, days_future INT)
BEGIN
    DECLARE part integer;
    DECLARE newpart integer;
    DECLARE earliest integer;
    DECLARE latest integer;

    -- add new partitions; keep adding a partition for a new day until we reach latest
    SELECT UNIX_TIMESTAMP(NOW()) + 86400 * (days_future+1) INTO latest;
    createloop: LOOP
        -- Get the newest partition (PARTITION_DESCRIPTION is the number from VALUES LESS THAN)
        -- partitions are named similarly, with a 'p' prefix
        SELECT MAX(PARTITION_DESCRIPTION) INTO part
            FROM INFORMATION_SCHEMA.PARTITIONS
            WHERE TABLE_NAME='logs'
            AND TABLE_SCHEMA='imagingservice';
        IF part < latest THEN -- note part cannot be NULL, as there must be at least one partition
            SELECT part + 86400 INTO newpart;
            SET @sql := CONCAT('ALTER TABLE logs ADD PARTITION ( PARTITION p'
                                , CAST(newpart as char(16))
                                , ' VALUES LESS THAN ('
                                , CAST(newpart as char(16))
                                , '));');
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
        ELSE
            LEAVE createloop;
        END IF;
    END LOOP;

    -- now, deal with pruning old partitions; select the minimum partition
    -- and delete it if it's too old
    SELECT UNIX_TIMESTAMP(NOW()) - 86400 * (days_past+1) INTO earliest;
    purgeloop: LOOP
        -- Get the oldest partition
        SELECT MIN(PARTITION_DESCRIPTION) INTO part
            FROM INFORMATION_SCHEMA.PARTITIONS
            WHERE TABLE_NAME='logs'
            AND TABLE_SCHEMA='imagingservice';
        IF part < earliest THEN
            SET @sql := CONCAT('ALTER TABLE logs DROP PARTITION p'
                                , CAST(part as char(16))
                                , ';');
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
        ELSE
            LEAVE purgeloop;
        END IF;
    END LOOP;
END $$

DELIMITER ;

-- initialize the partitioning
CALL init_log_partitions(14, 1);

-- and then update every day (this can't be set up in init_log_partitions)
DROP EVENT IF EXISTS update_log_partitions;
CREATE EVENT update_log_partitions  ON SCHEDULE EVERY 1 day
DO CALL update_log_partitions(14, 1);

A few notes here. First, the table is created without any partitions. This is because I don't know a priori which partitions it should have, and it's easier to get code to figure that out than to do it myself. That's what the init_log_partitions procedure does. The update_log_partitions procedure looks at the current time, makes sure there are enough partitions for the future, and drops partitions too far in the past. Finally, a MySQL event is set up to update the partitions daily.

You'll need to enable the event scheduler globally to get this to run:

set global event_scheduler=on;
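
Once the event has fired, it's easy to check from the shell that partitions are rotating as expected. A quick sketch - the database name matches the TABLE_SCHEMA used in the procedures above:

mysql imagingservice -e "
    SELECT PARTITION_NAME, FROM_UNIXTIME(PARTITION_DESCRIPTION)
    FROM INFORMATION_SCHEMA.PARTITIONS
    WHERE TABLE_NAME='logs' AND TABLE_SCHEMA='imagingservice'
    ORDER BY PARTITION_DESCRIPTION;"

# and confirm that partition pruning kicks in: the 'partitions' column
# of the plan should name only the couple of partitions the time range
# can touch
mysql imagingservice -e "EXPLAIN PARTITIONS
    SELECT * FROM logs WHERE ts > NOW() - INTERVAL 1 DAY \G"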

Tuesday, June 12. 2012

Mobile data while travelling in Great Britain

Three (the wireless carrier) was recommended for my trip here, but they were out of SIMs (which seems a common malady), so I dropped by the T-Mobile store. They set me up with a £5 SIM with unlimited data for 30 days, which seemed great. But read on.


Continue reading "Mobile data while travelling in Great Britain"

Thursday, June 7. 2012

Wireless Data in Belgium

In hopes this will be useful to others:

I have an unlocked AT&T phone (Samsung Galaxy Nexus), and needed a data plan in Belgium (Brussels, specifically). I really don't intend to text or talk, but data is important. Most of the news kiosks are happy to sell you a SIM, but with no data portion.

A fellow Mozillian, Ben Kero, recommended BASE wireless. There was a spot in the airport, but they were out of SIMs. I borrowed some wifi, looked up another location on base.be, and went there.

They have a 1-GB plan for 15€, called "surf & mail 15". What you'll get is a 15€ SIM card with 15€ credit on it, and a brochure giving instructions to send a text message to activate the 15€ plan using that credit. When I did so, I immediately received a text indicating I did not have sufficient credit, but the data seems to work.

Which is good, because the wifi in the apartment we're staying at is pretty poor.

Saturday, May 19. 2012

Trapped in Google?

Whenever I search in Aurora on my phone, I'm taken to a stripped-down version of the page with the header "this page adapted for your browser".

How do I fix this? I'd rather fix it with a Google preference, but barring that, I expect Firefox has a way for me to regain control of my online experience?

Wednesday, May 9. 2012

TIL about SSL certificate chains

I'm laying some SSL groundwork for a project to allow puppet clients to move between puppet servers without requiring a central CA, and without requiring each client to be aware of all masters. More on that in a future post.

Based on "Multiple Certificate Authorities", I would like to have certificate chains that look like this:

      +-puppetmaster1 CA--+-puppetmaster1 server cert
      |                   |
      |                   +-client 1 server cert
root--+                   :
      |                   
      +-puppetmaster2 CA--+-puppetmaster2 server cert
                          |
                          +-client 10 server cert
                          :

Then all of the certificate validation would be done with the root CA certificate as the trusted certificate. A server certificate signed by puppetmaster2's CA cert should then validate on puppetmaster1.

Building the certificates wasn't all that difficult - see my comment on the bug for the script. However, while making sure the verification worked, I ran into some non-obvious limitations of OpenSSL that are worth writing down.

I began by running "openssl verify":

[root@relabs-puptest1 ~]# openssl verify -verbose -CAfile puptest-certs/root-ca.crt -purpose sslclient puptest-certs/relabs08.build.mtv1.mozilla.com.crt 
puptest-certs/relabs08.build.mtv1.mozilla.com.crt: CN = relabs08.build.mtv1.mozilla.com, emailAddress = release@mozilla.com, O = "Mozilla, Inc.", OU = Release Engineering
error 20 at 0 depth lookup:unable to get local issuer certificate

The problem here is that the intermediate certificate is not available to the verification tool. Sources suggest including it with the server cert, by concatenation, with the server cert last:

cat puptest-certs/relabs-puptest1.build.mtv1.mozilla.com-ca.crt puptest-certs/relabs08.build.mtv1.mozilla.com.crt > relabs08-with-intermed.crt

However, after some struggle I learned that "openssl verify" does not recognize this format -- it will only look at the first certificate in the file (the intermediate), and if you don't look carefully you'll find that it successfully verifies the intermediate, not the server certificate! Sadly, s_client and s_server don't support it either. Apache httpd does support intermediates, via SSLCACertificatePath. This will feed the certificate chain to the client, and also allow httpd to verify client certificates without requiring the clients to supply the intermediate.
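
As an aside, "openssl verify" can be handed the intermediate separately, via its -untrusted option, which sidesteps the concatenated-file problem entirely. A sketch using the same files as above:

# supply the intermediate out-of-band; only root-ca.crt is trusted
openssl verify -CAfile puptest-certs/root-ca.crt \
    -untrusted puptest-certs/relabs-puptest1.build.mtv1.mozilla.com-ca.crt \
    puptest-certs/relabs08.build.mtv1.mozilla.com.crt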

Back to httpd - the Apache config is:

Listen 1443

<VirtualHost *:1443>
        ServerName relabs-puptest1.build.mtv1.mozilla.com
        SSLEngine on
        SSLProtocol -ALL +SSLv3 +TLSv1
        SSLCipherSuite ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP

        SSLCertificateFile /etc/httpd/relabs-puptest1.build.mtv1.mozilla.com.crt
        SSLCertificateKeyFile /etc/httpd/relabs-puptest1.build.mtv1.mozilla.com.key
        SSLCACertificatePath /etc/httpd/ca-path

        # If Apache complains about invalid signatures on the CRL, you can try disabling
        # CRL checking by commenting the next line, but this is not recommended.
        #SSLCARevocationFile     /etc/puppet/ssl/ca/ca_crl.pem
        SSLVerifyClient require
        SSLVerifyDepth  2

</VirtualHost>

While you're getting that set up, you're probably wondering where to get this fancy "c_rehash" utility. Don't bother. It's about as simple as:

for i in *.crt; do
        h=$(openssl x509 -hash -noout -in "$i")
        rm -f "$h.0"
        ln -s "$i" "$h.0"
done

As a side note, the results of verification by s_client and s_server are not very obvious. Look for the overall error message near the bottom of the output. Here's the result of a client verification once I had everything put together, with some long uselessness elided:

[root@relabs-puptest1 ~]# openssl s_client -verify 2 -CAfile puptest-certs/root-ca.crt -cert puptest-certs/relabs08.build.mtv1.mozilla.com.crt -key puptest-certs/relabs08.build.mtv1.mozilla.com.key -pass pass:clientpass -connect localhost:1443
verify depth is 2
CONNECTED(00000003)
depth=2 CN = PuppetAgain Root CA, emailAddress = release@mozilla.com, OU = Release Engineering, O = "Mozilla, Inc."
verify return:1
depth=1 CN = CA on relabs-puptest1.build.mtv1.mozilla.com, emailAddress = release@mozilla.com, O = "Mozilla, Inc.", OU = Release Engineering
verify return:1
depth=0 CN = relabs-puptest1.build.mtv1.mozilla.com, emailAddress = release@mozilla.com, O = "Mozilla, Inc.", OU = Release Engineering
verify return:1
---
Certificate chain
 0 s:/CN=relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
   i:/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
 1 s:/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
   i:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
 2 s:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
   i:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEeTCCA2GgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBkTE1MDMGA1UEAxMsQ0Eg
...
H90rZMVxsVyPHjjfXkeeFcSWyUnV/z3G9osrI9I9SaQ1o9bDc7ZheyHbWbhn
-----END CERTIFICATE-----
subject=/CN=relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
issuer=/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
---
Acceptable client certificate CA names
/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
/CN=CA on relabs-puptest2.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
---
SSL handshake has read 5379 bytes and written 1716 bytes
---
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: zlib compression
Expansion: zlib compression
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: E30634D9CFCC2FA327282DA813BB550C24ACDF18194E5F13C4981AA55914B5F0
    Session-ID-ctx: 
    Master-Key: 013EB09B066418694D36D74B414BBA42E52DBF0066314B60FC7A74662A60934282B6C37C5C82026F70287E60F4FF9472
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    TLS session ticket:
    0000 - 82 5f 17 72 97 bd f3 1e-ec 24 de 69 ab 1e cd 1d   ._.r.....$.i....
    ....
    0520 - 40 05 b3 27 20 00 8d ce-93 a9 48 81 8f 0c 16 5b   @..' .....H....[

    Compression: 1 (zlib compression)
    Start Time: 1336582165
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

note the "Verify return code" at the bottom.

By way of demonstration that the server is actually checking those certs:

[root@relabs-puptest1 ~]# openssl s_client -verify 2 -CAfile puptest-certs/root-ca.crt -cert bogus.crt -key bogus.key -pass pass:boguspass -connect localhost:1443
verify depth is 2
CONNECTED(00000003)
depth=2 CN = PuppetAgain Root CA, emailAddress = release@mozilla.com, OU = Release Engineering, O = "Mozilla, Inc."
verify return:1
depth=1 CN = CA on relabs-puptest1.build.mtv1.mozilla.com, emailAddress = release@mozilla.com, O = "Mozilla, Inc.", OU = Release Engineering
verify return:1
depth=0 CN = relabs-puptest1.build.mtv1.mozilla.com, emailAddress = release@mozilla.com, O = "Mozilla, Inc.", OU = Release Engineering
verify return:1
140283463366472:error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca:s3_pkt.c:1193:SSL alert number 48
140283463366472:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:184:
---
Certificate chain
 0 s:/CN=relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
   i:/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
 1 s:/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
   i:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
 2 s:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
   i:/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEeTCCA2GgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBkTE1MDMGA1UEAxMsQ0Eg
...
H90rZMVxsVyPHjjfXkeeFcSWyUnV/z3G9osrI9I9SaQ1o9bDc7ZheyHbWbhn
-----END CERTIFICATE-----
subject=/CN=relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
issuer=/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
---
Acceptable client certificate CA names
/CN=PuppetAgain Root CA/emailAddress=release@mozilla.com/OU=Release Engineering/O=Mozilla, Inc.
/CN=CA on relabs-puptest1.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
/CN=CA on relabs-puptest2.build.mtv1.mozilla.com/emailAddress=release@mozilla.com/O=Mozilla, Inc./OU=Release Engineering
---
SSL handshake has read 3984 bytes and written 997 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: zlib compression
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1
    Cipher    : DHE-RSA-AES256-SHA
    Session-ID: 
    Session-ID-ctx: 
    Master-Key: 07E536F1C69A856857EA95DFD821BD6BBD499B5710642F9396D9525637EAD17C03064D5115B3D7F517EDE189E7AF40F8
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Compression: 1 (zlib compression)
    Start Time: 1336582289    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

Note the handshake failures near the top, where httpd closed the connection on the client.

The next step is to make CRLs work properly, since Puppet uses them extensively.

Saturday, March 17. 2012

Setting up a buildslave instance remotely on OS X Lion

Bryce Lelbach has generously offered access to an OS X system as a metabuildbot slave. As I went about setting it up today, the process was not obvious, so I thought I'd share. This was interesting mostly because I only have SSH access to the host, so I cannot download things from the Mac App Store or do any of the fancy point-and-click stuff that would make this easier.

First, I needed to get Xcode installed. Note that the (much quicker to download) Xcode command-line tools are not sufficient to build everything in MacPorts -- in particular, they do not support building zlib, which is required for git-core.

I got my hands on a copy of "Install Xcode.app", and:

host:Downloads buildbot$ cd Install\ Xcode.app/Contents/
host:Contents buildbot$ sudo installer -package Resources/Xcode.mpkg -target /
Password:
installer: Package name is Xcode
installer: Upgrading at base path /
installer: The upgrade was successful.

Once this was done, I installed MacPorts:

host:Downloads buildbot$ hdiutil mount MacPorts-2.0.4-10.7-Lion.dmg
Checksumming Driver Descriptor Map (DDM : 0)…
     Driver Descriptor Map (DDM : 0): verified   CRC32 $A913D2D8
Checksumming Apple (Apple_partition_map : 1)…
....
     Apple (Apple_partition_map : 1): verified   CRC32 $A1DF5DC1
Checksumming disk image (Apple_HFS : 2)…
...... (...) .....
          disk image (Apple_HFS : 2): verified   CRC32 $5A3E74A0
Checksumming  (Apple_Free : 3)…
                    (Apple_Free : 3): verified   CRC32 $00000000
verified   CRC32 $D9641854
/dev/disk2              Apple_partition_scheme
/dev/disk2s1            Apple_partition_map
/dev/disk2s2            Apple_HFS                       /Volumes/MacPorts-2.0.4
host:Downloads buildbot$ pushd /Volumes/MacPorts-2.0.4/
/Volumes/MacPorts-2.0.4 ~/Downloads
host:MacPorts-2.0.4 buildbot$ sudo installer -package MacPorts-2.0.4.pkg/ -target /
Password:
installer: Package name is MacPorts-2.0.4
installer: Installing at base path /
installer: The install was successful.
host:MacPorts-2.0.4 buildbot$ popd
host:Downloads buildbot$ hdiutil unmount /Volumes/MacPorts-2.0.4

and we're off to the races.

I added /opt/local/bin to my path as suggested, and then followed the normal MacPorts setup process.
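
For the record, that setup process amounts to a selfupdate, which syncs the ports tree and updates MacPorts itself:

host:Downloads buildbot$ sudo /opt/local/bin/port -v selfupdate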

Finishing up the buildslave install required installing Git (which manages to pull in unreasonable amounts of other stuff!)

host:Contents buildbot$ sudo  /opt/local/bin/port install git-core -credential_osxkeychain-doc-pcre-python27

which is required for the source steps, then creating a virtualenv to install buildbot-slave:

host:~ buildbot$ virtualenv sandbox
New python executable in sandbox/bin/python
Installing setuptools............done.
Installing pip...............done.
host:~ buildbot$ source sandbox/bin/activate
(sandbox)host:~ buildbot$ pip install buildbot-slave
...

and then create and start a slave:

(sandbox)host:~ buildbot$ buildslave create-slave buildslave buildbot.buildbot.net:9989 HOSTNAME PASS
...
(sandbox)host:~ buildbot$ buildslave start buildslave
...
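
To confirm the slave actually reached the master rather than sitting in a retry loop, tail its log (the exact wording of the connection message varies by buildbot version):

(sandbox)host:~ buildbot$ tail -f buildslave/twistd.log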

I then followed the helpful advice here to set up a plist that will start the daemon on boot.

Thursday, November 17. 2011

IT and Community

Mozilla's IT team is pivoting to a more community-focused approach. Our director of IT, mrz, has been writing extensively about it over the last few weeks.

As you can imagine, the difficult part of this is to balance security with accessibility. We'd like to be open, but we can't give the keys to the kingdom out to anyone who promises to help. The approach we're taking is to treat volunteers as we would part-time employees - post positions, interview, and then supervise to gain trust. This is a fairly common model, actually, for any organization with volunteers and a need for security. Youth programs, for example, generally do an interview and background check with new volunteers, and those volunteers will be paired with senior volunteers or staff for a while.

However, it's a bit cumbersome, both for Mozilla and for potential volunteers. We must design entire positions - ongoing tasks or roles that a volunteer can work on for an extended period of time - and then select a limited number of volunteers to fill those roles. For potential volunteers, an application and interview can mean a long time (weeks?) before they get to do anything hands-on. It also carries the risk that we'd have to turn a qualified volunteer away due to lack of suitable positions.

So what to do?

We need a more fluid way of interacting with potential contributors. Since our bug database is public, we can begin by simply tagging a few bugs that are appropriate for newcomers -- things that don't require sensitive access and are well-encapsulated so they can be completed without extensive knowledge of Mozilla's infrastructure.

Here's the list.

It's a bit short right now. There are a few things that may help:

  • We can get better about identifying appropriate tasks and projects and making bugs out of them.
  • We can identify a means of giving limited or sandboxed access to a new volunteer.
  • Consumers of Mozilla's IT resources can begin tagging bugs, where Mozilla can provide the resources and volunteers can do the heavy lifting - got any ideas?

Friday, September 2. 2011

Subscribe to a google group with a different address?

Google Groups is one place where, IMHO, Google pushes its hegemony too far, making it difficult to use. I wanted to subscribe to puppet-users with my Mozilla address, but since I have a Google account, Groups assumes I want to subscribe with that address. No!

I found the fix with a bit of Googling (some irony there). It involves editing a URL:

http://groups.google.com/group/puppet-users/boxsubscribe?email=email@domain.com

where you'd substitute the name of the group you want for puppet-users and add your email at the end.

Friday, May 20. 2011

Nagios NSCA from Python

I've been working on improving the monitoring of the build slaves at Mozilla. As part of this project, I needed to be able to submit passive check results to the Nagios servers via NSCA during system startup. I'm doing this from a Python script that needs to run on a wide array of systems using whatever random Python is available. We run some oddball stuff, so the common denominator is Python 2.4.

It turns out that there's no Python NSCA library, although there is Net::Nsca in Perl. So, I wrote one, and put it on github: https://github.com/djmitche/pynsca.

At the moment, this only knows XOR, and only does service checks. That's all I need, but hopefully it can be easily expanded to cover other purposes. The one thing I want to avoid is adding mandatory dependencies -- this should work, at least in plain-text and XOR modes, on a plain-vanilla Python installation.
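
For comparison, the same passive service check submitted with the stock send_nsca client looks roughly like this - pynsca speaks the same wire protocol, just without the external binary and its config file. The host names here are hypothetical, and the config path is the usual default:

# fields are host, service, return code (0=OK), and output, tab-separated
printf 'slave01\tbuildslave startup\t0\tbuildslave started OK\n' | \
    send_nsca -H nagios.example.com -c /etc/nagios/send_nsca.cfg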

By the way, the startup script I'm working on is runslave.py, which includes a modified copy of pynsca and does a number of other housekeeping jobs as well. More on that in a subsequent post.
