08 July 2008

Operations

Spent most of the morning looking at web and firewall configs. Nobody can see the stats hosted on the firewall. The reason is that the firewall rules restrict the IPs that are allowed to look at port 80. I could open it up to the world and require user auth, but that's more ongoing user admin.

Also took a look at why the firewall rules didn't run when Verizon fucked up their datacentre a couple of weeks ago... no clue. Everything looks right on the server and firewall. And we can't shut operations down while we investigate and experiment further, so I guess that's where it stops for now.

21 May 2008

Task switching overhead

Here's the problem with all the firefighting-mode stuff we've been forced into...

I have a task unfinished. Unfortunately, unbeknown to me, I also left it in a broken state for the live server due to a library compatibility issue. Now it's caught up with me -- a month later -- and it's taking me hours to pick up the threads again so that I can get the app into a running state so that at least the unbroken bits can still be used for some critical management stats...


Hours I would not have to waste if I had just been allowed to finish the fucking job in the first place!

06 May 2008

priorities

So yesterday I start my day (Oh Joy! Oh Joy!) with an email from R completely failing to detail some stuff wrong with the feeds I set up for him last Wednesday (looong weekend in between -- I studiously ignored all SMSs, having told everyone that I'm not available for the weekend.)

At the same time D wants me to help with the NCD config. My reaction: "Please consult with the business people and prioritise which one you want me to work on." Nett result: I'm working on the feed aggregator.

These people seem to have an active abhorrence of structure and well-defined lines of communication. I wonder how the hell they expect to do business if/when the company grows beyond a dozen people...

(Of course the answer is that the company is unlikely to survive that long.)

30 April 2008

more proof that time/effort estimation is futile

Helping out with the installation of NCD -- not-very-well-documented 3rd-party software. So trying to get at least the reference backend process running... after a day we discover -- buried in the depths of the Developer docs (not in the Installation docs where you would expect to find it) -- in the finest of fine print -- the app requires several Hibernate (ugh) libraries.

OK -- off to fetch them. The Hibernate downloads all come from Sourceforge. And there's an outage -- it looks like an international connectivity issue, since I can't reach my own offshore servers, either. So we're screwed until the connectivity comes back.

How do you predict that delay?

I'll say it again. "I'm not making estimates of how long development is going to take ever again."

making license-plates

Spent the morning making license plates. (Cryptonomicon for the reference.) Configured 140 different feeds in the feed-db because I haven't had the time to write admin interfaces for the aggregator.

I wish people would realise that, when we're forced to do "quick and dirty" systems and then "quickly onto the next crisis", we're hurting our productivity terribly. Not only do we have the task-switching overhead of jumping from one job to the next, but then we're faced with something like this morning's issue, and it takes a developer hours and hours to do something that can and should be done by a secretary in half the time.

At the moment we're sitting with probably 8 different systems all in a half-arsed state -- unfinished, partially broken, poorly configured, putting who-knows-what demands on the server because we're measuring nothing. Someday soon it's all going to unravel on us, and then the business will simply go under because we're facing weeks -- not hours -- of systems outage, meaning weeks of no revenue.

Were this my business, the stress of living on the edge like this would kill me.

24 April 2008

basic crud

basic crud is working for the CMS. Lots of boring to and fro over whether we're putting a userid or an actual user into the session. Most of my stuff wanted a userid, but the surrounding framework that I'm integrating into wants to put a full user there.

oh well... fixed now.


The look and feel sucks. My pages were designed as full-blown pages. Now they're inside an iframe with loads of other shit around. ecch. It's too hot. job for tomorrow.

of course it's a bug

when you ignore the contract YOU wrote -- all by yourself -- for a method, and then forget that null is a legitimate return value. "and then there's a NPE".
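A minimal sketch of that trap and one way to guard against it. Every name here (NullContractSketch, USERS, findUserName, nameLength) is hypothetical and stands in for the real method, which isn't shown in this entry; wrapping the result in Optional is just one possible guard, not what the actual fix was.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class NullContractSketch {
    // Hypothetical in-memory store standing in for the real lookup.
    static final Map<String, String> USERS = new HashMap<>();

    /** Contract: returns the user's name, or null when the id is unknown. */
    static String findUserName(String userId) {
        return USERS.get(userId);
    }

    /** Wraps the nullable result so the caller has to deal with "not found". */
    static Optional<String> findUserNameSafely(String userId) {
        return Optional.ofNullable(findUserName(userId));
    }

    /** A caller that honours the contract instead of dereferencing blindly. */
    static int nameLength(String userId) {
        // findUserName(userId).length() would throw an NPE for an unknown id.
        return findUserNameSafely(userId).map(String::length).orElse(0);
    }

    public static void main(String[] args) {
        USERS.put("u1", "mikro2nd");
        System.out.println(nameLength("u1"));
        System.out.println(nameLength("no-such-user"));
    }
}
```

The point is only that the "null is legitimate" clause ends up in the type the caller sees, so it is harder to forget a second time.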

unclean data in the live db

not so unusual, but... I'm trying to load users from the user table (via eodsql) and the "addUserID" and "modUserID" columns are currently filled in with zeroes. So, of course, UUID.fromString pukes.

What now?
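One possible answer, as a minimal sketch: treat the zero placeholder as "no user" before it ever reaches UUID.fromString. This assumes the column values arrive as strings and that the bad rows contain a literal "0"; the LenientUuid class and its parse method are hypothetical and not part of eodsql or the existing code.

```java
import java.util.Optional;
import java.util.UUID;

final class LenientUuid {
    /**
     * Parses a UUID column value. Null, blank and the "0" placeholder
     * (assumed to be what the unclean rows contain) all mean "no user".
     */
    static Optional<UUID> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty() || trimmed.equals("0")) {
            return Optional.empty();
        }
        try {
            return Optional.of(UUID.fromString(trimmed));
        } catch (IllegalArgumentException e) {
            // Any other malformed value is also treated as absent here;
            // logging or rethrowing may be the better choice in practice.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("0").isPresent());
        System.out.println(parse(UUID.randomUUID().toString()).isPresent());
    }
}
```

The alternative is to clean the live data, which removes the special case but means touching the live db.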

creating test data

handraulically creating test db data for the cms... ugh

postgres server not starting on test

[11:30:33] <Di> please can you add starting the db to the startup script on test
[11:30:43] <mikro2nd> it is already
[11:30:46] <Di> after every power failure I have to manually start the db
[11:30:48] <mikro2nd> ??
[11:30:52] <mikro2nd> i'll check
[11:30:55] <Di> thx
[11:32:52] <mikro2nd> chkconfig says
postgresql 0:off 1:off 2:off 3:on 4:on 5:on 6:off
[11:33:11] <mikro2nd> and since our default runlevel is 3 it should be getting started unless there's some other problem
[11:33:12] <Di> what does that mean?
[11:33:41] <mikro2nd> says: postgres is OFF at runlevels 0126 and ON at runlevels 345
[11:33:42] <Di> well I have just started it this morning and start it every time there is a power out
[11:33:51] <Di> ok
[11:34:07] <mikro2nd> will investigate further
[11:34:19] <Di> something is odd - not serious, we can live with it, just know that it is the first thing to check if something does not work
[11:35:14] <mikro2nd> when was the last power outage/reboot? (roughly)
[11:35:36] <Di> came on this morning at 10 past 10
[11:35:43] <mikro2nd> ok

(checkity checkity check...)

[11:43:57] <mikro2nd> weird... i can't see anything wrong -- rc3.d links are correct, ownerships and permissions on /etc/init.d/postgres and /var/lib/postgres, but I can see that the startup log is timestamped 11:43 rather than the 10:08 that the system came up... very weird!
[11:44:24] <Di> dont waste any more time on it - not worth it, just as long as we know
[11:44:29] <Di> but it is odd
[11:44:34] <mikro2nd> ok

clientadmin finally into test

Finally clientadmin is deployed to test. Of course there's no data in the db, yet, so -- as expected -- the test page fails. That may be the best thing about the unit-testing philosophy -- it's a good feeling when a test fails as expected!