I just had the weirdest experience. I'm sitting in the speaker's lounge at the conference. I happened to glance at the iBook next to me and saw the blog entry that I just wrote up on the screen.
That doesn't happen every day.
(Let's see if it happens again when I post this entry.)
I've been deleting more and more spam comments from my blog recently. I'm this close to hacking MT to call SpamAssassin before allowing a comment to actually post.
Am I the first? Has someone else already done the work? Google hasn't located anything relevant for me. And I figure it'll be a 5-10 minute hack once I get back into the MT codebase again.
Phil is talking about Ruby. Again, semi-realtime notes on Ruby.
Ruby is roughly 10 years old now. Matz liked Perl's text processing but didn't think that Python was OO enough. It's more of a Perl/Smalltalk blend. Classes, methods, objects, exceptions, message passing, iterators, closures, garbage collection, etc. And it's multi-platform, of course.
Back in 2000, Phil used a lot of Perl but found OO Perl tedious.
Why learn Ruby? It has a similar syntax but is different enough in some places to make you think differently. Strings, hashes, arrays, etc. Ruby can use any object as a key to a hash. Regexes, here-docs, etc.
@ means instance variable inside a class, not an array. The $ denotes global scope variable. @@ denotes a class variable. Semi-colons are optional at the end of line. Parens are optional in method calls.
False and nil are false. But 0 and '0' are true. Everything is an object.
Smaller community for Ruby, but that's okay.
Lots of interesting on-screen examples that I can't reproduce easily, so I'll just watch.
Bruce is discussing various replication schemes for PostgreSQL database servers. These are nearly real-time notes, so they're a bit sketchy.
Each use has slightly different requirements. Fast or slow line, on-line 24/7 or only part time, etc.
Given a particular usage, which method do you use?
Talk is half done, but I'm off to a Ruby talk now...
I'm attending the session about MySQL and Pogo Linux. They've partnered to produce a turnkey rack-mounted MySQL server that comes with a MySQL support agreement built into the price. (press release)
Erik Logan is speaking with David Axmark.
Pogo specializes in hardware. They add a tuned RedHat and MySQL to create the product. They're centralizing support for the hardware, OS, and MySQL. If it's a MySQL issue, the MySQL folks will handle it.
Their product is called the DataWare Database Appliance. Dual Intel Xeons w/Hyper Threading, 15K RPM SCSI drives, battery-backed RAID-10 with write cache. Total RAM capacity is 16GB.
Not sure about the default filesystem. I'm guessing ext3 or ReiserFS.
There's a liberal warranty that allows for vendor-approved upgrades and enhancements. They're talking about integrated backup options as well.
The specs are still in flux too. Not sure how much storage will be on by default. 36GB or 72GB drives? Probably 6x36GB to start.
I slept in today. But others did not. Phil Windley and Michael Radwin both blogged about Tim's keynote. Sounds like a lot of what Tim's been saying in recent years.
Ray and I met up with some of the MySQL folks for dinner and headed to a local Spanish restaurant. Good stuff. I have a picture of Monty and David that I need to extract from my camera.
After that, we came back to watch the State of the Union presentations: Larry, Guido, Shane, and David/Monty. Interesting tidbits: Guido is leaving Zope. Larry will have to get a "real job" again someday.
Finally, we spent some time at the always excellent ActiveState party. I have a pic or two from that too.
Now (Wednesday morning), I'm catching up on e-mail and working on my Friday presentations. Lunch just started. It's sponsored by Microsoft. How odd is that?
I don't get this one at all, but it sure is funny.
From: "" <request_81@chiche.com>
To: <jeremy@zawodny.com>
Subject: Dimensional Warp Generator Needed ojljhbjgorq
Date: Wed, 09 Jul 03 11:01:21 GMT

Greetings,
We need a vendor who can offer immediate supply.
I'm offering $5,000 US dollars just for referring a vender which is (Actually RELIABLE in providing the below equipment) Contact details of vendor required, including name and phone #. If they turn out to be reliable in supplying the below equipment I'll immediately pay you $5,000. We prefer to work with vendor in the Boston/New York area.
1. The mind warper generation 4 Dimensional Warp Generator # 52 4350a series wrist watch with z80 or better memory adapter. If in stock the AMD Dimensional Warp Generator module containing the GRC79 induction motor, two I80200 warp stabilizers, 256GB of SRAM, and two Analog Devices isolinear modules, This unit also has a menu driven GUI accessible on the front panel XID display. All in 1 units would be great if reliable models are available
2. The special 23200 or Acme 5X24 series time transducing capacitor with built in temporal displacement. Needed with complete jumper/auxiliary system
3. A reliable crystal Ionizor with unlimited memory backup.
4. I will also pay for Schematics, layouts, and designs directly from the manufature which can be used to build this equipment from readily available parts.
If your vendor turns out to be reliable, I owe you $5,000.
Email his details to me at: info@federalfundingprogram.com
Please do not reply directly back to this email as it will only be bounced back to you.
Too bad I can't reply. I have a great lead for them.
David is presenting MySQL's new features, covering 4.0, 4.1, and a bit of 5.0. I'm at this presentation mostly because I've been using the new features for so long that I no longer consider them new. That means I have trouble enumerating the new features when people ask. So I'm hoping this talk provides a nice summary that I can reuse.
SAP will use MySQL as their default database in a few years. MySQL is providing ideas for implementation, advice, access to developers, and so on.
Lots of talk about crash-me and benchmarking various databases.
The MySQL folks remind us that software patents are evil.
I'm in John Ashenfelter's Data Warehouses talk this morning. He's an excellent presenter who really knows his material and is completely sold on MySQL. At least one person in the audience has already extolled the use of MySQL in data warehousing applications.
I'm taking notes real-time, so this will be a bit disjointed but that's life.
Idea for O'Reilly book: Data Warehousing in a Nutshell.
The talk starts with a story about why MySQL is the most cost-effective data warehousing solution available. Compared to Microsoft SQL Server (the cheapest closed-source solution), MySQL is a big savings.
Data Warehouse vs. Data Mart. Plan for a warehouse, but build a mart. There are a lot of things you might need, but there's no need to build all of it until you need it. Data marts lock together to become a warehouse.
Book recommendation: The Data Warehouse Toolkit.
Data warehouse: focused on business processes, using standardized granular facts. It's a collection of standardized marts.
Data mart: focused on one narrow business, includes lightly summarized data. It's a component of a data warehouse.
Metadata capture. Need to get terms and definitions correct and agreed upon in advance. Policies and company practices factor into the decision making. How does your business really quantify things? You need to ask users what their business needs are. Sometimes this involves going quite high in the organization.
Every time you hear "by", think about dimensions. They're wide and flat tables (compared to fact tables). Lots of redundant data. Make sure the measurement units are the same. This is hard in multi-national companies. Which day (time zones)? Sizes and volumes of items, etc. How will they be formatted? What enumerations (possible values) will exist? M/F, 0/1, Y/N, Small/Medium/Large/X-Large.
Calendar dimension example: date_id, date_value, description, month, day, year, quarter, is_weekday, day_of_week, fiscal_month, fiscal_quarter, astrological_sign, etc. Lots of duplication (imagine one record for each day of the year). You could add weather info, abnormal business closes, etc.
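A rough sketch of how such a calendar dimension could be generated, not something shown in the talk. The function name `build_calendar_dimension` and the exact column set are my own assumptions; the fiscal and astrological columns are left out because they depend on company-specific rules.

```python
from datetime import date, timedelta

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_calendar_dimension(start, end):
    """Return one dict per day from start to end, inclusive."""
    rows = []
    current = start
    date_id = 1  # surrogate key, meaningful only to the database
    while current <= end:
        rows.append({
            "date_id": date_id,
            "date_value": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "quarter": (current.month - 1) // 3 + 1,
            "day_of_week": DAY_NAMES[current.weekday()],
            "is_weekday": current.weekday() < 5,
        })
        current += timedelta(days=1)
        date_id += 1
    return rows


calendar_rows = build_calendar_dimension(date(2003, 1, 1), date(2003, 12, 31))
print(len(calendar_rows), calendar_rows[0])
```

One row per day is tiny by warehouse standards, so the duplication costs next to nothing.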
All about keys. Many keys, few facts. Very deep (tall) and narrow. It's best not to store calculated values because you may need to recalculate someday (margin, for example). You can calculate on the fly (in the query) or in code that's pulling the data. Use facts that the business users understand.
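To make the "calculate on the fly" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table and column names (`order_item_fact`, `unit_price_cents`, `unit_cost_cents`) are invented for illustration. The fact table stores only the raw inputs; margin is computed in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_item_fact ("
    " product_id INTEGER, quantity INTEGER,"
    " unit_price_cents INTEGER, unit_cost_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO order_item_fact VALUES (?, ?, ?, ?)",
    [(1, 10, 800, 500), (1, 5, 800, 600), (2, 2, 2000, 1200)],
)

# Margin is derived at query time, so a change in how margin is
# defined only means changing this query, not reloading the facts.
margin_by_product = conn.execute(
    "SELECT product_id,"
    "       SUM(quantity * (unit_price_cents - unit_cost_cents)) AS margin_cents"
    " FROM order_item_fact"
    " GROUP BY product_id"
    " ORDER BY product_id"
).fetchall()
print(margin_by_product)
```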
Don't use anything meaningful for keys. Never. Ever. Meaningful things change when companies merge, change, etc. Just invent numbers that are meaningful to the database only.
There's a central fact, many dimensions (the arms), and no other tables. Don't "snowflake" or over-normalize by hanging new tables off of the dimension tables.
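A minimal sketch of a star schema, again using sqlite3 as a stand-in for MySQL. The tables, columns, and sample rows are my own invention, not the schema from the talk. There is one fact table, two dimensions joined on meaningless integer keys, and a query that answers a "sales by month by category" question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_dim (
    date_id INTEGER PRIMARY KEY, date_value TEXT, month INTEGER);
CREATE TABLE product_dim (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table is narrow: keys into each dimension plus the facts.
CREATE TABLE sales_fact (
    date_id INTEGER REFERENCES calendar_dim(date_id),
    product_id INTEGER REFERENCES product_dim(product_id),
    quantity INTEGER,
    amount_cents INTEGER);

INSERT INTO calendar_dim VALUES
    (1, '2003-07-08', 7), (2, '2003-07-09', 7), (3, '2003-08-01', 8);
INSERT INTO product_dim VALUES
    (1, 'Turkey sandwich', 'Lunch'), (2, 'Fruit tray', 'Snack');
INSERT INTO sales_fact VALUES
    (1, 1, 3, 1500), (2, 1, 2, 1000), (2, 2, 1, 2500), (3, 2, 4, 10000);
""")

# Each "by" in the question becomes a GROUP BY on a dimension column.
sales_by_month_and_category = conn.execute("""
    SELECT c.month, p.category, SUM(f.amount_cents)
    FROM sales_fact f
    JOIN calendar_dim c ON c.date_id = f.date_id
    JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY c.month, p.category
    ORDER BY c.month, p.category
""").fetchall()
print(sales_by_month_and_category)
```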
When building the schema, decide on the grain. What's the smallest bit of data anyone will ask for? One day? One hour? One week?
Sales from a web-based meal order system (Vmeals.com). Many clients/customers, caterer/restaurants, delivery locations, etc. The database is relatively small now (300MB or so).
Lots of data to track (on the white board). The most important bit will likely be order items. Starting simple, our facts are orders and customer service metrics. Dimensions are calendar, customers, products, promotions, and so on. Create a bus design.
We'll use sales for this example.
Pick the lowest possible grain that makes sense. For this example, it's orders or, more specifically, ordered items. Order will be the second fact table. Dimensions: calendar (order date, delivery date), product (menu items), customer, delivery location, provider (vendor), licensee (market).
Need to pull data from the on-line system to populate the warehouse. Some may come from other places too: market information system (MS Access), promotion engine, etc. Larger companies will have many more.
The order fact table will be primarily built from line items from orders. Think about additive, non-additive, and semi-additive values. You generally want additive data. Store the data needed to compute things, not the resultant values. Snapshot values (daily bank balance) are not additive.
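A small illustration of the additive vs. snapshot distinction, with made-up numbers. Deposits can be summed across any dimension, including time. Daily balances can be summed across accounts for a single day, but summing them across days produces a meaningless figure; a closing or average balance is what you actually want.

```python
# Additive fact: deposits can be summed across days.
deposits = [("2003-07-07", 100), ("2003-07-08", 50), ("2003-07-09", 25)]
total_deposits = sum(amount for _, amount in deposits)

# Snapshot fact: one balance per account per day.
balances = {
    "2003-07-07": {"A": 100, "B": 200},
    "2003-07-08": {"A": 150, "B": 200},
    "2003-07-09": {"A": 120, "B": 250},
}

# Summing across accounts within one day is fine.
daily_totals = {day: sum(accounts.values()) for day, accounts in balances.items()}

# Summing across days is not: nobody ever held this much money.
misleading_sum = sum(daily_totals.values())

# Across time, use the latest snapshot or an average instead.
closing_balance = daily_totals[max(daily_totals)]
average_balance = sum(daily_totals.values()) / len(daily_totals)

print(total_deposits, misleading_sum, closing_balance, average_balance)
```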
Degenerate dimensions have no corresponding dimension table. Invoice or order numbers are common examples. They're only used in groups or rollups, typically.
Role-playing dimensions are used over and over. Dates are a good example: payment date, order date, delivery date. In some systems you'd use views for this. They'd all be views over the underlying calendar table. You can use MySQL Merge tables to work around the lack of views. Or you can create several one-table merge tables.
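Here is roughly what the view-based approach looks like. This sketch uses sqlite3, which does support views, as a stand-in; it does not show the MySQL Merge table workaround. The view and column names (`order_date_dim`, `delivery_date_dim`) are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_dim (
    date_id INTEGER PRIMARY KEY, date_value TEXT, day_of_week TEXT);

-- One physical calendar table, one view per role it plays.
CREATE VIEW order_date_dim AS
    SELECT date_id AS order_date_id, date_value AS order_date,
           day_of_week AS order_day
    FROM calendar_dim;
CREATE VIEW delivery_date_dim AS
    SELECT date_id AS delivery_date_id, date_value AS delivery_date,
           day_of_week AS delivery_day
    FROM calendar_dim;

CREATE TABLE order_fact (
    order_date_id INTEGER, delivery_date_id INTEGER, amount_cents INTEGER);

INSERT INTO calendar_dim VALUES
    (1, '2003-07-07', 'Monday'), (2, '2003-07-09', 'Wednesday');
INSERT INTO order_fact VALUES (1, 2, 4500);
""")

# The same calendar rows are joined twice, once per role.
order_roles = conn.execute("""
    SELECT o.order_day, d.delivery_day, f.amount_cents
    FROM order_fact f
    JOIN order_date_dim o ON o.order_date_id = f.order_date_id
    JOIN delivery_date_dim d ON d.delivery_date_id = f.delivery_date_id
""").fetchall()
print(order_roles)
```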
Slowly changing dimensions. Fixed data: just update. Changed data: add a new row (like a new address). Fundamental schema change: add a new column, keep old column data around.
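A sketch of the first two cases, using plain Python dicts as dimension rows. The helper names and the `is_current` flag are my own assumptions (the talk only said "add a new row"), and I'm reading "fixed data" as a correction such as a typo.

```python
def update_in_place(rows, customer_number, **fixes):
    """Fixed data (a correction): overwrite the current row."""
    for row in rows:
        if row["customer_number"] == customer_number and row["is_current"]:
            row.update(fixes)


def add_new_version(rows, customer_number, **changes):
    """Changed data (a new address): retire the old row, append a new one."""
    current = next(
        r for r in rows
        if r["customer_number"] == customer_number and r["is_current"]
    )
    current["is_current"] = False
    new_row = dict(current, **changes)
    # New surrogate key; old facts keep pointing at the old row.
    new_row["customer_key"] = max(r["customer_key"] for r in rows) + 1
    new_row["is_current"] = True
    rows.append(new_row)
    return new_row


customer_dim = [{
    "customer_key": 1, "customer_number": "C-100",
    "name": "Acme Corp", "city": "Boston", "is_current": True,
}]
update_in_place(customer_dim, "C-100", name="Acme Corp.")
add_new_version(customer_dim, "C-100", city="New York")
print(customer_dim)
```

Because facts reference the surrogate `customer_key`, orders placed before the move still roll up under Boston.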
ETL process: extract, transform, load. Go to the source of the data. Extraction can be tricky if you have lots of data and little time (24x7 system). If the source data has date stamps, you can perform incremental dumps. With some systems you can use a row checksum and a computed index. Transformation is about making the data match the warehouse's metadata standards. Perl to the rescue! (Or maybe using an intermediate database.)
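A rough sketch of the two extraction tricks, in Python rather than Perl and with sqlite3 standing in for the source system. The `customers` table, the `updated_at` column, and both function names are hypothetical. The first function pulls only rows stamped after the previous run; the second computes a per-row checksum that could be compared against a stored value when the source has no date stamps.

```python
import hashlib
import sqlite3

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "Alice", "2003-07-01 10:00:00"),
        (2, "Bob", "2003-07-08 09:30:00"),
        (3, "Carol", "2003-07-09 11:01:21"),
    ],
)


def extract_since(conn, last_extract):
    """Incremental dump: only rows stamped after the previous run."""
    # ISO-style timestamps compare correctly as plain strings.
    return conn.execute(
        "SELECT id, name, updated_at FROM customers"
        " WHERE updated_at > ? ORDER BY id",
        (last_extract,),
    ).fetchall()


def row_checksum(row):
    """Checksum of a row's values, for change detection without date stamps."""
    joined = "|".join(str(value) for value in row)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


changed_rows = extract_since(source, "2003-07-08 00:00:00")
print(changed_rows)
print(row_checksum((1, "Alice")) != row_checksum((1, "Alicia")))
```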
Microsoft DTS is a good option in the Windows world. It comes with SQL Server and is scripted via a scripting host language. Lots of expensive commercial tools to do this too.
When dumping from other systems, watch out for blobs. They probably don't belong anyway. Make sure that stuff comes out as quoted ASCII rather than some internal representation that MySQL won't grok.
To load, MySQL's LOAD DATA INFILE works quite well for this. It's fast and flexible.
Demo: Dump via MSSQL BCP and load into MySQL using LOAD DATA.
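The demo itself isn't reproduced here, but this is a sketch of preparing rows for LOAD DATA INFILE's default text format as I understand it: tab-separated fields, newline-terminated lines, backslash as the escape character, and `\N` for NULL. The helper name `to_load_data_line`, the file path, and the table name in the statement are hypothetical, and the statement is only built as a string, not run against a server.

```python
def to_load_data_line(row):
    """Format one row for MySQL's default LOAD DATA INFILE text format."""
    fields = []
    for value in row:
        if value is None:
            fields.append("\\N")  # NULL marker
        else:
            text = str(value)
            # Escape the escape character first, then the delimiters.
            text = text.replace("\\", "\\\\")
            text = text.replace("\t", "\\t").replace("\n", "\\n")
            fields.append(text)
    return "\t".join(fields) + "\n"


rows = [(1, "Turkey sandwich", 500), (2, "Fruit\ttray", None)]
dump_text = "".join(to_load_data_line(row) for row in rows)

# Hypothetical path and table name; run this from a MySQL client.
load_statement = (
    "LOAD DATA INFILE '/tmp/product_dim.txt' INTO TABLE product_dim "
    "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
)
print(dump_text)
print(load_statement)
```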
Having a staging environment is important. Rather than loading all the data into the warehouse, you can do a lot of intermediate work on a different server before loading into the "real" warehouse. You can use this staging area to run validation checks, manage any changes needed (SQL, Perl, custom apps, DTS or other ETL tools), and perform multi-step extractions.
A freshness date helps users understand when the latest data isn't as new as they might think.
All Java, open source: Jasper Reports, jFreeReport/Chart, DataViz.
Open Source: Mondrian (Java), JPivot (Java/JSP), BEE (perl).
More demos at the end.
Well, this sucks. For the first half of this morning session (Building Data Warehouses with MySQL), I couldn't even get on the wireless network. After the break, I was able to at least get an IP address from the DHCP server. But it seems that there's no connectivity to the outside. I can ping the gateway just fine.
Grr.
O'Reilly's wireless network is really starting to bug me. And from the folks I've talked with, I'm in the majority here.
More delayed posting today...
I'm sitting in the Jabber Bootcamp session now with Derek and Ray. Not a lot new here if you've seen a good "How Jabber works" presentation of some kind. Given that it's a half-day tutorial, the presenters are able to take their time and go into a good level of detail. It seems to be a very good Jabber overview.
Jabber is quite cool. It's really a shame that it's not more widely used.
Reminder to OSCON attendees: You can ping a TrackBack URL for each session. The ping URLs are available from the Conference Grid.
Update: Derek found this URL: http://alpha.oreillynet.com/cgi-bin/tb/tb.cgi/oscon2003 for general conference pings.
Warning: I'll be posting a bit out of order...
I attended Damian Conway's ~damian/bin tutorial this morning (Monday). It was entertaining (as always). The focus of his talk was convincing us to customize our environments for personal productivity. I felt like much of what he covered was already standard practice for experienced (2+ years) Unix folks, but maybe not.
He covered shell aliases, vi customization, filename completion, small utilities (some in Perl). He demonstrated many of his personal tools and customizations and even makes them available under the Artistic License. Get them here: http://www.yetanother.org/damian/bintools.tar.gz
I didn't really learn much new information, but I did get a new appreciation for how much repetitive work can be streamlined with a bit of effort. It's been a while since I added much to my shell aliases and tools.
Well, I'm sitting downstairs (in a conference room) during a break at OSCON and can't get much of a signal from the O'Reilly network. The morning newsletter is advertising such a network, but it's more of a notwork right now.
kismet finds the network, but there appears to be no traffic on it. Grr.
I should have headed up to the speaker's lounge during the break. They have hubs and actual ethernet cables there.
Ah, they've fixed it. Now I can complain about a problem that no longer exists! :-)
After a 2 day road trip, I'm in Portland for OSCON. The window is open and there's a Blues Fest across the street. Good stuff.
I have lots to catch up on, including a ton of pictures I took at Crater Lake that I need to post. More later.