Yup, another job opening. If you're interested or know someone who'd kick ass in this job, let me know.
The job is on-site in Sunnyvale, California.
<job_posting>
Enjoy solving hard problems creatively? Know all the GOF patterns? Can you make database schemas into 3rd Normal form? Do you know the difference between REST, SOAP, and MOM?
We are looking for an engineer to architect new services, build shared libraries, and refactor existing systems. You will work with Yahoo! News, Sports, Weather, Finance, Health and other groups to build exciting systems. You will deliver complex projects in demanding deadlines while helping other engineers design and implement their systems.
If you've had experience building high-throughput systems, can design a class hierarchy in your sleep, and know all about web services, then we're looking for you.
Qualifications
</job_posting>
I know some (many?) of the folks you'd be working with in this job. They're smart folks who love building great technology.
Oh, and don't ask me what "exciting systems" are. We all know that job listings are partly sales pitches, so that's what you get I guess.
Beware of men preaching of false hope.
Take, for example, the way some folks feel like they need a database abstraction layer in their applications. Rasmus has long argued against them, and I've agreed with his reasoning and conclusion. (Because it's correct!)
I was reminded of this when I recently read "rant, by request...", in which the author argues against the Smarty PHP template system. Why? Because PHP is itself a templating system. Adding another layer increases complexity, degrades performance, and generally doesn't really improve things.
So why do folks do it? Because PHP is also a programming language and they feel the need to "dumb it down" or insulate themselves (or others) from the "complexity" of PHP.
In that same article, I find myself strongly disagreeing with something else the author says:
Pick any book on PHP from a shelf in your local bookstore, and look how result rows from a MySQL database are printed. (MySQL is of course the DBMS used in those books, which should already give you a clue about how bad the book is.) The mysql_-functions are used all over the place in the presentation layer.
So, only bad books discuss MySQL in their examples? Let's look past that obvious bashing, and continue...
Here, in these forums, we have learned people to not use those mysql_-functions directly, but use a database abstraction layer instead. This makes coding simpler (no need to know all those functions for the various DBMS's) and when they decide to use another DBMS instead of MySQL (and they undoubtedly will at some point), the conversion will be painless.
The fact that he generally pisses on MySQL isn't what bugs me, though it doesn't help. What bothers me is the double-standard. He's advocating "raw" PHP instead of more "abstract" templating languages because they're bigger, slower, more complicated. But when it comes to the database side of things, he's suddenly arguing for the bigger, fatter, slower abstractions again?
This makes no sense for several reasons. Let's look at them.
The author uses an argument I hear all the time: If you use a good abstraction layer, it'll be easy to move from $this_database to $other_database down the road.
That's bullshit. It's never easy.
In any non-trivial database backed application, nobody thinks of switching databases as an easy matter. Thinking that "the conversion will be painless" is a fantasy.
Good engineers try to select the best tools for the job and then do everything they can to take advantage of their tool's unique and most powerful features. In the database world, that means specific hints, indexing, data types, and even table structure decisions. If you truly limit yourself to the subset of features that is common across all major RDBMSes, you're doing yourself and your clients a huge disservice.
That's no different from saying "I'm going to limit myself to the subset of PHP that's the same in Perl and C, because I might want to switch languages one day and 'painlessly' port my code."
That just doesn't happen.
The cost of switching databases after an application is developed and deployed is quite high. You have possible schema and index changes, syntax changes, optimization and tuning work to re-do, hints to adjust or remove, and so on. Changing mysql_foo() to oracle_foo() is really the least of your problems. You're gonna touch most, if not all, of your SQL--or you'll at least need to verify it.
That doesn't sound "painless" to me.
The author is also clearly unhappy with the alternative, having mysql_foo() and mysql_bar() functions all over the application. Well, I may be nuts here, but I never have that problem. I use a revolutionary new programming technique. Instead of littering my code with those calls, I put my core data access layer into a library--a separate piece of reusable code that I can include in various parts of my application and... reuse!
That means if I ever decide to make major changes to the way my application interacts with the database (persistent connections, replication awareness, load balancing, different error handling), I'm able to do so without searching every damned file in my code base for mysql_* functions that I need to tweak.
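A minimal sketch of that idea, written in Python with the standard sqlite3 module as a stand-in for PHP and MySQL. The class name DataAccess, its methods, and the users table are all hypothetical, invented here for illustration. The point is only that the driver gets called in exactly one place.

```python
import sqlite3


class DataAccess:
    """Hypothetical data access layer: the only code that touches the driver."""

    def __init__(self, dsn=":memory:"):
        # Connection policy (persistent connections, replica selection,
        # error handling) would change here and nowhere else.
        self._conn = sqlite3.connect(dsn)
        self._conn.row_factory = sqlite3.Row

    def execute(self, sql, params=()):
        """Run a statement that modifies data and commit it."""
        with self._conn:
            self._conn.execute(sql, params)

    def fetch_all(self, sql, params=()):
        """Run a query and return the rows as plain dicts."""
        cursor = self._conn.execute(sql, params)
        return [dict(row) for row in cursor.fetchall()]


# Application code talks to the layer, never to the driver directly.
db = DataAccess()
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES (?)", ("example",))
rows = db.fetch_all("SELECT id, name FROM users WHERE name = ?", ("example",))
print(rows)
```

Note that this is not a portability layer. The SQL passed in is still free to use whatever database-specific features it likes.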
I never thought this was rocket science, but apparently it has eluded him. Somehow he manages to see the benefit of separating presentation from logic, but never considered separating the data access layer from the data processing layer.
Some things never cease to amaze me--and make me very sad at the same time.
Congrats to Joyce and the front-end team at Friendster for pulling off the JSP to PHP migration. Having the front end less tightly coupled with everything else ought to make life a lot easier for some folks there.
Now, any bets as to when the "beta" label will come off their logo?
Note to anyone else who might do the dumb thing I did. When you're handling GET/POST input in PHP and want to know if a value is numeric, don't use is_int() because it'll lie to you. What you want is is_numeric(). Of course, the is_int() documentation specifically warns you about this, but it doesn't do much good if you read right past it.
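The root of the problem is that request parameters arrive as strings, so a check on the variable's type fails even when the content is a perfectly good number. Here is a rough Python analogue of the two questions. The helper looks_numeric is hypothetical and is not an exact match for PHP's is_numeric() rules (for instance, it also accepts "inf" and "nan").

```python
def looks_numeric(value):
    """Return True if a form value (always a string) parses as a number."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


form_value = "42"  # request parameters arrive as text, never as integers

type_check = isinstance(form_value, int)   # the is_int()-style question
content_check = looks_numeric(form_value)  # the is_numeric()-style question
print(type_check, content_check)
```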
I was *this* close to walking over to Rasmus or Andrei and asking for a sanity check, but I decided to re-read the docs one more time. Heh. There's an hour or so I'll never get back.
Of course, this *is* the first PHP work I've done in quite a while. You don't want to know how many times I tried to declare a variable like: my $foo, thinking it was Perl. I think that's finally out of my system.
Note: Comments may be off-line for a bit. I'm being DoS'd by some comment script kiddies right now. Retards.
A couple weeks ago, I posted asking for web discussion board recommendations. In the comments Rasmus Lerdorf pointed out that most systems suck because their security is a joke.
Of course, he's right. I've been on Bugtraq long enough to realize that the popular PHP-based boards and community systems seem to get compromised in some way or another (SQL injection, cross-site scripting, etc.) on a very regular basis. That's part of the reason I asked in the first place. I was hoping someone who knows more about the scene would enlighten me. And, despite the fact that I omitted security from my original list of requirements, it worked nicely.
Then, yesterday, I was looking at the MythTV project, which is an impressive Linux PVR solution (think "Open Tivo"). Literally as I was browsing the site someone compromised it. See the screenshot at the right? I took that just in case it was fixed before I had a chance to write this. Indeed, a couple hours later the site was back to normal.
Witnessing this real-time "hacking" is a sobering example of how far things have to come. If you've been brainwashed by Eric Raymond's "all bugs are shallow" logic, ask yourself why we keep seeing this sort of thing happen with popular Open Source Software such as PHP-Nuke.
Come to think of it, I think I've written about this before. Looking back over it, I still agree with myself.
The slides from my "MySQL, PHP, Stuff" talk are on-line now in HTML and PDF format.
Enjoy. (Well, as much as you can without the audio of me filling in all the detail.)
Update: I'll fix the spelling bugs and re-post later today, I hope. Luckily my loyal readers are noting them in the comments.
I finally made my hotel reservation for PHPCon East in New York. Amusingly, I called their 1-800 number a few hours ago to find that their reservations desk was closed and that I'd have to call back on Monday. WTF?! Worse yet, their automated message never suggested their sucky web site.
So I went there anyway and made a reservation. And guess what... I got a better price. They're offering $124/night on-line. The conference discount is supposed to be $169/night.
Now I know why their phone message doesn't drive you to their web site. They can make more money off you on the phone. I'm sure this isn't news to Rudy Maxa but it's the first time I've run into this.
I just made my flight reservations for PHPCon East 2003 on Y! Travel. A round-trip, non-stop flight from San Jose to New York was less than $300. And I even got times that were quite reasonable.
Excellent.
Now to arrange who I'm going to hang out with while visiting NYC. Anyone live nearby and want to get together for a bit?
And, yes, it is my first time to NYC.
I finally got off my lazy ass and wrote up abstracts for some of my upcoming talks. Here's the scoop on the MySQL Conference and the PHP Conference (both in April). I'll hit the OSCON ones in a few days (or weeks?). So far they haven't nagged me yet, but I'm sure they will. :-)
At the 2003 MySQL Conference, I'll be giving a 2-hour talk on Thursday (April 10th) at 10am that's listed as "Using MySQL Replication in Large Scales." Here's the abstract I've submitted:
Replication provides a great mechanism for scaling MySQL beyond a single machine and even across vast distances. It can also be used to provide a "hot spare" server which can be used in the event that the primary server fails.
In this session, we'll look at how MySQL replication works and how to configure it. How is replication in MySQL 4.0 different than in the previous releases?
We'll also cover common problems and solutions. Why does replication fail? How can you monitor and detect when replication fails? What's the best way to add one or more new slaves to an existing replication setup? Which replication topology makes the most sense for a given application?
Finally, we'll discuss hardware and software solutions that can be combined with replication to provide load-balancing and high-availability.
Then on Saturday (April 12th) at 4pm, I'll be giving a 2-hour talk titled "Optimizing MySQL." Here's the abstract for that one.
As the load on a MySQL server increases, its performance may degrade if it has not been properly tuned to handle the load. A default installation of MySQL performs well for many applications, but it generally will not perform efficiently under stress.
In this presentation we'll discuss many of the tunable parameters in MySQL's configuration file (my.cnf), how to read MySQL's performance counters, and various optimizations which can be used to improve the performance and efficiency of MySQL servers--often with dramatic results. We'll also examine MySQL's various table types as well as hardware solutions to performance problems.
This conference is really gonna be cool. Check out the schedule to see for yourself. The only thing that bothers me is that there are other talks that I want to attend while I'm presenting. And even when I'm not, there are some tough choices. During many of the time slots, I want to attend at least two of them.
Good for the conference. Bad for me.
At PHPCon East 2003, I'll be giving a 75-minute talk titled "PHP & MySQL Performance Tuning." Here's that abstract.
They're the dynamic duo of LAMP. Fast, easy to use, wildly popular, and extensible. But what happens when your MySQL-backed PHP application starts to slow down? Where do you look? What tools will help identify bottlenecks? What techniques can help to avoid performance problems with PHP & MySQL?
In this session, we'll take a whirlwind tour of MySQL performance viewed thru the lens of PHP (and Apache). In doing so, we'll discuss and illustrate answers to all of those questions.
I'm looking forward to the conference and visiting NYC for the first time. That's right, I've never been there. I'll probably stay a few extra days to hang out with Derek and Kasia.
That reminds me. Flight reservations. Hotel reservations. Ugh. More stuff to do.
Oh, and I should probably alert the boss to the fact that I'll be out a bit in April. Better sooner than later.
Browsing Java documentation on Sun's Java site is incredibly frustrating because the site doesn't seem to support a basic operation I want: search.
Consider this example. I performed a Google search for java.util.Map.Entry and ended up on this page. It has a lot of useful information on it. But now that I'm there, I want to search for something. I guess I have to go back to Google and start over. There's no search box on the page. WTF?!
Contrast this with any reference page on php.net, like this one (chosen at random). It's far more search friendly. There's a box right at the top of the page and a drop down box that helps me control what I'm searching.
php.net docs: good.
Sun Java Docs: less good.
Oh, it's not just PHP. The MySQL folks get it too. Here's a randomly selected example from their docs. Notice the search box.
After the conference was officially over, I had some time to hang out with Zak and Jim (from MySQL AB), Shane from ActiveState, Scott, and a few other folks. We munched on the hotel bar's snacks, had a few drinks, and chatted about lots of geeky stuff and some not-so-geeky stuff.
But if there was any doubt as to our true nature, the truth was revealed when Zak busted out a notebook to get down-and-dirty with the source code to figure out the right way of fixing PHP's mysql_pconnect() so that it'd be less wasteful of connections.
Heh. Oh, well. Another conference over. Met lots of good people, some new and some old.
Perhaps some of them will even be at PHPCon East 2003. :-)
The conference was closed by Dirk, one of the founders of Rackspace, discussing the critical role that PHP played in getting Rackspace off the ground. He focused on PHP's integration, quick development times, and flexibility.
He then threw us a bit of a curve ball by revealing that a sizable chunk of their PHP code is being replaced by Python. The silver lining in the story is that SOAP is allowing them to keep much of the customer-facing stuff in PHP and the back-end code in Python.
In chatting after his talk, I was impressed that Dirk remembered me from last year's Open Source Database Summit. He did a keynote talk there too. Apparently he remembered my first talk about Yahoo and MySQL.
On Friday afternoon, I had to spend some time with the conference staff to go over various things. In the time I had left, I bounced between Scott's PHP Security talk and George's High Performance PHP talk. Scott's was a little basic for the audience, as he notes on his weblog. George's seemed dead-on. I learned how to do things in PHP that I knew how to do with Perl--benchmarking and profiling. It's good to know that PHP has those bases covered.
I didn't get a chance to visit Shane's SOAP talk. I would have liked to sit in for a few minutes, but I got caught up in George's presentation.
On Friday morning, I sat in on Stephan Schmidt's "Introduction to XSLT with PHP" presentation. What I found interesting here was not how XSLT works (I already knew that) but the two things I learned. First, I finally got a handle on XPath syntax. I'd heard that it is powerful--a sort of "regular expressions for XML" but never spent more than 10 seconds looking at it. Now I have a much better appreciation for it.
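To give a flavor of the "regular expressions for XML" idea, here is a small sketch using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and its element names are made up for illustration and have nothing to do with the talk itself.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, used only for illustration.
doc = ET.fromstring("""
<library>
  <book lang="en"><title>First Title</title></book>
  <book lang="de"><title>Zweiter Titel</title></book>
  <book lang="en"><title>Third Title</title></book>
</library>
""")

# One path expression: every <title> inside a <book> whose lang is "en".
titles = [t.text for t in doc.findall("./book[@lang='en']/title")]
print(titles)
```

The single expression replaces what would otherwise be a loop over children with an attribute test and a nested lookup.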
The second thing I got out of it was an idea of how many XSLT related PHP modules are floating around out there. I expected there was just one or two. The good news is that XSLT with PHP seems to be decent now and rapidly improving.
On Friday morning, I attended Michael Radwin's Making the Case for PHP at Yahoo! talk, even though I'd seen it the day before at work. The room was packed. A lot of people were interested in what we're doing with PHP at Yahoo. And Michael's talk did a great job of explaining things.
He started with an overview of Yahoo's server-side "scripting" technology, from the early days all the way thru to today. He also spent some time discussing what makes Yahoo special and how that factors into our requirements for a scripting language.
He then discussed the selection process we went through and the benchmarking we performed. Finally, he discussed what we've learned in the 3-4 months since PHP was first deployed at Yahoo.
After all the talks were nearly done, I hooked up with George, Scott, Bryan & Tiffany (of Pyzine fame), and a few others. We headed downstairs for drinks and food in the hotel bar.
We chatted about tons of stuff. Google, Yahoo, weblogs, Dave Winer, O'Reilly's lack of PHP books, the World Series, and so on. The food wasn't terribly good (don't order the peach cobbler) but the beer was.
Other groups ventured up to San Francisco for food, dancing, and other festivities.
On Thursday evening, the last Work-in-Progress talk I attended was Philippe Lewicki's "Enterprise Application Migration to PHP/MySQL" in which he described his company's approach to migrating a typical business application to the web using PHP and MySQL. The current system runs on a Mac server and clients on Windows. The clients can generate simple reports and graphs, as well as running standard queries and entering new data. By using Mozilla, MySQL, PHP, and some interesting PHP modules and add-ons, they've been able to provide a pretty compelling web and open source-based alternative.
Some of the things he demonstrated were really impressive. I'm starting to wish I had taken more notes. Or any notes.
On Thursday evening, I attended George Schlossnagle's Work-in-Progress talk on Apache_Hooks, a project to allow PHP to access the various request phases of Apache. George is actually an accomplished mod_perl and PHP hacker. So he got involved with this project (originally conceived by Rasmus) to try and level the playing field a bit between PHP and mod_perl.
To provide a bit of context, imagine being able to write an Apache authentication handler in PHP that could then let control pass on to a mod_perl content handler.
Apache_Hooks is still very experimental but it seems to work reasonably well. It's in a separate branch of the PHP CVS repository for now. Nobody knows if or when it'll become mainstream, but it is very cool stuff. It sounds like a few folks were interested enough to try running it in production.
I have a copy of his presentation, but I'm sure he'll have one on-line soon. Check the PHPCon web site in a week or two. We're trying to gather all the presentation links there.
This WiP also demonstrated the power of wireless networking in an amusing way. George was having trouble with the projector because his newer TiBook doesn't have a standard VGA out plug and he forgot the adaptor for it. Nobody else had one either. We puzzled over what to do until someone realized that 5 or 6 of the 10 of us in the room all had laptops with 802.11b cards and VGA out. So we set up an ad-hoc wireless network and FTP'd the slides from George's machine.
Update: As George notes, his Apache Hooks talk is now on-line.
On Thursday afternoon, I attended Scott Johnson's Software Engineering Practices for Large-Scale PHP Projects talk. His talk was very popular (had to move to the ballroom) and very good. Scott did an excellent job of reminding us all that just because PHP is easy to code, we shouldn't take shortcuts and forget everything that software engineering stands for.
Much of the advice was very practical and often backed up with real-world examples to illustrate some of his points. Give the talk a look.
On Thursday afternoon, I gave my Scaling MySQL and PHP talk. Amusingly, I put the talk together only 1.5 days before the conference after I found out that a speaker had canceled and they needed to fill a spot.
The alternate title for the talk is "The making, breaking, and repair of remember.yahoo.com." It covers the project to build remember.yahoo.com in one week's time, the site launch, and most of the problems we faced while it was on-line.
The talk was well received and I enjoyed presenting it. It's always fun to say, "look, we did some stupid things--try and learn from our mistakes."
On Thursday morning, I attended Christian Wenz's talk about Microsoft's ASP.NET, Web Services, and PHP. He began with a short introduction to Web Services and XML. Then he demonstrated building and using Web Services using Microsoft's ASP.NET and C#. Then he discussed how this relates to PHP. Is PHP dead? Can it compete?
He then discussed a few ways that you can both produce and use Web Services using various PHP modules. It's not as easy as Microsoft makes it, but it's certainly not impossible. And it's only going to get easier as time goes on.
I actually learned more from this talk about ASP.NET than I did about PHP. I feel like I understand both technologies better as a result.
On Thursday morning, Rasmus Lerdorf (now working at Yahoo) gave the opening keynote. (Expect it to appear here someday.) He covered physics, rocket science, the web problem, and a little bit of PHP along the way. One of his main points was that the web problem isn't fundamentally difficult. Unlike complex web software from various commercial vendors, PHP provides the basic tools you need to build solutions to "the web problem" without feeling like you need a degree in rocket science to get started.
There was a bit of discussion about changes to the language in PHP 4.3 and/or 5.0. The one point that came up repeatedly is that PHP will create references to objects by default, rather than copying them. That may break some existing code, but it'll do What Everyone Already Expects so it's a Good Thing.
Now that I'm mostly recovered from PHPCon 2002 (still a lot of work e-mail to plow through), I'll try and recount what I remember of the keynotes, sessions, and so on.
In general, I enjoyed the conference a lot. Met some interesting people and learned some new tricks--as always.
A presenter just cancelled, so it looks like I will be talking at PHPCon. Now all I have to do is put together a 90-minute talk. Soon. Really soon. Because the talk is Thursday afternoon.
I had a feeling that I should have prepared something "just in case" but decided to ignore it. Murphy, however, was taking careful notes that day.
My talk should be in place of Dan Cowgill's on the schedule.
Are you planning to blog PHPCon later this week? If so, let me know. I'm trying to get some sort of TrackBack site or something set up to aggregate the discussion. I'll likely link it on the PHPCon web site too.
Oh, there won't be wireless like all those fancy O'Reilly conferences have, so it'll have to be an evening activity. Sorry. It's just too expensive for a first-time conference. Maybe we'll change that at next year's PHPCon...
Oh, it might be fun to play "count the Yahoo employees" at the conference too. There will be many of us there. Heh.
Though we've had a subscriptions page up for a while on the PHP Journal web site, there were two problems with it.
Those are now fixed. If you subscribed before, please head over to the PHP Journal site and re-subscribe. If you haven't yet... Well, what are you waiting for?
The cat's out of the bag now. We're launching the PHP Journal. Visit the new website for some basic info (we're adding more as we catch our breath).
A few days ago, I decided to finally do something about my Apache logging mess. The "mess" is that I host about 15 virtual domains on a couple of colocated Linux servers. Most of the domains either belong to me or my friends. A few are business related. For the longest time, I've had apache configured to log in the typical combined log format with one log file per domain. I haven't rotated (or cleaned up) the logs for a long, long time.
Enter mod_log_mysql, an apache module that allows you to log directly to a MySQL server, optionally logging to disk as well. The module is simple and straightforward to setup. Most importantly, it Just Works. There's even a cool MySQLMassVirtualHosting option so that it'll log each domain to a separate table and even auto-create new ones as you add domains to your apache configuration. Very cool.
I did manage to find a bug along the way. The module didn't properly quote table names. So if you have a table named something like advanced-mysql_com (all dots become underscores), MySQL will barf on the INSERT and CREATE TABLE statements. But since I had the source code, it was quite easy to fix. I'll be sending a patch to the maintainer soon.
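For illustration only, here is the quoting idea sketched in Python. This is not the module's code and not the patch mentioned above. The helper names and the request_uri column are hypothetical. The one MySQL fact it relies on is that identifiers can be wrapped in backticks, with any embedded backtick doubled.

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def table_for_host(hostname):
    """Hypothetical host-to-table mapping: dots become underscores."""
    return hostname.replace(".", "_")


table = table_for_host("advanced-mysql.com")

# Unquoted, the hyphen in the table name would be read as a minus sign.
statement = "INSERT INTO " + quote_identifier(table) + " (request_uri) VALUES (%s)"
print(statement)
```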
Why this is so cool.
Once everything was up and running smoothly, I had a number of MySQL tables collecting data about traffic to my various web sites. "So what?", right? The cool thing is that by adding a couple of indexes to make queries fast (there are no indexes by default), I can whip up a PHP-based application that presents interesting stats to me in real-time. The PHP app isn't done yet, but it's quite functional for providing a high-level picture of what's going on. I can see which of my blog entries are the most popular, who is visiting them, and where they came from (referer tracking).
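A rough sketch of the kind of query involved, using Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name access_log and its columns are assumptions made for this example, and the module's real schema may well differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified log table; the real module's columns may differ.
conn.execute(
    "CREATE TABLE access_log (request_uri TEXT, referer TEXT, time_stamp INTEGER)"
)
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        ("/blog/1.html", "http://example.com/", 100),
        ("/blog/2.html", "http://example.org/", 101),
        ("/blog/1.html", "http://example.org/", 102),
    ],
)

# Index the column the report groups on, so the query stays fast as rows pile up.
conn.execute("CREATE INDEX idx_request_uri ON access_log (request_uri)")

# Most-requested pages first; ties broken by URI so the order is stable.
popular = conn.execute(
    "SELECT request_uri, COUNT(*) AS hits FROM access_log "
    "GROUP BY request_uri ORDER BY hits DESC, request_uri LIMIT 10"
).fetchall()
print(popular)
```

A similar GROUP BY on the referer column would give the "where they came from" view.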
More to do.
I'm not going to make the URL public yet, because the app needs some work and a good security audit. But now that the data is in a more accessible format, I can do tons and tons of stuff with it. I plan to have a module on my blog index page that lists the most popular entries in almost-real-time. I could have it updated every minute or two without putting much of a drain on the system at all.
The other thing I need to get around to is importing a few years' worth of old log data from the existing access_log files. That's just going to take a bit of time to write a Perl script to do the job. Once that's done, I'll be able to answer a lot of interesting questions about my web sites.
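The parsing half of such an import might look roughly like this. It is sketched in Python rather than Perl, and the sample line is invented. The pattern follows Apache's combined log format, but it does not handle escaped quotes inside the request or user-agent fields, so treat it as a starting point.

```python
import re

# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one combined-format line, or None on no match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line, for illustration only.
sample = (
    '192.0.2.1 - - [10/Oct/2002:13:55:36 -0700] "GET /blog/ HTTP/1.0" '
    '200 2326 "http://example.com/start.html" "ExampleBrowser/1.0"'
)
record = parse_line(sample)
print(record["request"], record["status"], record["referer"])
```

Each parsed record could then be inserted into the matching per-domain table. Lines that return None are worth logging rather than silently dropping.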