Thursday, July 26, 2007

The Principle of Infinite Methods -- 9 Kinds of Words

The 9 kinds of words are nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, interjections and articles. Was talking to my sons about this fact a few days back ... they didn't know it. Neither did I when I was their age.

I dropped out of high school after the 10th grade, mostly because I was bored to tears with it. The summer I was 16 I ended up homeless and sleeping in a park, which put a crimp in further education --

The other day my 8-year-old, Richard, wanted to know if I’d played basketball in college -- he has a serious basketball jones and we’re sending him to a basketball camp next summer. He and I play together regularly and we have not much in common otherwise -- outside bad action movies and being guys, you know -- Richard’s older brother Bram lives and eats Pokemon, and Richard’s almost as bad. My Pokemon knowledge extends to “Ash,” “Pikachu,” and “training,” because you train Pokemon. Aside from this, their lengthy discourses on the subject are in Mandarin.

So we talk basketball or movies when we socialize -- which is more than my Dad and I had in common, so bonus there. (To be fair to me and my Dad -- he didn’t watch movies much, and football bores me to death; but I watched University of Miami football games just so that I could suffer and gloat with him. And at the 2 minute mark of most Lakers games, I’d know he was tuning in so he could call up and exclaim over Magic or Kobe’s brilliance, despite being more indifferent to the Lakers than even I was to Miami. “I love you” can be said in lots of ways.)

But the “did you play basketball in college” -- I sidestepped. “No, honey, I never played organized basketball.” My older kids know I didn’t go to college (my daughters even know why) -- but my kids go to good schools, are doing extremely well in school, and once the habit of good school performance is set, we can have different sorts of conversations about why I didn’t do well in school --

One of the reasons is that I think analytically and was almost uniformly bored in school because the material wasn’t presented in any unifying structure. This analytic tendency has been hugely useful to me as a programmer -- decades of hammering away at my craft have separated out what’s critical to the process of building scalable, maintainable websites, from what’s not.

I used to be a big Hungarian notation guy -- the field has pretty thoroughly moved away from that, so I modified to a very minimalist Hungarian notation (sWord for strings, dWord for dates, nWord for numeric data including money.) Even that I finally abandoned -- there didn’t used to be editing environments that permitted you to inspect a variable’s properties, not for SQL Server, which is 90% of my development time these days. (The other 10% is also usually SQL, MySQL, a little Oracle -- very occasionally some VB.) But for a few years now there have been environments where, if I hovered the mouse over a variable, I could discover its type if I didn’t remember it -- more mature environments have done this forever, of course, but I still almost exclusively write T-SQL in Microsoft’s Query Analyzer -- which doesn’t. However, the new SQL Server 2008 does do this ...

I’d already abandoned my last vestige of Hungarian notation a while back. Over a year ago now I joined a startup that had code written in 4-5 different naming/formatting standards, all conflicting. I settled on a naming convention I didn’t like, mostly because it was the convention most frequently in use at that company, and as we’ve refactored we’ve cleaned up, until 2/3rds (up from maybe 1/5th) of the codebase now uses this naming and formatting convention.

The short version of this is, for a table: tbl_noun_relationship_to_other_nouns. So, for example, a table that stored addresses would be tbl_address; a table that stored companies would be tbl_company; and a table that stored the many-to-many relationship between companies and addresses would be tbl_address_company_map.

Code is usp_noun_verb ... usp for “user stored procedure,” to distinguish it from Microsoft’s stored procs, which are sp_whatever. A procedure that retrieves company data would be usp_company_select.
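A rough sketch of the convention, written in Python against the standard-library sqlite3 module as a stand-in for SQL Server. The table names and usp_company_select come from the text above; the column names and sample row are assumptions made up for illustration, and since SQLite has no stored procedures, usp_company_select is shown as an ordinary Python function.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Tables: tbl_noun, or tbl_noun_noun_map for a many-to-many relationship.
# Column names are hypothetical; only the table names come from the post.
conn.executescript("""
CREATE TABLE tbl_address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL
);
CREATE TABLE tbl_company (
    company_id   INTEGER PRIMARY KEY,
    company_name TEXT NOT NULL
);
CREATE TABLE tbl_address_company_map (
    address_id INTEGER NOT NULL REFERENCES tbl_address (address_id),
    company_id INTEGER NOT NULL REFERENCES tbl_company (company_id),
    PRIMARY KEY (address_id, company_id)
);
""")


def usp_company_select(connection, company_id):
    """Code is usp_noun_verb: fetch one company row by its id, or None."""
    cursor = connection.execute(
        "SELECT company_id, company_name FROM tbl_company WHERE company_id = ?",
        (company_id,),
    )
    return cursor.fetchone()


# Hypothetical sample row, just to exercise the function.
conn.execute("INSERT INTO tbl_company (company_id, company_name) VALUES (1, 'Acme')")
company_row = usp_company_select(conn, 1)
```

The point is only that the noun comes first in every name, so related tables and procedures sort together in any object list.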

Simple enough, though I imagine I lost more than half my readers already. But what I said above, about what’s essential and what’s not? Naming conventions are necessary: but it’s mostly irrelevant what they are as long as they’re not downright stupid. I’ve known this for years but still felt that my way was the right way.

So for about a year now I’ve coded in this new naming convention -- not one I’ve used before. Got comfortable with it. So ... recently I went back to do some consulting work for an old client. And there’s a lot of my old code floating around in production over there.

As I say, I adopted the sans-Hungarian notation lower case with underscore naming convention reluctantly, because it was as close to a common convention as existed at my new company -- and when I started working again with code I wrote between 3 and 8 years ago, I was downright annoyed at how unintuitive my old naming convention was. Obviously the correct way to do it is non-Hungarian lower case with underscores ...

I’m never having a naming convention argument again as long as I live. The part of my brain that cares about such things is stupid and fickle.

~~~~~

So what is essential? I write well and was reading at 4 -- and was a terrible frustration to my teachers. I don’t think I ever got an ‘A’ in English in my whole life. (Even the numerically challenged could count the ‘A’s I did get without taking off their shoes.) I had a teacher in junior high who had other kids’ parents angry at her because she kept giving the class harder and harder spelling tests -- I wouldn’t study for them and I never got a word wrong. Drove her batty.

But I was in my teens before I got to where I could diagram a sentence -- some of my first stories came back from George Scithers at Asimov’s, bless him, and he suggested a book -- I forget the title now, but I sat down and read it. And discovered there were only 9 kinds of words. That’s it! That’s grammar! (OK ... it’s not grammar. But it’s the hard core of it.) If any teacher had ever told me there were only 9 kinds of words in the English language, I think I’d have learned them.

I got a few As in my life, but I only specifically recall one -- summer school, a 10-week fast-moving Geometry class in between the 9th and 10th grades. The math teacher didn’t want me -- I’d done badly in his Algebra class the previous year. But 10 weeks to cover the entire book was exactly the right speed -- it went fast enough to keep my attention, was exactly the sort of material that I’m wired for, and across all the years I went to school, is my one really outstanding memory for hitting a subject matter I liked, being engaged with the material, and having the class move fast enough. That teacher then took me into his trigonometry class in the 10th grade, with high expectations. Bad year -- we had the PSATs that year and I got the 2nd highest score at that school, a private Catholic boys’ school with a lot of really smart kids -- I’d skated through the 9th grade without any teachers noticing me. That damned test brought me to their attention and I was thoroughly miserable the whole tenth grade.

But the person most disappointed in me was my math teacher, because he knew what I was capable of first hand -- so about halfway through the year he let me study at my own pace, and the second half of that class was better than the first. I was well into a different textbook by the time we got done, though I still didn’t bring the overall grade up to an A -- missed too many tests if I recall.

Aside from a couple of computer courses, an astronomy course, and 2-3 writing courses at a community college, I’ve never been back to school. But I’ve kept learning -- I’ve read well over a thousand non-fiction books and learned a variety of useful business and life skills, at my own pace and when I felt like it. And what’s come to me through the School of Dan, which I never quite got straight in real school, is that in all material there are core concepts, peripheral concepts, and chrome. Most of the schools I went to as a kid taught chrome, looking back at it.

What does core look like? In both writing and programming I’ve come to believe that it boils down to conciseness. I recall, very early in life, reading a book called “Philosophy and Cybernetics.” This exposed me, though I didn’t realize it at the time, to this idea: entia non sunt multiplicanda praeter necessitatem.

~~~~~

In the business world I live in good database design does not consist of doing more with less: it consists of doing less. Storing less data. Creating less structure. Writing less code.

This is not the way business people think about databases (to the degree they do think about databases.) They tend to believe that large is better than small, that more tables are better than fewer tables, that more data is better than less data. The problem with this is that data may or may not be of value. The following strings contain equal amounts of data:

‘00000000000’

‘I love you.’

Each string contains eleven characters’ worth of data, but the second string contains more actual information. So we come to a simple enough precept: data is meaningless but information is valuable. The more concisely information can be stored and transmitted, the more effective and useful it is.
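One crude way to put a number on that difference is Shannon entropy over the characters in each string. This is a minimal sketch, and the choice of measure is an assumption: entropy only captures how varied the characters are, not what the words mean to the person reading them. The function name is made up for illustration.

```python
import math
from collections import Counter


def entropy_bits_per_char(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    # Each distinct character contributes p * log2(1 / p), with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


zeros = "00000000000"
message = "I love you."

zeros_entropy = entropy_bits_per_char(zeros)      # one repeated symbol: zero bits
message_entropy = entropy_bits_per_char(message)  # nine distinct symbols: well above zero
```

Both strings are the same length, but the first carries zero bits per character by this measure and the second carries roughly three.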

Both writing and programming I approach from the same perspective: do less. Omit words, to quote a smart guy. Minimize structure. Minimize code. (“Minimize structure. Minimize code.” was a sign I used to have hanging over my desk at various companies.)

I’ve been interviewing DBAs for twenty years. And there’s a question I ask all DBA candidates, which in twenty years only a few people have ever answered correctly. It’s this:

What, in almost all cases, is the difference between a query that performs badly, and one that performs well?

I’ve interviewed some very bright people over the years, and the variety of answers I’ve gotten to this question has been interesting. Good indexes, I’ve been told: covering indexes, clustered indexes, high cardinality indexes. Good statistics, I’ve been told. A proper execution plan. Proper use of temp tables, or derived tables, or table variables. Proper joins. Correct normalization. Wise denormalization.

None of these answers are wrong, necessarily, but they miss the point. Database queries run on a computer. A thing that exists in the real world. And, with very rare exceptions they run against data which is stored on some form of magnetic media. And magnetic media is slow. Off a good RAID array at the time of this writing, you might be able to pull 300 megabytes per second in sustained bursts – bulk transfers of large files. Database queries, inherently more dependent upon random access, will be slower. 100 megabytes per second throughput, with real-world equipment, is a superb result.

To put this in context, modern high-speed RAM has throughput to the CPU of over 10 gigabytes per second – about two orders of magnitude faster.
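The arithmetic behind "two orders of magnitude," as a short sketch. The two throughput figures are the ones quoted above; the 1 GB workload and the use of decimal megabytes and gigabytes are assumptions picked to keep the numbers round.

```python
import math

# Figures from the post; decimal units assumed (1 MB = 10**6 bytes).
DISK_BYTES_PER_SEC = 100 * 10**6   # 100 MB/s: a superb real-world database result
RAM_BYTES_PER_SEC = 10 * 10**9     # 10 GB/s: RAM throughput to the CPU

# Hypothetical workload: a query that has to read 1 GB of data.
workload_bytes = 10**9

disk_seconds = workload_bytes / DISK_BYTES_PER_SEC   # time if it all comes off disk
ram_seconds = workload_bytes / RAM_BYTES_PER_SEC     # time if it is already in memory
speed_ratio = RAM_BYTES_PER_SEC / DISK_BYTES_PER_SEC
orders_of_magnitude = math.log10(speed_ratio)
```

Ten seconds off disk against a tenth of a second out of memory: a ratio of 100, which is where the two orders of magnitude come from.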

The difference between a query that performs badly, and a query that performs well, in almost all cases: the query that performs well executes with fewer reads. So the core concept in this particular case, the part that’s not peripheral or chrome, is that databases perform well in direct proportion to the degree that they retrieve the correct answer with the fewest reads.

This question will be on the test.
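A toy model of the answer, for anyone who wants to count the reads. This is a sketch under loud assumptions: the "table" is just sorted integer keys cut into fixed-size pages, the page size is invented, and the binary search stands in for whatever an index would really do. It is not how SQL Server stores or reads data; it only shows two lookups returning the same answer with very different read counts.

```python
ROWS_PER_PAGE = 100  # hypothetical page size, in rows


def build_pages(row_count):
    """Lay out sorted integer keys 0..row_count-1 across fixed-size pages."""
    keys = list(range(row_count))
    return [keys[i:i + ROWS_PER_PAGE] for i in range(0, row_count, ROWS_PER_PAGE)]


def scan_lookup(pages, key):
    """Find key by reading pages in order; returns (found, pages_read)."""
    pages_read = 0
    for page in pages:
        pages_read += 1
        if key in page:
            return True, pages_read
    return False, pages_read


def ordered_lookup(pages, key):
    """Find key by binary search over the sorted pages; returns (found, pages_read)."""
    low, high = 0, len(pages) - 1
    pages_read = 0
    while low <= high:
        mid = (low + high) // 2
        page = pages[mid]
        pages_read += 1
        if key < page[0]:
            high = mid - 1
        elif key > page[-1]:
            low = mid + 1
        else:
            return key in page, pages_read
    return False, pages_read


pages = build_pages(100_000)                    # 1,000 pages of 100 rows each
scan_result = scan_lookup(pages, 99_999)        # worst case: key is on the last page
ordered_result = ordered_lookup(pages, 99_999)  # same key, same answer
```

Both lookups find the row. The scan touches all 1,000 pages to do it; the ordered lookup touches no more than ten.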

Now ... how you get to that goal is peripheral. There’s more than one right way to perform most tasks ... but there are an infinite number of ways to perform a task incorrectly. (Moran’s Principle of Infinite Methods -- “Infinite Methods” is the title of one of the many, many books I’ll probably never write.) So the first pass in learning any skill is to get out of the Infinite Methods. Once you’ve done that, you’re an amateur: you can do work that functions, more or less, though it may not be quick or elegant or scalable or easy to maintain, or whatever -- but it produces a result that matches your stated goal. That’s an amateur.

At some point on the path of acquiring a particular skill you’re a professional. Most likely you know a few different ways to solve any problem outside the Infinite Methods. And now your job starts to get more complex again -- you have a toolkit and it’s bigger than it used to be. If you’re honest with yourself you really don’t know all the time which approach is best for a given problem, because you haven’t solved Problem X enough times to have a clear sense of all the different ways to do it. (Some people never do solve Problem X in more than one way -- makes the job easier, but they never do get past the status of journeyman.) So you flex -- curiosity is a good trait here. Try X, try Y, try Z. You have business needs that need to be met, that’s life in a capitalist society -- so stay late and try the alternate approach. Noodle away at it over the weekend. Think about it before bedtime. What’s the core of my problem? What’s the simplest way to solve it? What approach takes the fewest steps, requires me to build and maintain the fewest objects?

This, just for the record, is where programming and writing mostly part ways -- you don’t have to maintain a production environment in writing. Once a piece is done it either works or doesn’t, and with very rare exceptions you’re not going to tune it up again later. In a way this is unfortunate: re-writing an old piece many years later is a huge learning experience, in both text and code.

If Stephen King and JK Rowling had to come back years later and rewrite their novels, they'd learn to write shorter the first time around.

~~~~~

Minimize structure. Minimize code. It’s a reminder to me to never build something I don’t need, and to never build something that’s similar to something I’ve already built. When in doubt, extend and reuse the similar entity. When in doubt ... don’t.

Occam’s Razor doesn’t, despite popular misconception, say “Pick the simpler solution, all else being equal.” What it really says is: entities should not be multiplied needlessly. Which, if you study the idea, takes you to reductionism, to parsimony -- I’ve written statistical software; if I hadn’t been exposed to the idea of parsimony ahead of time, I’d have written useless statistical software. Statistical software (in particular for business-oriented process automation, which I’ve essentially worked in my whole adult life) works best to the degree you can identify the core discrete data points required to make a prediction, and then quit before you get yourself into trouble. (I’m told that it’s different in actual research; I wouldn’t know but it sounds reasonable.)

What’s parsimony? Less is more. Minimize structure, minimize code. And save some thoughts for later.

~~~~~

Sometimes these posts end up longer than I intend.

I’m working on a very short database book -- “The Elements of Speed.” In concept it’s a direct lift of Elements of Style, though obviously on a rather different subject matter. In very short (non-Microsoft-specific) form it covers my thoughts on how to build simple structures that perform well and are easy to maintain. (Did you know there’s only two things in the universe? Matter/Energy and time. Things and time happening to them. More on that in the book.)

I’ve been enjoying working on “Speed” -- I get to write and code at the same time. How can I say “x” most succinctly? With the fewest words? Constant revision is the key; you can boil down most ideas if you have time -- this post should probably be half the length it is. But I’m delivering an app later today -- it’s short and elegant, but you know -- I’m getting paid for that.

11 comments:

Thomas said...

I really enjoyed this post. I have a BA in English, and 3 minors (Physics, Math, Comp Sci). I started out in the sciences in college, realized that there was a certain point where I was lost, but that I loved imagining what might be around the next corner. At one point in my junior year, my writing professor introduced me to his mentor at a smoke-filled party in Iowa City. Meeting Kurt Vonnegut was probably the coolest thing I did for that decade.

It went nowhere, however, and I've been writing code ever since. Talking about how much we're alike makes me sound like a stalker, so I'll stop there.

I do find it funny, however, that your mantra in this post is to Minimize and that you conclude it with "Sometimes these posts end up longer than I intend." Perhaps that was intended. It certainly pulled a chuckle out of me.

SF said...

It's always seemed to me that high school Euclidean geometry is very close to programming in spirit.

Dan Moran said...

Thomas -- yeah, that was intentional. It was easier to make a joke about it than to fix the post.

Don't worry about the stalking thing. Alan Rodgers has spent years posting about how much he hates me and is going to get me good. You don't compare. :-)

Sean Fagan said...

I skimmed through this earlier, but just read it more deeply now.

I don't know if I've mentioned it, but what I do these days is filesystems, for, shall we say, a fruit company ;). Deeply ingrained in everything we do is: hitting the disk is expensive -- reads are slow, writes are slower. ("Disk" is generic here, btw; flash is also slow, but in different ways, and has its own characteristics.)

One of the facts we have to live with, which affects your world, is that we also have to make sure the data get there, and get there in the right order. And, conceptually, the answer to that really is: write as little data as possible, because it'll save you later.

Realistically, however, disk drives fail, and they fail silently. (Did you know that? About one silent error for every 8TBytes of I/O.) So that means we have to write more, to make up for the possible errors that may happen behind and beneath us. And this then means that getting that data later is slow, because there's more to read.

Anyway.

Write your book. Get into the details behind why the things that are irreparably slow can't be faster, and I'll find some uses for copies. :)

Dan Moran said...

Sean -- I'll forward pieces of the book to you as I complete them. Be interested in your professional take.

I sure do know hard drives fail. Are you saying that writes fail w/out the hardware catching it? I know that used to happen (wrote a hashing algorithm well over a decade ago to catch cases in a FoxPro app that kept having weird read errors -- traced it to a particular disk vendor who swore it wasn't their disks, but it only happened in environments with their drives; and in the end we proved it was in fact their SCSI implementation) ... but I hadn't run into anything similar in years. Is that all I/O, or is it more specific to reads or writes? And is that metric specific to the brands of hardware the fruit company uses, or for all magnetic media?

Sean Fagan said...

Yes... errors without anything being notified. The number comes from the team at Sun that wrote ZFS -- it does ECC checksumming of everything, and they ran a lot of tests to test rates. People I know ran their own tests, and found similar results.

That means this is across multiple levels of drive technology (enterprise- and consumer-level), on multiple hardware and software platforms.

The errors are for both reads and writes. Essentially, if you do 8TBytes of disk I/O, you'll have gotten one undisclosed error. Now, since most people do more reads than writes, that means a single bit error, usually, and it tends to get lost in the noise. But it's there. And things like RAID don't necessarily help here, because the error can occur anywhere along the path.

The reported error rate is higher, by the way.

Anonymous said...

Hi Dan,

Sorry to go off topic, but I had the following question for you in the previous blog entry that you may not have noticed:

I have the same PDA and enjoy reading on it using portrait mode (480 x 640) in order to take full advantage of ClearType.

Does TextMaker Pro do automatic bookmarking for you? If not, why not just use Pocket Word for your RTF documents?

Dan Moran said...

Having installed TextMaker Pro for other reasons (you can write with it -- I have a Stowaway keyboard for the handheld, which gives me the lightest functional notebook I've ever had) ... now when I double-click on a document, it opens up in TextMaker Pro.

TP doesn't automatically bookmark documents, but I manually stick in the string ..mark and save the file, and come back to it later; works fine. Pocket Word is so frustratingly (and obviously, intentionally) limited in every other way, I haven't bothered trying to use it to read docs.

I both respect Microsoft and am dreadfully frustrated by it. They write first-rate software when they feel like it -- their development environments and business software in general are really superb. But the free software movement's chances of long-term success are improved by an order of magnitude by Microsoft's breathtaking arrogance. Except in places where the market has forced them to behave in the customer's best interest, they reflexively choose not to ...

I spend 95% of my day working with Microsoft software, and I've made a very good living doing that, over the years. I'm sorry I turned down the job offer they made me in 1987 -- it had some very small amount of stock options attached to it that would be worth several millions today. But I'd have had to move to Redmond and my then-wife was dead set against it. So

I'm not a religious warrior, and Google's mantra aside, I don't think Microsoft is evil. But they sure do resemble IBM in a different era, for both good and bad.

Anonymous said...

Dan, thanks. I've done the same exact thing (insert a string to come back to later on). After a while, it seems like a bit of a hassle because I read many books and like to switch back and forth depending on mood. I reluctantly went back to Microsoft Reader -- and put up with the "margins" around the text, which to me is just an execrable waste of valuable screen real estate.

I also have the same Bluetooth Stowaway keyboard. In fact I've gone through several through wear and tear. To me it's the smallest, lightest, yet usable writing setup ever.

I've resisted TextMaker so far because I'm even more of a minimalist than you are, and use only the base features of a word processor. :)

Anonymous said...

What’s parsimony? Less is more. Minimize structure, minimize code. And save some thoughts for later.
~~~~~
Sometimes these posts end up longer than I intend.

It is more than 2,800 words long.

Dan Moran said...

I'm tempted to rewrite it as an example of the benefits of revision. My blog posts are likely to run long because I lack time to revise them -- it's easier to write long than short.