Sunday, December 23, 2007

Victim of success

Much has happened in the last month with genetify and my life in general. It has led me to a new job in San Francisco where I will continue my work on optimization. Unfortunately it has also aggravated my chronic wrist and hand injuries, so it costs me too much to continue writing in this blog. The last thing I will say is that it will be interesting to look back on these entries in the months and years to come. The story is far from over.

Deja vu

More sore and more tired, I announce the addition of more dynamic graphs to genetify, this time using Google Charts API.

http://genetify.com/walkthru.php?p=goals

Friday, November 23, 2007

graphs

Sore and tired, I write this post to mark the addition of dynamic graphs to genetify. It was a lot of work, on the one hand, because graphic device drivers are uncooperative, and on the other hand, because my graphing library of choice, the R statistical programming environment, was new to me and has rarely been used behind a web server. Graphing in R, as good as it is, was never meant to serve a webpage. If it all sounds like another lesson in choosing standard technologies to solve a problem, that's because it is. The payoff, I hope in this case, is the analytical power of the R environment (second to none) and the promise of better web server integration thru FCGI or an Apache module, of which there is one I discovered.

Wednesday, November 14, 2007

Friday, November 9, 2007

More data

Other progress that I neglected to write about in the past week:

* Foreign key constraints
* "browscap" recording
* Geo location recording
* Referrer recording
* Switch to InnoDB

All this has to do with getting maximum possible data from a visitor when they record a goal. In the future, genetify may be able to do something intelligent with it!

Back to science

I'm eagerly working my way thru this book. It is opening my eyes to a 25-year-old field of study that is directly applicable to genetify. In fact, genetify may be misnamed! It may be more of a reinforcement learning algorithm than a genetic algorithm.

Thursday, November 1, 2007

Documentation is the final test of a design

Just a small observation: Writing documentation forces you to create a good API to your program. Several times today I was trying to describe in writing how to use genetify and I realized that it was unnecessarily complicated or unclear. It's hard to write good documentation because programs often don't operate the way someone would naturally expect. And that's bad.

Tuesday, October 30, 2007

Time for a demonstration

As I've talked to more and more people about genetify, I have felt the growing need for a demo. I wanted something that I could direct people to offhand and also something that I could talk somebody through. I wanted it to hit all the high points and lead people to ask the right questions.

Today, after a day's work (mostly due to bugs -- nothing like a public demo to bring them out) I finished something I'm pretty proud of. And I need to be -- this demo I expect will be the front door into the world of genetify for some time to come.

So here it is: http://genetify.com/walkthru.php

Oh ya, I set up genetify.com at a new webhost yesterday but that's too boring to talk about. The more I learn to program, the more I hate sys admin.

Monday, October 29, 2007

Performance == caching

Without really needing to, I wrote in genetify.php some code to cache results locally. Once again, inspiration overwhelmed me. My new ideas today were to:

* use the default PHP temp dir in order to avoid extra setup (duh!)

* request a new result set with a random, specifiable frequency (like 0.10) -- simple but effective in reducing requests

* update the cache on disk as a shutdown function -- this means a request appears to complete just as fast as without caching!

I can't wait to see how much traffic the whole system can bear now. Because the information being processed by genetify is not necessary for coherent interaction with a site, it can be cached endlessly. This simple inverse relationship between the value of up-to-date information and caching means that genetify should be able to perform extremely well.
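
A rough sketch of the three ideas above, written in JavaScript for Node rather than the PHP of genetify.php. Everything named here (getResults, refreshProbability, fetchFresh, the cache file name) is made up for illustration and is not genetify's actual code. The returned flush function stands in for a PHP shutdown function: the caller runs it after the response has gone out.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical cache file in the platform's default temp dir (no extra setup).
const CACHE_FILE = path.join(os.tmpdir(), "genetify-results-sketch.json");

function readCache(file) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    return null; // a missing or unreadable cache counts as a miss
  }
}

// Returns { results, flush }. `flush` performs the disk write and is meant
// to be called after the response has been sent.
function getResults(fetchFresh, options = {}) {
  const {
    refreshProbability = 0.1, // fraction of requests that fetch a new result set
    random = Math.random,
    file = CACHE_FILE,
  } = options;
  const cached = readCache(file);
  // Refresh on a cache miss, or on a random fraction of requests.
  if (cached === null || random() < refreshProbability) {
    const results = fetchFresh();
    return {
      results,
      flush: () => fs.writeFileSync(file, JSON.stringify(results)),
    };
  }
  // Cache hit: nothing to write later.
  return { results: cached, flush: () => {} };
}
```

With refreshProbability at 0.10, roughly nine requests in ten never leave the local disk.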

Sunday, October 28, 2007

The pleasure of control

I spent quite a bit of time today working on a portable testing panel for genetify. The effort was mostly unplanned and flowed from the thrill of building big, shiny buttons that control my creation. The really neat part is that the controls are inserted into any page from a remote script but one that is requested separately from genetify.js. The controls can be brought onscreen with a single function call but they don't weigh down the core code library. All this is for the convenience of the ultimate users of genetify -- developers.

Tuesday, October 23, 2007

Server side, no sweat

I added to the PHP server side library a few days ago (the blogger in me is slacking) and in the process I codified (excuse the pun) the universal, language-independent genetify API. In the abstract, it is something like this:


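vary([callingContext]) {
    genes = getGenes(callingContext)        // find the genes declared in this context
    genome = _vary(genes, callingContext)   // pick one variant per gene
    save(genome)
}

_vary(genes, callingContext) {
    genome = {}
    for (gene in genes) {
        selected = selectVariant(gene)
        genome[gene] = selected             // remember the choice
        callingContext[gene] = selected     // apply the choice in place
    }
    return genome
}

save(genome) {
    saveToCookie(genome)
    saveToDatabase(genome)
}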


This general structure of the code is true for manipulating all types of object currently implemented -- PHP vars, Javascript objects, CSS rules, HTML elements. My hope is that this stays true as new languages are added -- Python, Ruby... who knows what else.
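
The outline leaves selectVariant abstract. One plausible version, sketched here in JavaScript, picks a variant at random in proportion to a weight. The signature (a map of variant names to weights, plus an injectable random source) is an assumption for illustration, not genetify's real interface.

```javascript
// Pick one variant name at random, in proportion to its weight.
// `variants` maps variant name -> non-negative weight (hypothetical shape).
function selectVariant(variants, random = Math.random) {
  const names = Object.keys(variants);
  if (names.length === 0) throw new Error("gene has no variants");
  const total = names.reduce((sum, name) => sum + variants[name], 0);
  // With no weight information yet, fall back to a uniform choice.
  if (total <= 0) return names[Math.floor(random() * names.length)];
  let threshold = random() * total;
  for (const name of names) {
    threshold -= variants[name];
    if (threshold < 0) return name;
  }
  return names[names.length - 1]; // guard against floating-point leftovers
}
```

A call like selectVariant({ red: 1, blue: 3 }) would then return "blue" about three times in four.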

Friday, October 12, 2007

A lake of calm

I feel like a lake of calm now that I have successfully launched my first real live honest-to-goodness test of variants on my friend's ecommerce site. We will see from the data that is coming in how good genetify can be.

Friday, October 5, 2007

document.cookie sucks

So I was working last night on finishing a prototype PHP library for genetify. Everything was going great (it is amazing how fast you can re-write code, even in another language), until it came time to manipulate cookies client-side in Javascript that were set server-side in PHP. The cookie data was being read and passed on successfully to the rest of the system so my confidence in the PHP code was high. But when it came time to rewrite parts of the cookie data client-side -- the last test I wanted to try before calling it a night -- I ran into problems. Overwriting the server-side cookie appeared to create a whole new cookie with the same name. Until this experience, I had always understood cookies to exist as keyed values in a namespace, altho represented by document.cookie as a string. So I tried to discover some sort of configuration of HTTP requests and client and server state that could reconcile the two cookies. As the night wore on, I spiraled down into a kind of mental labyrinth until I had the sense to give up.

Today, with a fresh start, I systematically tested my assumptions about cookies and I discovered after half an hour that Firefox's document.cookie prefixes the domain info in the cookie with a period ".", which apparently puts the cookie in a different namespace for writing (altho not reading!). It is no surprise that I missed it the night before. Can you spot the odd man out?

mydomain.com, mydomain.com, mydomain.com, .mydomain.com, mydomain.com

Not normally judgmental, I humbly name document.cookie a TERRIBLE API. Even before this episode, from my earliest days making websites, I hated it. Let's count all the sins:

1. document.cookie is represented as a string, but the equals sign "=" appends to it. Why not "+=" ?

2. You can write to document.cookie with extra params (expires, domain, path) but these are not accessible anywhere in the DOM. Come on!

3. You can only delete a cookie by giving it an expiry in the past. Why not an empty string or null value?

4. The namespace problem described above. How to identify two cookies with the same name but different domains or paths?

5. All these problems could be avoided by making document.cookie a *normal* javascript-DOM object. Then every bloody javascript framework out there wouldn't need the functions setCookie and getCookie.
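
For the record, here is roughly what those helper functions end up looking like. This is a minimal sketch with made-up names (parseCookies, getCookies, formatCookie), not code from genetify or any framework. It returns a list of pairs instead of an object so that two cookies with the same name both show up, and it operates on plain strings so it can be tried outside a browser.

```javascript
// Parse a document.cookie-style string ("a=1; b=2") into [name, value] pairs.
// Parts without an "=" are skipped in this sketch.
function parseCookies(cookieString) {
  return cookieString
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.indexOf("=") !== -1)
    .map((part) => {
      const eq = part.indexOf("=");
      return [
        decodeURIComponent(part.slice(0, eq)),
        decodeURIComponent(part.slice(eq + 1)),
      ];
    });
}

// All values stored under `name`; more than one means duplicate cookies.
function getCookies(cookieString, name) {
  return parseCookies(cookieString)
    .filter(([key]) => key === name)
    .map(([, value]) => value);
}

// Build the string to assign to document.cookie. Domain and path have to
// match the original cookie, or the browser keeps a second cookie with the
// same name.
function formatCookie(name, value, { domain, path, expires } = {}) {
  let out = encodeURIComponent(name) + "=" + encodeURIComponent(value);
  if (expires) out += "; expires=" + expires.toUTCString();
  if (domain) out += "; domain=" + domain;
  if (path) out += "; path=" + path;
  return out;
}
```

In a browser, the write would be document.cookie = formatCookie("genome", value, { domain: ".mydomain.com", path: "/" }), and deleting means passing an expires date in the past, as in sin 3.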

Wednesday, October 3, 2007

Looking for trouble

So I launched genetify a few days ago on a friend's site with it set to record goals only. It's working, but only partially. The irritating fact is that not all goals are being recorded, as shown by my friend's own records.

After checking all the browsers again, I couldn't think of anything else to try, so I decided to gather more info -- log all javascript errors to the database. I can see this feature sticking around. It would be a bold new step in my drive for reliable software -- complete responsibility for the client side.
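
A minimal sketch of what such error logging could look like. The makeErrorLogger name, the record fields and the /log_error.php URL are all invented for illustration; only the window.onerror hook itself is standard.

```javascript
// Build a handler with the classic window.onerror signature.
// `send` is a hypothetical function that delivers the record to the server.
function makeErrorLogger(send, pageUrl) {
  return function (message, source, lineno) {
    send({
      message: String(message),
      source: source || "",
      line: lineno || 0,
      page: pageUrl,
    });
    return false; // false lets the browser's default error handling run too
  };
}

// In a browser, one way to wire it up (the URL is made up):
// window.onerror = makeErrorLogger(function (record) {
//   new Image().src =
//     "/log_error.php?e=" + encodeURIComponent(JSON.stringify(record));
// }, location.href);
```

Requesting an image is just one cheap way to get the record to the server without depending on any other library being loaded.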

Monday, October 1, 2007

Ultimate and proximate goals revisited

I had a strange and wonderful idea today, sparked by discussions with friends about genetify. Sooner or later, someone will put so many genes in their pages that it will be hard to evaluate the effect of any single variant on a distant goal. To cope, the webmaster could create separate sub-goals for particular pages where appropriate. For example, knowing that a user who ultimately buys a product must pass thru the product's detail screen, you could make the detail screen a sub-goal. But couldn't this whole process of creating sub-goals be systematized? Every page should in principle make some quantifiable contribution in the path to a goal. And couldn't genetify itself be used to discover these page-values?

Imagine every link on a page is genetified to comprise two variants: onclick="goal(1)" and onclick="goal(0)". This means that links that lead to pages with goals will have the "goal(1)" shown more frequently. In fact, "goal(1)" should eventually be shown in proportion to the average goal total reached on the following page. Therefore, critically, every page would also have a single goal fired by default whose value is equal to the sum of its "goal(1)" links. This means that the value of outgoing links is passed on to incoming traffic. Because this works across any two neighboring pages, an arbitrary number of pages could then be chained together in a value-chain. The real-world value of some far-away externally defined goal (think of a purchase) could be made to cascade thru a whole tree of supporting pages. Most important, at every node in the tree, the page-value would be proportional to its contribution to the ultimate goal.
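
One way to read the value-chain idea, as a small sketch: a page's value is any externally defined goal value on that page plus, for each outgoing link, the chance of following the link times the value of the page it leads to. The click probabilities, the data shape and the pageValues function are assumptions made for illustration. The scheme described above would learn the link values through goal(1) and goal(0) rather than compute them directly.

```javascript
// pages: { name: { goal: number, links: { targetName: clickProbability } } }
// Repeatedly passes value backwards from goal pages to the pages linking in.
function pageValues(pages, sweeps = 50) {
  let values = {};
  for (const name of Object.keys(pages)) values[name] = 0;
  for (let i = 0; i < sweeps; i++) {
    const next = {};
    for (const [name, page] of Object.entries(pages)) {
      let value = page.goal || 0; // externally defined goal, e.g. a purchase
      for (const [target, probability] of Object.entries(page.links || {})) {
        value += probability * (values[target] || 0);
      }
      next[name] = value;
    }
    values = next;
  }
  return values;
}

// A four-page chain ending in a purchase worth 100.
const examplePages = {
  home: { links: { catalog: 0.5 } },
  catalog: { links: { detail: 0.4 } },
  detail: { links: { confirm: 0.1 } },
  confirm: { goal: 100 },
};
const exampleValues = pageValues(examplePages);
```

In this chain each page's value shrinks with its distance from the purchase, which is the cascading effect described above.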

Pretty crazy, huh?!

Thursday, September 27, 2007

Bring the pain

The main accomplishment of the last few days has been automated tests of all the high level functionality of genetify -- varying all types of genes, recording all the results, reading all the results and getting the correct variant probabilities. As always before writing any test code, it seemed that my baby couldn't get any better and I was reluctant to start. However, experience has taught me that the moment you prove to yourself that something is working is absolutely the best time to start writing tests. And of course in writing my tests I found some hidden deficiencies. I fixed those and now I'm twice as sure that my baby is perfect!

Monday, September 24, 2007

Multi-domain, multi-page

I added the necessary key fields to the database. Nothing holding back a deployment now. Stay tuned.

Sunday, September 23, 2007

Proximate and ultimate goals

Today I wrote some code to store the last viewed genome in a cookie and read it out when recording a goal. This means goals can be on whole different pages from genetified code!

The problem with recording goals only on the same page as the thing being tested is that the thing being tested may have its most interesting effect several pages away. An ad may be on a page selling something, but the actual commitment to buy -- the click-of-no-return -- may be past a catalog page, past a sign-up page, in a confirmation page.

Building this feature has raised some interesting questions: Should genetified pages be evaluated on their own exclusive goals, or should pages be evaluated on site-wide goals? Or both? How complicated should the summation of goals be?

Why only store the last genome? I did it this way out of concern for the 4k limit. But imagine a record of all viewed genomes following the user around, getting some kind of proportional attribution every time a goal is reached. That would be some crazy learning machine!
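
A small sketch of the store-and-read-back step, assuming the genome is a plain object of gene names to variant names and that it travels as URL-encoded JSON. The encoding, the function names and the exact size check are illustrative guesses, not what genetify actually does.

```javascript
const MAX_COOKIE_BYTES = 4096; // approximate per-cookie limit (the "4k limit")

// Serialize a genome ({ geneName: variantName }) into a cookie-safe string.
function encodeGenome(genome) {
  const encoded = encodeURIComponent(JSON.stringify(genome));
  // The encoded string is pure ASCII, so its length is its size in bytes.
  if (encoded.length > MAX_COOKIE_BYTES) {
    throw new Error("genome too large for one cookie");
  }
  return encoded;
}

// Read a genome back out of a cookie value; null if it cannot be parsed.
function decodeGenome(cookieValue) {
  try {
    return JSON.parse(decodeURIComponent(cookieValue));
  } catch (err) {
    return null;
  }
}

// On the goal page: attach the last viewed genome to the goal being recorded.
function goalRecord(cookieValue, goalName, value = 1) {
  return { goal: goalName, value: value, genome: decodeGenome(cookieValue) };
}
```

Storing a whole history of genomes would mean the same encoding applied to a list, which is where the size check would start to bite.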

Saturday, September 22, 2007

The living page

Every footstep you take on the ground leaves a footprint. Enough feet travel the same way and soon there's a path.

A web page should be the same way. Every click represents someone wanting to do something. A page should make it easier to follow the well-traveled path -- without some developer needing to lay down some metaphorical pavement.

The loop has been closed

I rushed ahead tonight to test genetify's first adaptive response. It works! And all too well--it got stuck on the very first logged genome because that genome instantly achieved a success rate of 100%.
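
The arithmetic behind getting stuck is easy to show: one view and one goal gives a raw success rate of 1/1. One common remedy, sketched below, is to start every genome with a pretend success and a pretend failure so that a single lucky click cannot dominate. This is only an illustration of the problem and is not necessarily how genetify ended up handling it.

```javascript
// Raw rate: a single success out of a single trial is already 100%.
function rawRate(successes, trials) {
  return trials === 0 ? 0 : successes / trials;
}

// Smoothed rate: add imaginary prior successes and failures (add-one
// smoothing by default) so early results move the estimate only a little.
function smoothedRate(successes, trials, priorSuccesses = 1, priorFailures = 1) {
  return (successes + priorSuccesses) / (trials + priorSuccesses + priorFailures);
}
```

With these defaults, an untried genome sits at 0.5 and the first logged genome only rises to about 0.67, so the others still get shown.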

Friday, September 21, 2007

Closing the loop

Between fine-tuning genetify.js, I have been working on closing the whole feedback loop. Combinations of gene-variants receive a score; individual gene-variants are evaluated across combinations; losing gene-variants are killed off; winning gene-variants are left to survive. Yesterday I got combinations of gene-variants -- what I've decided to call "genomes" -- recording their scores into Google Analytics and into my own database. Next, I'll be working on the evaluation step.
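
A sketch of what the evaluation step might look like: each gene-variant is scored by pooling the views and goals of every genome it appeared in. The record shape (genome, views, goals) and the variantRates name are assumptions for illustration, not the real database schema.

```javascript
// records: [{ genome: { gene: variant }, views: number, goals: number }]
// Returns { "gene=variant": goals per view, pooled across all genomes }.
function variantRates(records) {
  const totals = {};
  for (const { genome, views, goals } of records) {
    for (const [gene, variant] of Object.entries(genome)) {
      const key = gene + "=" + variant;
      if (!totals[key]) totals[key] = { views: 0, goals: 0 };
      totals[key].views += views;
      totals[key].goals += goals;
    }
  }
  const rates = {};
  for (const [key, total] of Object.entries(totals)) {
    rates[key] = total.views > 0 ? total.goals / total.views : 0;
  }
  return rates;
}
```

Killing off the losers is then a matter of comparing the rates of the variants within each gene.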

This is where the project is starting to get very rewarding. You can see the little genomes being born into the page, and you can rate their little lives with the arbitrary click of a mouse. Someone out there should find some fun with this!

Tuesday, September 18, 2007

Here it is

http://gregdingle.com/genetify.html
http://gregdingle.com/genetify.txt
http://gregdingle.com/genetify.js

Genetic variation in your browser.

Upgrades

Handles circular references.
Faster getElementsByClassName that uses XPath when available.
Unified gene naming scheme across CSS, javascript and HTML.
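
For the curious, the XPath trick goes something like the sketch below. This is a generic version of the technique rather than the code in genetify.js, and it assumes the class name contains no quote or regex characters.

```javascript
// XPath matching elements whose class attribute contains `className`
// as a whole word.
function classNameXPath(className) {
  return (
    ".//*[contains(concat(' ', normalize-space(@class), ' '), ' " +
    className +
    " ')]"
  );
}

// Browser-only: use XPath when document.evaluate exists, else scan all tags.
function getElementsByClassName(className, root) {
  root = root || document;
  const found = [];
  if (document.evaluate) {
    const result = document.evaluate(
      classNameXPath(className),
      root,
      null,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
      null
    );
    for (let i = 0; i < result.snapshotLength; i++) {
      found.push(result.snapshotItem(i));
    }
    return found;
  }
  // Fallback for browsers without XPath support: test every element.
  const all = root.getElementsByTagName("*");
  const pattern = new RegExp("(^|\\s)" + className + "(\\s|$)");
  for (let i = 0; i < all.length; i++) {
    if (pattern.test(all[i].className)) found.push(all[i]);
  }
  return found;
}
```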

Last 10% always takes 90% of the time :)

Monday, September 17, 2007

IE blues

Hit a snag named Internet Explorer. Seems that although IE does register global variables in the window object as it should, it does not include them in the window object when iterating over it.
Who knew? (Seriously, I searched something like ten pages deep in Google.)

I worked around the limitation by grepping the contents of all script tags on the page for variable names. IE luckily does have a document.scripts property that makes that easy.
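
The grepping amounts to something like this sketch. The regular expression is deliberately crude (it also picks up local variables and only sees the first name in "var a, b") and the declaredNames function is a stand-in for illustration, not the actual workaround code.

```javascript
// Collect names declared with `var` or `function` in a piece of script source.
function declaredNames(source) {
  const names = new Set();
  const pattern = /\b(?:var|function)\s+([A-Za-z_$][\w$]*)/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return Array.from(names);
}

// In the browser, the sources would come from the page's script tags:
// for (let i = 0; i < document.scripts.length; i++) {
//   declaredNames(document.scripts[i].text);
// }
```

Only inline scripts have their text available this way, which matches the compromise described next.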

However, we're still left with a heartbreaking compromise on the original idea: Switching variable names in all loaded javascript except in IE where only variables or their parents that are referenced in the document are switched. I'm hoping that in practice this aberration won't inconvenience developers. Of all people, they should know that IE is usually the exception to the rule!

Sunday, September 16, 2007

Too excited

The first working prototype of the switcher is done. It lets you declare variants anywhere in HTML, CSS or Javascript that will then be switched randomly onload. Explanation to come.

Tuesday, August 28, 2007

Before we get too excited

I have to attend to other responsibilities.

The basic idea

...is this: A program that will adapt a website to its users by trial and error. It should be a systemization of something that every website creator/designer/owner already does -- making changes, seeing what effect they have, and keeping the beneficial ones. In biological terms, variation and selection. In engineering terms, an adaptive system. In statistical terms, Bayesian updating. In scientific terms, controlled experimentation. The core idea must be as old as common sense. The new part is making a computer do it for you in the environment of the web.

Monday, August 27, 2007

Ready... set... create!

Here I will try to document my thoughts and actions about my idea of a genetic algorithm for websites. Background to come.