Sunday, May 23, 2010

Google's AI engine

Well, it looks like Google is trying to help others use its own machine learning algorithms. At Google I/O this week, Google announced their new Google Prediction API. It promises to help predict the future based on past data. I'm not so sure about that, but it is very cool to see cloud-based services such as this one furthering the AI field.

It sounds like the Prediction API is mostly focused on looking at a ton of old data and, based on patterns found in that data, answering questions about how the data may look in the future. It remains to be seen whether more current data can be fed in, so that the system gets better at predicting as it learns more and more. Or does the data need to be fed in all at once, with a huge learning algorithm run over it (taking a very long time to complete), and the whole thing repeated every time you get new data?

I'll have to play around with it and see how it works before I pass judgment, but either way, it is a boon for AI in general!

Wednesday, January 20, 2010

Neural Networks on Google App Engine

I've been reading about the Google App Engine more over the past few days. I set up an account a long time ago, but hadn't really had much of an opportunity to try it out. So, I did the logical thing and walked through the quickstart tutorial. It was pretty good, and produced something better than your typical "Hello World" programming example. That's a good thing since they wouldn't really be able to show some of the more powerful features with such a simple example.

Recently I wrote a pretty simple (read: crappy) neural net in Java, and I was trying to figure out a way to move it to a server environment so I could get more scalability and power than my iMac. I thought about using AWS, but the costs involved are pretty steep, and I'm not just talking about money here: environment setup, getting an engine running, etc. So, I decided to take a look at the App Engine setup and see if I could actually build something there to run my node network.

The first roadblock I found is that App Engine does everything by URL requests. That could work, as I could just kick off the whole thing from a web front end, but it seems a bit hokey if I just want the network to be running all the time. I did find the ability to create cron jobs and put tasks on a queue, but I'll get to my analysis of that in a minute.

There are two big limitations with the App Engine that prevent kicking off an engine from a URL request: a 30-second limit on generating a response, and no ability to start additional threads. The former isn't a problem if you can do the latter. Obviously you don't really want a web request to take a long time, so a 30-second time limit makes absolute sense. In fact, that is a huge amount of time for a user to wait for a page to load or for some other interaction to be handled, so I would argue no web application should take more time than that for a typical request/response. So, this really isn't a problem per se. I didn't want to go down that path anyway. I really just wanted the URL to kick off the engine and return saying "yep, kicked it off!"

The second issue is a bigger problem for typical parallel processing. If I can't kick off a daemon thread which can kick off its own set of worker threads, then the engine dies when the request is processed, until another request comes in. So, I'm really limited to 30-second chunks of work, and those chunks can't easily kick off other chunks of work.

Or can they?

I then learned about cron jobs and task queues. Cron jobs are basically just scheduled tasks, named after the *nix program that runs tasks on a schedule or at a specific time. The App Engine version of cron allows a URL to be called at a particular time or on a schedule, thereby kicking off a request. It sounds interesting, but only if I want to kick off the engine every minute or so. Again, this wouldn't necessarily be a problem if the engine could run for more than 30 seconds at a time. However, URLs called by cron jobs are limited by the same things as any other request: 30 seconds and no starting threads. That seems like it's going to be a bit slow for my needs, and not really what cron should be used for in my opinion, at least not every minute.
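To make the cron idea concrete, here is a rough sketch of what each scheduled request would have to do: work through as many neurons as it can inside a time budget, then report where it stopped so the next request can pick up from there. This is plain Python rather than anything App Engine specific, and the names (run_slice, budget_seconds, update) are made up for illustration.

```python
import time


def run_slice(values, start, budget_seconds, update):
    """Update values[start:] in place until the time budget runs out.

    Returns the index of the first item that was NOT processed, so a
    later request can resume from there.
    """
    deadline = time.monotonic() + budget_seconds
    index = start
    while index < len(values) and time.monotonic() < deadline:
        values[index] = update(values[index])
        index += 1
    return index


# Hypothetical usage: leave some headroom under the 30-second limit.
activations = [0.1, 0.5, 0.9]
next_index = run_slice(activations, 0, 25.0, lambda a: a * 0.5)
print(next_index, activations)
```

The resume index would have to be saved somewhere between requests, since nothing survives in memory once the request ends. That part is left out here.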

The task queue holds more promise. Your application can add tasks to the task queue, and they will get processed by the task queue engine. Tasks are basically URL requests, just like cron jobs, so they are under the same limitations there, but depending on load, tasks can be handled at more like 5 or 10 per second (throttled at a maximum of 20). This is more reasonable since each request could take a second or two to process 1000 neurons or so, and then go away. Even at only 20 tasks per second, that's 20,000 neurons handled per second as opposed to 10,000 per minute. This is still limiting, but not quite as much. The kicker? There is a limit of only 100,000 tasks per day, which at 20 tasks per second means we would run out in under an hour and a half. That's the limit on free accounts, but paid accounts are still limited to a million, so about 14 hours of meaningful work unless it's throttled down, but then fewer neurons get handled at a time. The docs say that this feature is experimental, so maybe they'll increase the quotas, but the limits are still too low to handle a meaningful number of neurons, especially as it scales.
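For anyone who wants to check the quota math, here is a quick back-of-the-envelope calculation. It assumes the queue runs flat out at the 20 tasks per second cap and that every task handles 1000 neurons. Both are the rough figures from above, not measurements, and the function name is just something I made up.

```python
def hours_until_quota_exhausted(daily_task_quota, tasks_per_second):
    """Hours of continuous work before the daily task quota is used up."""
    return daily_task_quota / tasks_per_second / 3600


NEURONS_PER_TASK = 1000       # rough guess for one request
MAX_TASKS_PER_SECOND = 20     # throttle ceiling on the task queue

neurons_per_second = NEURONS_PER_TASK * MAX_TASKS_PER_SECOND
free_hours = hours_until_quota_exhausted(100_000, MAX_TASKS_PER_SECOND)
paid_hours = hours_until_quota_exhausted(1_000_000, MAX_TASKS_PER_SECOND)

print(f"{neurons_per_second} neurons per second")
print(f"free quota lasts {free_hours:.1f} hours")
print(f"paid quota lasts {paid_hours:.1f} hours")
```

At the full rate the free quota is gone in roughly 1.4 hours and the paid quota in roughly 13.9 hours. Throttling down to 5 or 10 tasks per second stretches the quota out, but handles proportionally fewer neurons per second.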

Next, I will look into using MapReduce — possibly using AWS Elastic MapReduce — for handling some of the core neural engine requirements. From what I've read, I'm not sure it's the right parallelization technique for this, but it should make for some interesting reading.

I'm willing to pay money for the hardware use — it's the setup time that I'm trying to avoid. What we need is an online neural net engine as a service. Maybe that's what I'll end up building in the end.

Consumer/Producer

My how times change. It's been forever since I've sat down and written anything much longer than 140 characters. Even if it's just to introduce a video or pictures of my son, it's still not very long.

I'm going to remedy that. I've been in consume mode for far too long; it's time for me to produce.

Thursday, August 04, 2005

Article: CMU online game will be used to help teach computers to see

CMU online game will be used to help teach computers to see. This is pretty cool. Basically, it's a game where one person tries to get another player across the internet to describe a picture, but the person describing doesn't get to see all of the picture at once.

The real point is to get enough pictures of objects at different angles with descriptions so that a computer can look at them all and learn what objects are by sight.

The only problem I see with this approach is that they'd need billions of images to make this worthwhile. For example, if you assume our eyes take in approximately 30 frames per second, at 60 seconds a minute, 60 minutes an hour, and 8 hours a day (average for a baby in the first year of their life), then in a single day, a baby sees 864,000 impressions of objects in the environment. Multiply that by 365 and you get 315 million impressions in the first year of life. So, you can imagine how many images are going to be necessary to teach a computer to do the same thing. I'd say, use video instead of static images.
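Here is that estimate spelled out so the assumptions are easy to change. The 30 frames per second and 8 waking hours a day are my own rough guesses from above, not measured values.

```python
FRAMES_PER_SECOND = 30        # assumed rate at which the eyes "sample"
WAKING_HOURS_PER_DAY = 8      # assumed average for a baby's first year

frames_per_day = FRAMES_PER_SECOND * 60 * 60 * WAKING_HOURS_PER_DAY
frames_per_year = frames_per_day * 365

print(f"{frames_per_day:,} impressions per day")
print(f"{frames_per_year:,} impressions per year")
```

That works out to 864,000 impressions a day and 315,360,000 in the first year.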

Wednesday, August 03, 2005

Article: Bush wants alternatives to Darwinism taught in school

Bush wants alternatives to Darwinism taught in school -- for once I agree with something that President Bush says: "I think that part of education is to expose people to different schools of thought."

Profound, yet something that most schools don't do well. However, if you're going to open the schools up to different schools of thought, why stop at just adding Creationism to the curriculum? I see they're trying to call it "the" theory of Intelligent Design now, but what about other theories of intelligent design? Or maybe there are alternative evolutionary theories to Darwinism that we should discuss.

For example, what about aliens? I mean, couldn't aliens have created the universe as we know it? Isn't that "Intelligent Design"? Not exactly what Creationists are calling intelligent design, but worth considering. Shouldn't we tell our kids about that? How about a Human Endogenous Retrovirus (HERV) that punctuates evolution at times of stress in the population? This comes straight out of a series by Greg Bear, Darwin's Radio and Darwin's Children. There is even talk in those books about intelligence in the system, nature as a neural network of sorts.

Now, these ideas may seem crazy, but no more crazy than evolving from monkeys (where's the missing link?) or being thought up by some omniscient and omnipotent being, affectionately known as God to most of the western world. These ideas are just as crazy to the other group. My point is that I agree, we should teach many different viewpoints, and maybe even have debates in high schools.

Provoking independent thought in children is so very important, because then they won't be corporate drones in the future. And maybe they'll think so much they'll actually vote someone good into the presidency.