Deepdreaming new trainees: Still fuzzy blobs

Follow-up to the previous post about training a new neural network to deepdream about new topics.

I had to let a larger training set of Art History run, in hopes that the expanded dataset would yield a more representational result. There were 2.3 million images, including photometric distortions. Again, the task was classifying which artist did which painting. This collection of images was 26 times larger than the one in my previous post. I only trained for 7 epochs (36 hours) before my spouse said enough was enough with the cloud computing bill and forced me to stop the instances. The resulting trainee seems to be producing more complex organic forms when deep dreamt but still nowhere near as sophisticated as the bvlc_googLeNet (which was stopped after 60 epochs).

Here are the resulting paintings/dreams generated with different settings, but using the same chromatic gradient image as a guide. The first used a lower resolution version of the guide image, fewer octaves, and only 10 iterations.

deepdream 2 art history

The second was 10 octaves with 40 iterations. You can see it really loves the skin color.

deepdream 2 art history
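For a rough sense of what the octave and iteration settings mean in pixels, here is a minimal sketch of the octave pyramid. It assumes the scheme from the public deepdream notebook, where each octave is smaller than the next by a fixed `octave_scale` (1.4 is that notebook's default, not a number from this post) and the iteration count applies per octave. The 1000-pixel guide width and the function names are hypothetical.

```python
def octave_sizes(width, height, octave_n, octave_scale=1.4):
    """Return the (width, height) of each octave, smallest first.

    Assumption: every octave is octave_scale times smaller than the next,
    and the last octave is the full-size guide image.
    """
    sizes = []
    for k in range(octave_n):
        shrink = octave_scale ** (octave_n - 1 - k)
        sizes.append((max(1, round(width / shrink)),
                      max(1, round(height / shrink))))
    return sizes


def total_steps(octave_n, iter_n):
    """Total gradient steps if iter_n iterations are run on every octave."""
    return octave_n * iter_n


if __name__ == "__main__":
    # Hypothetical 1000 x 1000 guide image, 10 octaves, 40 iterations each.
    for w, h in octave_sizes(1000, 1000, octave_n=10):
        print(w, "x", h)
    print("gradient steps:", total_steps(10, 40))
```

Under these assumptions, ten octaves means the dream starts on a copy of the guide roughly twenty times narrower than the original, and most of the work happens on small images before the final full-size pass.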

As I saw in some of the super duper high resolution (128 megapixel) puppyslug murals generated by David A Lobser, higher resolution isn't necessarily more interesting. I find the good stuff occurs in a lower resolution, perhaps an image that was roughly the size of one of the training images. Not a coincidence. Unfortunately, that's not so useful for people who want to make large murals.

The sweet spot I seek is the one where the deconv deep vis is showing activation images that resemble a scrambled version of the training images, because my theory is that it leads to a more representational synthesis in the deep dream. I continue on my weekends.

Anyway, more on kitsch. I'm not necessarily saying that kitsch is bad. I have a lava lamp in my house. What I think would be bad about deep learning + art is having a very promising scientific phenomenon trivialized and frozen in public memory as the software that performs the one single task. I think it has more potential than that, and I thank all of you who got in touch with me to say you agree.

deepdream training error graph

Deepdream: Avoiding Kitsch

Yes yes, #deepdream. But as Memo Akten and others point out, this is going to kitsch as rapidly as Walter Keane and lolcats unless we can find a way to stop the massive firehose of repetitive #puppyslug that has been opened by a few websites letting us upload selfies. I don't think we should stop at puppyslug (and its involved intermediary layers), but training a separate neural network turns out to be more technically difficult for most artists. I believe applying machine learning in content synthesis is a wide open frontier in computational creativity, so let's please do what we can to save this emerging aesthetic from its puppyslug typecast. If we can get over the hurdle of training brains, and start to apply inceptionism to other media (vector based 2D visuals, video clips, music, to name a few) then the technique might diversify into a more dignified craft that would be way harder to contain within a single novelty hashtag.

Why does it all look the same?

Let's talk about this one brain everyone loves. It's a bvlc_googLeNet trained on ImageNet, provided in the Caffe Model Zoo. That's the one that gives us puppyslug because it has seen so many dogs, birds, and pagodas. It's also the one that gives you the rest of the effects offered by dreamscopeapp because they're just poking the brain in other places besides the very end. Again, even the deluxe options package is going to get old fast. I refer to this caffemodel file as the puppyslug brain. Perhaps the reason for all the doggies has to do with the number of dog pictures in ImageNet. Below is a diagram of the images coming from different parts of this neural network. You can imagine its thought process like a collection of finely tuned Photoshop filters, strung together into a hierarchical pipeline. Naturally, the more complex stuff is at the end.

network visualization

What's the Point?

My goal in this post is to show you some deepdream images that were done with neural networks trained on other datasets – data besides the entirety of ImageNet. I hope that these outcomes will convince you that there's more to it, and that the conversation is far from over. Some of the pre-trained neural nets were used unaltered from the Caffe Model Zoo, and others were ones I trained just for this exploration.


It's important to keep in mind that feeding the neural net next to nothing results in output just as extravagant as feeding it the Sistine Chapel. It is the job of the artist to select a meaningful guide image, whose relationship to the training set is of interesting cultural significance. Without that curated relationship, all you have is a good old computational acid trip.

The following image is a chromatic gradient guiding a deep-dream by a GoogLeNet trained on classical Western fine art history up to impressionism, using crawled images from Dr. Emil Krén's Web Gallery of Art. This version uses photometric distortion to prevent over-fitting. I think it results in more representational imagery. The image is 2000x2000 pixels, so download it and take a closer look in your viewer of choice.

deepdream by arthistory1 neural net

This one is the same data, but the training set did not contain the photometric distortions. The output still contains representational imagery.

deepdream by arthistory1 neural net

The image below was made by a neural network trained to do gender classification, deepdreaming about Bruce Jenner on the cover of Playgirl Magazine in 1982. Whether or not Bruce has been properly gender-classified may be inconsequential to the outcome of the deepdream image.

High Resolution Generative Image

Notice that when gender_net is simply run on a picture of clouds, you still see the lost souls poking out of Freddy Krueger's belly.

High Resolution Generative Image

Gender_net deepdreaming Untitled A by Cindy Sherman (the one with the train conductor's hat).

High Resolution Generative Image

This one came from a more intermediate layer of a neural network custom trained to classify faces from Labeled Faces in the Wild (LFW).

High Resolution Generative Image

This was dreamt by the same neural net, but using a different gradient to guide it. The resulting image looks like Pepperland.

High Resolution Generative Image

This is the same face classifier (innocently trying to tell Taylor Swift apart from Floyd Mayweather) guided by a linear gradient. The result is this wall of grotesque faces.

High Resolution Generative Image

Just for good measure, here's hardcore pornography, deep-dreamt by that same facial recognition network, but with fewer fractal octaves specified by the artist.

High Resolution Generative Image

Technical Notes

Training neural networks turned out to be easier than I expected, thanks to public AMIs and NVIDIA DIGITS. Expect your AWS bill to skyrocket. Particularly if you know about machine learning, it helps to actually read the GoogLeNet publication. In the section called Training Methodology, that article mentions photometric distortions by Andrew Howard. This is important not to overlook. When generating the distortions, I used ImageMagick and Python. You can also generate the photometric distortions on the fly with this Caffe fork.
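To make "photometric distortion" concrete, here is a minimal pure-Python sketch. It is not the ImageMagick script used for these training sets. It assumes the distortion means scaling color, contrast, and brightness by random factors, and the 0.5 to 1.5 range is an assumption rather than a value taken from this post. The function name and the list-of-RGB-tuples image format are hypothetical; on real files you would do the same with an image library.

```python
import random


def _clamp(value):
    """Round to an integer and keep it inside the 0..255 channel range."""
    return max(0, min(255, int(round(value))))


def photometric_distort(pixels, rng, low=0.5, high=1.5):
    """Return a copy of pixels with random color, contrast and brightness.

    pixels: list of (r, g, b) tuples with values 0..255.
    rng:    a random.Random instance, so results can be reproduced.
    Assumption: one factor per effect, drawn uniformly from [low, high],
    applied in a fixed order (color, then contrast, then brightness).
    """
    if not pixels:
        return []
    color = rng.uniform(low, high)
    contrast = rng.uniform(low, high)
    brightness = rng.uniform(low, high)

    grays = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    mean_gray = sum(grays) / len(grays)

    distorted = []
    for rgb, gray in zip(pixels, grays):
        new_rgb = []
        for channel in rgb:
            # Color: move the channel toward or away from the pixel's gray.
            value = gray + (channel - gray) * color
            # Contrast: stretch or squeeze around the image's mean gray.
            value = mean_gray + (value - mean_gray) * contrast
            # Brightness: scale the result.
            value = value * brightness
            new_rgb.append(_clamp(value))
        distorted.append(tuple(new_rgb))
    return distorted


if __name__ == "__main__":
    rng = random.Random(0)
    patch = [(200, 120, 90), (30, 60, 200), (128, 128, 128), (250, 240, 10)]
    for _ in range(3):
        print(photometric_distort(patch, rng))
```

Each call yields a differently tinted copy of the same picture under the same label, which is how the distortions multiply the size of a training set.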

If you want to bake later inception layers without getting a sizing error, go into deploy.prototxt and delete all layers whose name begins with loss. In NVIDIA DIGITS, the default learning rate policy is Step Down, but bvlc_GoogLeNet used Polynomial Decay with a power of 0.5. I can't say that one is necessarily better than the other, since I don't even know whether properly training the neural net to classify successfully has anything to do with its effectiveness in synthesizing a deepdream image.
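Deleting the loss layers by hand gets tedious, so here is a minimal text-only sketch of that step. It assumes the prototxt uses brace-delimited `layer { ... }` (or older `layers { ... }`) blocks that each carry a `name: "..."` field, and that no braces appear inside strings or comments. The function name is hypothetical, and this is plain string handling, not Caffe's own parser.

```python
import re

LAYER_START = re.compile(r'\blayers?\s*\{')
LAYER_NAME = re.compile(r'\bname\s*:\s*"([^"]*)"')


def strip_loss_layers(prototxt_text):
    """Return prototxt_text without the layer blocks named loss*."""
    kept = []
    pos = 0
    end = len(prototxt_text)
    while True:
        match = LAYER_START.search(prototxt_text, pos)
        if match is None:
            kept.append(prototxt_text[pos:])
            break
        # Keep whatever sits between the previous block and this one.
        kept.append(prototxt_text[pos:match.start()])

        # Walk forward to the brace that closes this layer block.
        depth = 0
        i = match.end() - 1  # index of the opening brace
        while i < end:
            char = prototxt_text[i]
            if char == '{':
                depth += 1
            elif char == '}':
                depth -= 1
                if depth == 0:
                    break
            i += 1

        block = prototxt_text[match.start():i + 1]
        name = LAYER_NAME.search(block)
        if not (name and name.group(1).startswith("loss")):
            kept.append(block)
        pos = i + 1
    return "".join(kept)


if __name__ == "__main__":
    # Paths are placeholders; point them at your own files.
    with open("deploy.prototxt") as source:
        cleaned = strip_loss_layers(source.read())
    with open("deploy_noloss.prototxt", "w") as target:
        target.write(cleaned)
```

The sketch writes to a second file so the original deploy.prototxt stays untouched. Diff the two before dreaming with the result, since a text filter like this cannot tell whether a remaining layer still depends on one that was removed.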

The highest resolution image I could train on the largest EC2 instance turned out to be 18x18 inches at 300 dots per inch. Any more than that and I would need more than 60 GB of RAM. If anyone has access to such a machine, I would gladly collaborate. I also seek to understand why my own training sets did not result in such clarity of re-synthesis in the dreams. It's possible I simply did not train for long enough, or maybe the fine tweaking of the parameters is a more subtle matter. Please train me!
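For anyone sizing a machine, the print dimensions convert to pixels as below. This is plain arithmetic. The helper names are hypothetical, and the float32, three-channel storage is an assumption about how the image is held in memory.

```python
def print_size_to_pixels(width_in, height_in, dpi):
    """Convert a print size in inches to pixel dimensions at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_float32_bytes(width_px, height_px, channels=3):
    """Bytes for one uncompressed float32 copy of the image (4 bytes each)."""
    return width_px * height_px * channels * 4


if __name__ == "__main__":
    w, h = print_size_to_pixels(18, 18, 300)
    print(w, "x", h, "pixels =", w * h / 1e6, "megapixels")
    print(raw_float32_bytes(w, h) / 1e9, "GB for one bare float32 copy")
```

That is 5400 x 5400 pixels, about 29 megapixels. One bare float32 copy of the image is roughly 0.35 GB, so most of the 60 GB presumably goes to the network's intermediate activations rather than the picture itself. That last part is a guess, not a measurement.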

Teaching Openframeworks

Video guiding the Processing coder through the mental transformation needed to start using OpenFrameworks to create equivalent work. This class was taught with Syed Reza Ali. We will post more episodes every time we teach this GAFFTA class. So far, I have shaved my face with a straight razor, done a claymation about pointers, and taken apart a McDonald's cheeseburger. I hope it makes learning programming and its dry concepts a bit wetter. Special thanks to oooShiny for advising on the materials and teaching approach.

jttoolkit to OpenFrameworks

Jttoolkit is a C++ system for Processing artists. Actually, OpenFrameworks is probably going to do a much better job at this than me, so let's all get on that boat now. And for Gabe, it was really more about the cleanly managed dependencies. Cygwin and MacPorts have been an absolute headache for us, and for students. So I wrote this tutorial showing you how to migrate your jttoolkit apps into openFrameworks. Be free! Be free!

kalmac camera processing

A series of 7 camera view treatments that I probably did at ITP. When I went to Fabrica, I shared my source code with Joel Gethin Lewis and he quickly understood it. Shortly after, Joel went on to work with United Visual Artists. To interact with this piece, move the mouse to tumble the 3D form being generated from the webcam image.


A game demo where the player changes the terrain by laying down blocks, then presses go and watches the avatar walk forward, bumping into walls, and eventually reaching the heart and completing the level. When I first began writing this, I was trying to create a game that introduces basic computer programming concepts to the player through architecture and symbols. I still haven't found a good network of metaphors to use -- particularly in the area of working with numbers and math and trying to make it not so mathy. This project almost became my graduate thesis. I later went on to the Icon==Function project, and I failed at making those audiovisual environments Turing complete as well. With Teamwork, I got so wrapped up in the fun of designing a retro-esque 2D side scroller that I almost lost track of my regular school work. I hope to some day pick back up on making an accessible visual programming system.

Processing Tutorial for Flash/Lingo-ers

When I was a first year at ITP, I did a lot of social engineering in order to bring open source culture to attention. I also wished to support the efforts of some friends in another research program. I taught early releases of Processing to the students and faculty (and current faculty who were students then), and I also chose to use Processing to create a number of projects. My greatest propagandistic vessel was the "Proce55ing Workshop" (in 2002, two fives were being used in the name). My workshop was a precursor to the modern-day-ITP "drive-by seminar." I prepared a 3 hour performative coding demonstration, with a light discussion on open source. I designed my class hand-out to be not just a give-away, but something they might use more permanently. If GUI file viewing interfaces offer a user things like sort-by-date, sort-by-name, and sort-by-size, I was offering the community of Processing-incomers a reference that was sort-by-macromedia. We could reuse the knowledge the students already had and give them a fresh perspective on the debate of authorship and capitalism. By Fall Semester of 2004 (after I graduated), Processing was adopted by the NYU ITP Introduction to Computational Media course series as its primary teaching tool. My "tutorial for macromedia minds" has been translated to Japanese, and linked to by educators around the world. I am happy to have made some sort of contribution.

Original Date: Saturday, Feb. 8, 2003, 1:00 pm to 4:00 pm, 721 Broadway, Floor 4, Room 406


JTNGSE was a graphics editing program that was half GUI, half scripting. I originally wrote it for undergraduate friends in the UCLA design department because I was sad to hear that John Maeda's awesome Design by Numbers "did not save" or "did not support color." I also saw that Processing was wonderful, but also discouraging for my peers to learn (at that time, they considered it too complicated). Basically, everyone seems to need something that integrates in a very obvious way with their software packages. This lack of easy integration is what my friends claim is holding them back from beginning to learn this sort of process. JTNGSE was one attempt. Since that time, I have come to believe that the problem is a bit more complex.

Special Thanks to Debrah Isaac for using it in her senior project, Gabe Dunne for testing.

Scribble Variations
Grad Lingo Workshop

I taught a graduate workshop in Lingo programming, in conjunction with Maria Redin's Physical computing workshop. She taught them sensors, circuitry, and EZIO programming. I taught them the basics of programming, on-screen animation, and I made myself available for a couple months to answer their emails. Some of the examples from that session were later adopted by Jennifer Steinkamp and made into an "examples" site for undergraduate students.