Projects
Deepdreaming new trainees: Still fuzzy blobs

A follow-up to the previous post about training a new neural network to deepdream about new topics.

I let a larger Art History training set run, in hopes that the expanded dataset would yield a more representational result. It contained 2.3 million images, including photometric distortions, and the task was again to classify which artist painted which painting. This collection of images was 26 times larger than the one in my previous post. I only trained for 7 epochs (36 hours) before my spouse said enough was enough with the cloud computing bill and forced me to stop the instances. The resulting trainee seems to produce more complex organic forms when deep-dreamt, but it is still nowhere near as sophisticated as bvlc_googLeNet (which was stopped after 60 epochs).

Here are the resulting paintings/dreams generated with different settings, but using the same chromatic gradient image as a guide. The first uses a lower-resolution version of the guide image, fewer octaves, and only 10 iterations.

deepdream 2 art history

The second was 10 octaves with 40 iterations. You can see it really loves the skin color.

deepdream 2 art history
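
For reference, the two renders differ only in the octave and iteration arguments passed to the deepdream() helper from Google's public notebook. Here is a rough sketch of the calls, assuming that notebook's deepdream(), preprocess(), and objective_guide() functions are already defined in scope, and that the file names and layer choice are placeholders rather than my exact settings:

    import numpy as np
    import PIL.Image
    import caffe

    # deploy.prototxt needs force_backward: true, as in the notebook
    net = caffe.Classifier('deploy.prototxt', 'arthistory2.caffemodel',
                           mean=np.float32([104.0, 116.0, 122.0]),
                           channel_swap=(2, 1, 0))

    end = 'inception_4c/output'  # hypothetical layer choice

    def set_guide(net, guide_img, end):
        # push a guide image through the net and remember its features;
        # the notebook's objective_guide() reads the global guide_features
        global guide_features
        h, w = guide_img.shape[:2]
        src, dst = net.blobs['data'], net.blobs[end]
        src.reshape(1, 3, h, w)
        src.data[0] = preprocess(net, guide_img)
        net.forward(end=end)
        guide_features = dst.data[0].copy()

    base = np.float32(PIL.Image.open('base.jpg'))

    # first render: lower-resolution guide, fewer octaves, 10 iterations
    small_guide = np.float32(PIL.Image.open('chromatic_gradient.jpg').resize((224, 224)))
    set_guide(net, small_guide, end)
    dream_a = deepdream(net, base, iter_n=10, octave_n=4, end=end,
                        objective=objective_guide)

    # second render: full-size guide, 10 octaves with 40 iterations each
    guide = np.float32(PIL.Image.open('chromatic_gradient.jpg'))
    set_guide(net, guide, end)
    dream_b = deepdream(net, base, iter_n=40, octave_n=10, end=end,
                        objective=objective_guide)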

As I saw in some of the super duper high resolution (128 megapixel) puppyslug murals generated by David A Lobser, higher resolution isn't necessarily more interesting. I find the good stuff happens at a lower resolution, perhaps roughly the size of one of the training images. Not a coincidence. Unfortunately, that's not so useful for people who want to make large murals.

The sweet spot I seek is the one where the deconv deep visualization shows activation images that resemble a scrambled version of the training images, because my theory is that this leads to a more representational synthesis in the deep dream. I continue on my weekends.

Anyway, more on kitsch. I'm not necessarily saying that kitsch is bad; I have a lava lamp in my house. What I think would be bad about deep learning + art is having a very promising scientific phenomenon trivialized and frozen in public memory as the software that performs one single task. I think it has more potential than that, and I thank all of you who got in touch with me to say you agree.

deepdream training error graph

Deepdream: Avoiding Kitsch

Yes yes, #deepdream. But as Memo Akten and others point out, this is going to turn to kitsch as rapidly as Walter Keane and lolcats unless we can find a way to stop the massive firehose of repetitive #puppyslug that has been opened by a few websites letting us upload selfies. I don't think we should stop at puppyslug (and its involved intermediary layers), but training a separate neural network turns out to be too technically difficult for most artists. I believe applying machine learning to content synthesis is a wide open frontier in computational creativity, so let's please do what we can to save this emerging aesthetic from its puppyslug typecast. If we can get over the hurdle of training brains, and start to apply inceptionism to other media (vector-based 2D visuals, video clips, music, to name a few), then the technique might diversify into a more dignified craft that would be far harder to contain within a single novelty hashtag.

Why does it all look the same?

Let's talk about this one brain everyone loves. It's bvlc_googLeNet trained on ImageNet, provided in the Caffe Model Zoo. That's the one that gives us puppyslug because it has seen so many dogs, birds, and pagodas. It's also the one that gives you the rest of the effects offered by dreamscopeapp, because they're just poking the brain in other places besides the very end. Again, even the deluxe options package is going to get old fast. I refer to this caffemodel file as the puppyslug brain. Perhaps the reason for all the doggies has to do with the number of dog pictures in ImageNet. Below is a diagram of the images coming from different parts of this neural network. You can imagine its thought process as a collection of finely tuned Photoshop filters, strung together into a hierarchical pipeline. Naturally, the more complex stuff is at the end.

network visualization
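
If you want to see where the brain can be poked, the layer names are easy to list from Python. This is just a sketch, assuming you have the bvlc_googlenet deploy.prototxt and caffemodel on disk:

    import caffe

    # load the puppyslug brain in test mode
    net = caffe.Net('deploy.prototxt', 'bvlc_googlenet.caffemodel', caffe.TEST)

    # every blob is one stage in the pipeline; dreaming on early layers
    # yields brushstroke-like texture, while the later inception layers
    # yield the dogs, birds, and pagodas
    for name, blob in net.blobs.items():
        print('{:<30} {}'.format(name, blob.data.shape))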

What's the Point?

My goal in this post is to show you some deepdream images that were made with neural networks trained on other datasets – data besides the entirety of ImageNet. I hope that these outcomes will convince you that there's more to it, and that the conversation is far from over. Some of the pre-trained neural nets were used unaltered from the Caffe Model Zoo, and others were ones I trained just for this exploration.

Art

It's important to keep in mind that feeding the neural net next to nothing results in output just as extravagant as feeding it the Sistine Chapel. It is the job of the artist to select a meaningful guide image, whose relationship to the training set is of interesting cultural significance. Without that curated relationship, all you have is a good old computational acid trip.


The following image is a chromatic gradient guiding a deep-dream by a GoogLeNet trained on classical Western fine art history up to impressionism, using crawled images from Dr. Emil Krén's Web Gallery of Art. This version uses photometric distortion to prevent over-fitting. I think it results in more representational imagery. The image is 2000x2000 pixels, so download it and take a closer look in your viewer of choice.

deepdream by arthistory1 neural net

This one is the same data, but the training set did not contain the photometric distortions. The output still contains representational imagery.

deepdream by arthistory1 neural net

The image below is from a neural network trained for gender classification, deepdreaming about Bruce Jenner on the cover of Playgirl Magazine in 1982. Whether or not Bruce has been properly gender-classified may be inconsequential to the outcome of the deepdream image.

High Resolution Generative Image

Notice that when gender_net is simply run on a picture of clouds, you still see the lost souls poking out of Freddy Krueger's belly.

High Resolution Generative Image

Gender_net deepdreaming Untitled A by Cindy Sherman (the one with the train conductor's hat).

High Resolution Generative Image


This came from a more intermediate layer while deep-dreaming a neural network custom trained to classify faces from Labeled Faces in the Wild (LFW).

High Resolution Generative Image

This was dreamt by the same neural net, but using a different gradient to guide it. The resulting image looks like Pepperland.

High Resolution Generative Image

This is the same face classifier (innocently trying to tell Taylor Swift apart from Floyd Mayweather) guided by a linear gradient. The result is this wall of grotesque faces.

High Resolution Generative Image

Just for good measure, here's hardcore pornography, deep-dreamt by that same facial recognition network, but with fewer fractal octaves specified by the artist.

High Resolution Generative Image

Technical Notes

Training neural networks turned out to be easier than I expected, thanks to public AMIs and NVIDIA DIGITS. Expect your AWS bill to skyrocket. Particularly if you know about machine learning, it helps to actually read the GoogLeNet publication. In the section called Training Methodology, that article mentions the photometric distortions of Andrew Howard. This is important not to overlook. When generating the distortions, I used ImageMagick and Python. You can also generate the photometric distortions on the fly with this Caffe fork.
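
For what it's worth, here is a rough sketch of the kind of photometric distortion pass I mean, written with Pillow instead of ImageMagick; the jitter ranges are my own placeholders, not the exact values from the Howard paper:

    import random
    from PIL import Image, ImageEnhance

    def photometric_variants(path, n=5):
        # yield n randomly brightness/contrast/saturation-jittered copies
        img = Image.open(path).convert('RGB')
        for _ in range(n):
            out = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
            out = ImageEnhance.Contrast(out).enhance(random.uniform(0.7, 1.3))
            out = ImageEnhance.Color(out).enhance(random.uniform(0.7, 1.3))
            yield out

    for i, variant in enumerate(photometric_variants('painting_0001.jpg')):
        variant.save('painting_0001_distort_%02d.jpg' % i)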

If you want to bake later inception layers without getting a sizing error, go into deploy.prototxt and delete all layers whose names begin with "loss". In NVIDIA DIGITS, the default learning rate policy is Step Down, but bvlc_GoogLeNet used Polynomial Decay with a power of 0.5. I can't say that one is necessarily better than the other, since I don't even know whether properly training the neural net to classify successfully has anything to do with its effectiveness in synthesizing a deepdream image.
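
For the curious, the relevant lines in a Caffe solver.prototxt look something like this; the policy and power echo the published bvlc_googlenet solver, while the max_iter value is just a placeholder that sets the decay horizon:

    # polynomial decay instead of DIGITS' default step-down policy
    lr_policy: "poly"
    power: 0.5
    max_iter: 2400000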

The highest resolution image I could process on the largest EC2 instance turned out to be 18x18 inches at 300 dots per inch. Any more than that and I would need more than 60 GB of RAM. If anyone has access to such a machine, I would gladly collaborate. I also seek to understand why my own training sets did not result in such clarity of re-synthesis in the dreams. It's possible I simply did not train for long enough, or maybe the fine tweaking of the parameters is a subtler matter. Please train me!

Black Eyed Peas: Boom Boom Pow

by Keith

by Keith

Code was written at Motion Theory to generate content and effects in the Black Eyed Peas' Boom Boom Pow video. Among this highly collaborative effort were two other code artists: Keith Pasko and Ryan Alexander. After the video was finished, Keith and I went on to collaborate again on custom VJ software used by VSquared Labs on the Black Eyed Peas live tour. The VJ software processes a realtime video feed and was written in C++ with openFrameworks. The keyboard was filled with different behavior and content controls. If we had more time, we would have connected it to a Lemur device communicating via OSC. Once again a job well done with Motion Theory, and the beginnings of a relationship with VSquared Labs.

by Keith

by Keith

by Keith

by Keith

by Keith

by Keith

EAZ AR/MR Workshop

Watch on Vimeo.

Report: FACT HCI Augmented Reality Workshop, Success.

In early November 2005, a forty-minute educational workshop experience for 300 pupils aged 10-11 was given for the Liverpool FOCUS EAZ (Education Action Zone) for schools in the Fazakerley and Walton areas. The technology introduced was addressed at various times as "Augmented Reality," "Camera Vision," "Gestural Interface," and "Illuminating Lamp." Pupils were invited to step into a silhouette-driven environment and interact with six demonstration programs in a game-play mode. Additionally, a five-minute slideshow was given by the team (Karen Hickling, Josh Nimoy, Marta Ruperez), giving brief backgrounds on FACT (Foundation for Art and Creative Technology) and HCI (Human Computer Interaction group at John Moores University). In preparation for the workshop, Josh had authored (or massaged previously authored) software into an automatic shuffle of six abstract games. Some of the games were more multiplayer than others; those which were not offered limited resources for interactive satisfaction, making them great exercises in collaboration, teamwork, and turn-taking for the pupils. The workshop was given a total of ten times. This essay reports on things we learned about the workshop and the HCI as we refined the workshop's structure.

In the planning stages of the workshop, Josh was initially unsure that the camera interface would be appropriate for the age group, but proposed it to a few people at FACT. The artist Carlos "Caen" Botto came back with a strongly positive response, saying, "This kind of activity is very appropriate. The most important need at this age is that the system have a very evident reactivity (think that at this age the abstract thinking is in early formation), letting the children experience a sensation of control in a very direct way. The other important need is for high physical activity. I think this proposal is right in both aspects. The group size is not a problem. It is important not only to interact, but also to see others interacting with the system. A big group can be broken into three shifts. When one group is interacting, the other groups can observe. On the pedagogic side, I think the educative aspects are centred in psychomotricity, creativity development, cooperation and intuitive problem solving, et cetera." Just before software development, a local toy store was visited out of curiosity about which products were targeted at this age range, in hopes of keeping the experience from being boring, embarrassing, or too complex for the students. The conclusions reached were similar to Caen's mentions of creative development, psychomotricity, and high physical activity. These children were going to be more energetic than us!

The order of the workshop was originally planned to begin with 35 minutes of play, with a five-minute presentation at the end. The rationale for this was so the students would be engaged first, generating questions in their heads. In retrospect, there had been more preparation to entertain than was actually needed. We soon realised in implementation that workshops lasted for unpredictable durations, sometimes cutting the presentation completely off from the experience. Since the presentation was a relatively important component, it was moved ahead to precede the play session. This change turned out to be beneficial as well. Ideas were put into the heads of the students pre-play, so the play sessions could be more than just play. It was informed play. This was not the only response to chaos. Besides the duration of workshops varying, it was also hard to position ten children in the projection -- they would completely block the projection, and although the students were having fun, we wanted them to experience the software in a way that would allow them to comprehend the interactivities. We began to break the groups into three or four, segmenting each software program into 1-2 minute turns, calling out each group and keeping time. Before we bothered to do this, the turn-taking emerged naturally from the group behaviours. It was just faster to impose this early on for punctuality.

We gave very little instruction, and just let the students do what they did. Each group's behaviour evolved in similar ways. As the students discovered the systems, one student would back up so far that the projection was blocked completely by his (it was usually male) body, preventing the rest of the children from interacting with the systems. In the more self-governed clusters, this would result in that person being yelled at by the spectators to "move forward." On the second day, a clear barrier was drawn so that no one could back up too far. This architectural restriction was somewhat of a solution for the "block all" personality type, although this impulse was still observed in students within the remaining space. Students would also pound on the wall where the image was projected, treating the virtual objects as if they were buttons, or as if there were sensors in the walls. The only reason we introduced a rule against the wall pounding was out of respect for the workshop happening in the next room. Other than the disturbance, there didn't seem to be any problem with pounding the wall. While on the subject of violence, it was interesting to note the playful violence between boys. Often, when in the shadow space, boys would pretend-fight, as in karate or boxing. This was probably not just due to the high energy levels of the activity, but also to the expectations that come along with any game-play mentality. The boys felt as though they were "inside the videogame" and were acting accordingly, creating the same scenes they had seen in Street Fighter and Mortal Kombat. Violence was one kind of cheap laugh among the several categories of cheap laughs discovered by the students while in play. On the first day, there was no teacher accompanying the groups. This caused them to be less obedient. However, even though on the second day there was at least one teacher governing the students, the "play violence" persisted with unusual strength.

A special learning group came in - a smaller group with a wide range of conditions. In this case, the most immediate software (the sparkling star trails) worked. The initial concept presentation was cut very short and more time was given to play. This group reacted more or less the same as the other classes of children. It is entertaining to note the democratising power of Augmented Reality: everyone seems to act like a ten-year-old child when they try it out. By the time we got to the last group, students already knew what to expect. Rumours were being passed between friends during breaks and lunchtime. The technology turned out to be not as new and shocking as predicted. Each group was asked to raise hands if someone had an "EyeToy" at home. Three to ten children would always raise their hands, completely aware of what was about to come. On the other hand, slides showing Tom Cruise in Minority Report doing gestures in front of his pre-crime computer-cave went virtually unrecognised. These children were too young to be allowed into the cinema to view a movie with that rating, despite its being responsible for disseminating the idea of AR so widely. The most popular question asked was "how does this work?" The next most popular question was "Is this an EyeToy?" Josh's favourite question came from a special learning student: "Are you from America?"

Previously, an emergent personality type was mentioned - the boy who blocked the entire screen. Other such personality types recurred during the workshops. The "geek" type would get bored of the demonstration and sit in the back of the room with Josh and the computers, talking about more advanced topics like programming (example question: "Did you code this in HTML using Notepad?"). The non-participant would sit in the chair, or on the floor, and refuse to get up and join the play, even after being prodded by Karen or the teacher. Conversely, and more frequent, was the over-active participant -- a child who did not have enough patience to wait for his/her group's turn, would have trouble leaving the game space when the turn was over, and would be found sticking arms and legs into the projection from the perimeter during other groups' turns. Somewhat related to this over-active participant was the shouting director, trying to verbally control whoever was in the ring, telling them to try different positions or interactions. For the most part, children listened to the directors. In the end, we fully realised that it really did not matter what software was running. The most entertaining part of the experience was students being allowed (for once) to let loose in a projection-obstructing frenzy.

Teachers sitting and watching the experience seemed overwhelmed. While only a few of them stepped in to try the interface, everyone had something positive to say. "It crosses the whole curriculum" said one teacher, in reference to the joining of arts and sciences that is the augmented reality field. "You have got a mixed group of people co-operating and working together. Boys mixing with girls and children who do not know each other" said another teacher, amazed at how the experience seemed to break down barriers between different kinds of students.

slow fade grid

slow fade grid

screen capture of a pen-trail

screen capture of a pen-trail

stars

stars

stars

stars

stars

stars

stars

stars

squishy satsumas

squishy satsumas

slow fade grid

slow fade grid

stars

stars

Josh coding on laptop

Josh coding on laptop

exploding watermelons

exploding watermelons

scribble pen

scribble pen

more scribble pen

more scribble pen

scribble pen again

scribble pen again

multi scribble

multi scribble

fade grid

fade grid

squashy satsumas

squashy satsumas

exploding melons

exploding melons

kalmac camera processing

A series of 7 camera view treatments that I probably did at ITP. When I went to Fabrica, I shared my source code with Joel Gethin Lewis and he quickly understood it. Shortly after, Joel went on to work with United Visual Artists. To interact with this piece, move the mouse to tumble the 3D form being generated from the webcam image.

Froggies

Froggies is a play-testing prototype for a children's digital play environment. A table with a screen embedded into the surface acts as an arena for virtual life. Children interact with the virtual life by placing various symbolic markers on the table and sliding them around. The result is a musical rhythm of animal noises.

Benetton / Fabrica : Rippling

The Interactive Media Department of Fabrica at Benetton was led by Andy Cameron of Antirom and Romandson fame. I managed to create a few media experiments targeting retail storefront window displays - getting the attention of passers-by. This is a screenshot of a video application that records camera input into a history buffer - starting at the top left and streaming across the screen with carriage returns. The application is meant to run on a plasma display and has a medium-sized configuration system, allowing people to tweak important design parameters.
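
The layout logic is simple enough to sketch. Here is a rough Python/OpenCV approximation of the idea (the original application was not written this way): each incoming camera frame is shrunk into a tile and laid out left to right, top to bottom, like text with carriage returns.

    import cv2
    import numpy as np

    COLS, ROWS = 8, 6          # grid of remembered frames
    TILE_W, TILE_H = 160, 120  # size of each remembered frame

    cap = cv2.VideoCapture(0)
    history = np.zeros((ROWS * TILE_H, COLS * TILE_W, 3), dtype=np.uint8)
    index = 0

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tile = cv2.resize(frame, (TILE_W, TILE_H))
        # place the new tile at the next grid cell, wrapping around
        row, col = divmod(index % (COLS * ROWS), COLS)
        history[row*TILE_H:(row+1)*TILE_H, col*TILE_W:(col+1)*TILE_W] = tile
        index += 1
        cv2.imshow('history buffer', history)
        if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()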

BallDroppings

BallDroppings was one night of idle programming that blew up unexpectedly into a web phenomenon. I learned that simplicity is elegant, and that C++ is wonderful for low-latency sound+image. I also learned about addiction and glucose metabolism rate highs. Although I do not credit myself with having originated the idea of interactive lines with bouncing balls, there exists a small following in the online gaming community that gives me such credit, particularly when accusing one another of having copied me in their recent developments. BallDroppings has also been re-implemented in other languages by random people, referencing the name "BallDroppings." All this activity is very surprising to me. It is also a clear example of the great power that comes from refraining from marking intellectual property. A lot of people mistook BallDroppings for my graduate thesis. I don't try to correct this misunderstanding.

Worm: Robot you can scare

This robotic worm moves around and bows at whoever gets close to it. The piece was received by the audience as a strange pet. No artificial intelligence was involved, but people grew attached to the unpredictable personality, controlled by the unpredictable factors in my camera vision software. The robot was a model for a larger installation entitled 'garden of lovers', in which a 5x5 grid of life-size robotic worms would dance together in response to whoever was walking around amongst them. All that remains is the proposal (which I am still looking for) and this single interactive prototype.

I built it for "Interactive Multimedia" in the UCLA Hypermedia Studio, within the dept. of FilmTV, under the instruction of Jeff Burke, Jason Brush, and Fabian Wagmeister.

Alone

Alone

Responding

Responding

Naked

Naked
