From charlesreid1

 
(10 intermediate revisions by the same user not shown)
Line 3: Line 3:
Adversarial neural networks use an architecture consisting of two separate neural networks - one network attempts to learn how to accomplish a task, and another network attempts to differentiate between the output of the first network and the "real" output.
Adversarial neural networks use an architecture consisting of two separate neural networks - one network attempts to learn how to accomplish a task, and another network attempts to differentiate between the output of the first network and the "real" output.


=TensorFlow Examples of Adversarial Neural Networks=
=TensorFlow Adversarial Examples=


==Adversarial Crypto==
==Adversarial Crypto==
Line 9: Line 9:
This adversarial crypto neural network attempts to learn how to protect communications using the adversarial architecture.
This adversarial crypto neural network attempts to learn how to protect communications using the adversarial architecture.


Link to paper: "Learning to Protect Communications with Adversarial Neural Cryptography": https://arxiv.org/abs/1610.06918
Paper: "Learning to Protect Communications with Adversarial Neural Cryptography"
 
Link to paper: https://arxiv.org/abs/1610.06918


Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_crypto
Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_crypto
Line 27: Line 29:
===The Model===
===The Model===


We'll step through the code line-by-line again. Here's the link to the code: https://github.com/tensorflow/models/blob/master/research/adversarial_crypto/train_eval.py
We'll step through the code line-by-line. Here's the link to the code: https://github.com/tensorflow/models/blob/master/research/adversarial_crypto/train_eval.py


====License====
Full model walkthrough is on the [[TensorFlow/Adversarial Crypto]] page.


Obligatory license info:
The rundown is:
* Create an AdversarialCrypto class that holds a training optimizer object for the Bob and Alice networks
* Define a method that evaluates the networks as-is and prints the percent losses
* Define a method that trains the network for a specified number of iterations, stopping early if the network reaches its target losses
* Define a method that calls the training function (above), then re-trains Eve several more times from scratch


<pre>
==Adversarial Text==
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
</pre>
 
Some info about the network:
* There are actually 3 neural networks involved: Alice, Bob, and Eve
* Alice takes inputs in_m (message), in_k (key) and outputs the ciphertext
* Bob takes inputs in_k (key), ciphertext and attempts to output the plaintext
* Eve takes input ciphertext (no key) and also attempts to output the plaintext
 
The file starts with imports/declarations to be compatible with Python 3:
 
<pre>
# TensorFlow Python 3 compatibility
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import signal
import sys
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
</pre>
 
====Input Argument Flags and Parameters====
 
Hyperparameter flags can be set on the command line:
 
<pre>
flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate')
flags.DEFINE_integer('batch_size', 4096, 'Batch size')
FLAGS = flags.FLAGS
</pre>
 
The FLAGS mechanism does not appear to be covered in the TensorFlow documentation, so its usage is not obvious here. However, as an author on the TF project states [https://stackoverflow.com/questions/33932901/whats-the-purpose-of-tf-app-flags-in-tensorflow#33938519 here], it is intended to make demos more convenient, and essentially wraps argparse.
 
Also see [[TensorFlow/Command Line Args]].
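
Since the flags mechanism is described as essentially a wrapper around argparse, the two flags above could be expressed roughly as follows using only the standard library. This is a sketch, not the TensorFlow implementation; <code>build_parser</code> is a hypothetical helper name that does not appear in train_eval.py.

```python
import argparse


def build_parser():
    """Hypothetical argparse equivalent of the two flags defined above."""
    parser = argparse.ArgumentParser(description="Adversarial crypto demo flags")
    parser.add_argument("--learning_rate", type=float, default=0.0008,
                        help="Constant learning rate")
    parser.add_argument("--batch_size", type=int, default=4096,
                        help="Batch size")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["--batch_size", "512"])
print(args.learning_rate, args.batch_size)
```

Any flag not given on the command line keeps its default, which matches how the hyperparameters above behave when the script is run with no arguments.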
 
More parameter definitions follow:
 
<pre>
# Input and output configuration.
TEXT_SIZE = 16
KEY_SIZE = 16
 
# Training parameters.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2  # Train Eve 2x for every step of Alice/Bob
# Train until either max loops or Alice/Bob "good enough":
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02  # Exit when Bob loss < 0.02 and Eve > 7.7 bits
EVE_LOSS_THRESH = 7.7
 
# Logging and evaluation.
PRINT_EVERY = 200  # In training, log every 200 steps.
EVE_EXTRA_ROUNDS = 2000  # At end, train eve a bit more.
RETRAIN_EVE_ITERS = 10000  # Retrain eve up to ITERS*LOOPS times.
RETRAIN_EVE_LOOPS = 25  # With an evaluation each loop
NUMBER_OF_EVE_RESETS = 5  # And do this up to 5 times with a fresh eve.
# Use EVAL_BATCHES samples each time we check accuracy.
EVAL_BATCHES = 1
</pre>
 
 
====Batch of Random Booleans====
 
This is a method to define an array of random booleans - this is used to create the message that Alice encrypts, and to define the key that Alice and Bob use to decrypt the message.
 
====AdversarialCrypto Class====
 
The AdversarialCrypto class defines the three neural networks that make up the adversarial architecture. As part of the training and evaluation process <code>train_and_evaluate()</code>, an instance of this class is created and passed to the evaluation function <code>doeval()</code> in the main body of the code.
 
What does this class do?
* Creates the three networks for Alice, Bob, and Eve
* Creates connections from Alice to Bob and Alice to Eve to pass the correct info to the correct networks
* Defines the loss function for Eve and for Bob
* Defines the optimizers that the networks should use
* Manages the state of each network (i.e., allows you to reset the Eve network)
 
<pre>
class AdversarialCrypto(object):
    """Primary model implementation class for Adversarial Neural Crypto.
    This class contains the code for the model itself,
    and when created, plumbs the pathways from Alice to Bob and
    Eve, creates the optimizers and loss functions, etc.
   
    Attributes:
        eve_loss:  Eve's loss function.
        bob_loss:  Bob's loss function.  Different units from eve_loss.
        eve_optimizer:  A tf op that runs Eve's optimizer.
        bob_optimizer:  A tf op that runs Bob's optimizer.
        bob_reconstruction_loss:  Bob's message reconstruction loss,
          which is comparable to eve_loss.
        reset_eve_vars:  Execute this op to completely reset Eve.
    """
</pre>
 
What does the constructor do?
* The constructor creates the Alice, Bob, and Eve model by calling the model() method with the right parameters
* Creates the optimizer for Bob and for Eve
* Sets up the loss function for Eve, based on <code>tf.reduce_sum()</code> and <code>optimizer.minimize()</code>
* Sets up the loss function for Bob, based on <code>tf.reduce_sum()</code>
 
<pre>
  def __init__(self):
    in_m, in_k = self.get_message_and_key()
    encrypted = self.model('alice', in_m, in_k)
    decrypted = self.model('bob', encrypted, in_k)
    eve_out = self.model('eve', encrypted, None)
 
    self.reset_eve_vars = tf.group(
        *[w.initializer for w in tf.get_collection('eve')])


    optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
This trains a neural network model to detect the sentiment in IMDB text. This illustrates semi-supervised learning.


    # Eve's goal is to decrypt the entire message:
Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_text
    eve_bits_wrong = tf.reduce_sum(
        tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    self.eve_loss = tf.reduce_sum(eve_bits_wrong)
    self.eve_optimizer = optimizer.minimize(
        self.eve_loss, var_list=tf.get_collection('eve'))


    # Alice and Bob want to be accurate...
==Running==
    self.bob_bits_wrong = tf.reduce_sum(
        tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    # ... and to not let Eve do better than guessing.
    self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong)
    bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong)
    # 7-9 bits wrong is OK too, so we squish the error function a bit.
    # Without doing this, we often tend to hang out at 0.25 / 7.5 error,
    # and it seems bad to have continued, high communication error.
    bob_eve_loss = tf.reduce_sum(
        tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2)
 
    # Rescale the losses to [0, 1] per example and combine.
    self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss)
 
    self.bob_optimizer = optimizer.minimize(
        self.bob_loss,
        var_list=(tf.get_collection('alice') + tf.get_collection('bob')))
</pre>
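
To make the loss arithmetic above concrete, here is a minimal plain-Python sketch for a single example (the TensorFlow code sums the same quantities over the whole batch). The helper names <code>bits_wrong</code> and <code>bob_eve_penalty</code> are hypothetical and do not appear in train_eval.py; the formulas are copied from the constructor.

```python
TEXT_SIZE = 16  # same value as in train_eval.py


def bits_wrong(output, message):
    """Count wrong bits for one example; both lists use the -1/+1 encoding."""
    return sum(abs((o + 1.0) / 2.0 - (m + 1.0) / 2.0)
               for o, m in zip(output, message))


def bob_eve_penalty(eve_bits_wrong):
    """Penalty in Bob's loss: zero when Eve is wrong on exactly half the bits."""
    deviation = abs(TEXT_SIZE / 2.0 - eve_bits_wrong)
    return deviation ** 2 / (TEXT_SIZE / 2.0) ** 2


message = [1.0] * TEXT_SIZE
eve_guessing = [1.0] * 8 + [-1.0] * 8   # half right: no better than chance
bob_one_off = [-1.0] + [1.0] * 15       # Bob gets a single bit wrong

# Combined per-example loss for Alice/Bob, mirroring bob_loss above.
bob_loss = (bits_wrong(bob_one_off, message) / TEXT_SIZE
            + bob_eve_penalty(bits_wrong(eve_guessing, message)))
print(bob_loss)  # 0.0625
```

With Eve at chance (8 of 16 bits wrong) the penalty term vanishes, so the combined loss is just Bob's 1/16 reconstruction error. If Eve recovered every bit, the penalty would be 1.0.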
 
====AdversarialCrypto Class - Creation of Neural Network Model====
 
Now, the actual creation of the models for Alice, Bob, and Eve happens in the call to <code>model()</code>. What happens with the method header?
* We pass in the name of the graph component ('alice', 'bob', or 'eve') to add new model components to
* We pass in the input message (either the plain text, to Alice, or the ciphertext, to Bob and Eve)
* We pass in the key (optional); if no key is passed in, the input to the neural network is just the message
 
Here's the model method definition:
 
<pre>
  def model(self, collection, message, key=None):
    """The model for Alice, Bob, and Eve.  If key=None, the first FC layer
    takes only the message as inputs.  Otherwise, it uses both the key
    and the message.
    Args:
      collection:  The graph keys collection to add new vars to.
      message:  The input message to process.
      key:  The input key (if any) to use.
    """
 
    if key is not None:
      combined_message = tf.concat(axis=1, values=[message, key])
    else:
      combined_message = message
 
</pre>
 
If we pass in both a message and a key, we concatenate the inputs using <code>tf.concat()</code>. Otherwise, the only input is the message.
 
The next step is to call <code>tf.contrib.framework.arg_scope()</code>. According to the [https://www.tensorflow.org/api_docs/python/tf/contrib/framework/arg_scope documentation], this function stores default arguments for each op in the list passed to it; each of those ops must be decorated with <code>@add_arg_scope</code>.
 
In other words, every fully_connected layer and conv2d layer created inside the scope adds its variables to the specified collection ('alice', 'bob', or 'eve'):
 
<pre>
    # Ensure that all variables created are in the specified collection.
    with tf.contrib.framework.arg_scope(
        [tf.contrib.layers.fully_connected, tf.contrib.layers.conv2d],
        variables_collections=[collection]):
</pre>


Next, we create a fully connected neural network layer. We pass in the message (concatenated with the key, if one was given), give the layer TEXT_SIZE + KEY_SIZE outputs, initialize the bias of the fully-connected layer as all-zero, and do not set an activation function:
Running this model is slightly more complicated than running the adversarial crypto network.


<pre>
The adversarial text network steps are as follows:
    fc = tf.contrib.layers.fully_connected(
* fetch data
          combined_message,
* generate vocab
          TEXT_SIZE + KEY_SIZE,
* generate training/validation/test data
          biases_initializer=tf.constant_initializer(0.0),
* pretrain language model
          activation_fn=None)
* train classifier
</pre>
* evaluate classifier on test data


Next, we assemble the layers of the neural network model.
===Get Vocabulary Data===


The model architecture we use is:
Start by obtaining the data, which is an 80 MB tar file, and decompress it:


<pre>
<pre>
(Fully Connected) -> (Conv2D) -> (Conv2D) -> (Conv2D) -> (Squeeze)
$ wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz -O /tmp/imdb.tar.gz
</pre>


This performs a sequence of 1D convolutions (expands the message out, and squeezes it back down).
$ tar -xf /tmp/imdb.tar.gz -C /tmp


<pre>
$ du -hs /tmp/aclImdb
    fc = tf.contrib.layers.fully_connected(
487M /tmp/aclImdb
          combined_message,
          TEXT_SIZE + KEY_SIZE,
          biases_initializer=tf.constant_initializer(0.0),
          activation_fn=None)
 
      # Perform a sequence of 1D convolutions (by expanding the message out to 2D
      # and then squeezing it back down).
      fc = tf.expand_dims(fc, 2)
      # 2,1 -> 1,2
      conv = tf.contrib.layers.conv2d(
          fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1, 2
      conv = tf.contrib.layers.conv2d(
          conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1, 1
      conv = tf.contrib.layers.conv2d(
          conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh)
      conv = tf.squeeze(conv, 2)
      return conv
</pre>
</pre>


====AdversarialCrypto Class - Creation of Message and Key====
===Build the Vocabulary===
 
In the constructor, the input message and key are generated using a <code>get_message_and_key()</code> method, which in turn calls the <code>batch_of_random_bools()</code> function. This is not complicated; it just constructs a tensor of pseudo-boolean values (-1 or 1).


Here is the method in the AdversarialCrypto class:
Use a Bazel job to build the vocabulary from the data:


<pre>
<pre>
  def get_message_and_key(self):
$ IMDB_DATA_DIR=/tmp/imdb
    """Generate random pseudo-boolean key and message values."""
 
    batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[])


     in_m = batch_of_random_bools(batch_size, TEXT_SIZE)
$ bazel run data:gen_vocab -- \
     in_k = batch_of_random_bools(batch_size, KEY_SIZE)
     --output_dir=$IMDB_DATA_DIR \
     return in_m, in_k
     --dataset=imdb \
     --imdb_input_dir=/tmp/aclImdb \
    --lowercase=False
</pre>
</pre>


and the batch_of_random_bools method that it calls:
This uses a build rule called <code>gen_vocab</code> located in <code>data/BUILD</code>:


<pre>
<pre>
def batch_of_random_bools(batch_size, n):
py_binary(
     """Return a batch of random "boolean" numbers.
     name = "gen_vocab",
     Args:
     srcs = ["gen_vocab.py"],
      batch_size:  Batch size dimension of returned tensor.
     deps = [
      n:  number of entries per batch.
        ":data_utils",
     Returns:
        ":document_generators",
      A [batch_size, n] tensor of "boolean" numbers, where each number is
        # tensorflow dep,
      represented as -1 or 1.
     ],
    """
)
   
     as_int = tf.random_uniform(
        [batch_size, n], minval=0, maxval=2, dtype=tf.int32)
    expanded_range = (as_int * 2) - 1
    return tf.cast(expanded_range, tf.float32)
</pre>
</pre>


This creates a tensor of uniformly random 1s and -1s. Here's a quick interactive IPython session to illustrate:
This vocabulary-building step is, unfortunately, failing. See this GitHub issue (1917): https://github.com/tensorflow/models/issues/1917


<pre>
==Adversarial Image Network==
In [1]: import tensorflow as tf
 
In [2]: tf.InteractiveSession()
2017-10-26 00:24:11.694267: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 00:24:11.694303: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 00:24:11.694313: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-26 00:24:11.694321: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Out[2]: <tensorflow.python.client.session.InteractiveSession at 0x11462bba8>
 
In [3]: as_int = tf.random_uniform([10,2], minval=0, maxval=2, dtype=tf.int32)
 
In [4]: as_int.eval()
Out[4]:
array([[0, 0],
      [1, 1],
      [0, 0],
      [0, 1],
      [1, 0],
      [0, 0],
      [0, 1],
      [1, 1],
      [0, 0],
      [1, 0]], dtype=int32)
 
In [5]: expanded_range = (as_int*2)-1
 
In [6]: expanded_range.eval()
Out[6]:
array([[-1,  1],
      [ 1,  1],
      [-1,  1],
      [ 1, -1],
      [ 1,  1],
      [-1,  1],
      [ 1, -1],
      [-1, -1],
      [-1, -1],
      [ 1, -1]], dtype=int32)
</pre>
 
==Adversarial Text==


=Flags=
=Flags=

Latest revision as of 00:13, 27 October 2017

Adversarial Neural Networks

Adversarial neural networks use an architecture consisting of two separate neural networks - one network attempts to learn how to accomplish a task, and another network attempts to differentiate between the output of the first network and the "real" output.

TensorFlow Adversarial Examples

Adversarial Crypto

This adversarial crypto neural network attempts to learn how to protect communications using the adversarial architecture.

Paper: "Learning to Protect Communications with Adversarial Neural Cryptography"

Link to paper: https://arxiv.org/abs/1610.06918

Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_crypto

Part of the tensorflow models repository (https://github.com/tensorflow/models/tree/master/research).

Running

To train the network:

$ python train_eval.py

Training proceeds by first training the "defender" network (representing the Alice-Bob channel) until it is sufficiently well-trained, then resetting the "attacker" network (representing the eavesdropper Eve) and retraining it from scratch several times, giving the eavesdropper multiple opportunities to find weaknesses in the cryptosystem.
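
The control flow of that schedule can be sketched without TensorFlow. In the sketch below every callable (<code>train_step</code>, <code>reset_eve</code>, <code>retrain_eve</code>) is a hypothetical stand-in for the real session calls, and returning <code>None</code> when the targets are never reached is a choice made for this sketch rather than something taken from train_eval.py.

```python
def train_then_attack(train_step, retrain_eve, reset_eve,
                      max_loops, bob_thresh, eve_thresh, eve_resets):
    """Train Alice/Bob against Eve, then attack with freshly reset Eves.

    All callables are hypothetical stand-ins:
      train_step()  -> (bob_loss, eve_loss) after one round of joint training
      reset_eve()   -> reinitialises the attacker's weights
      retrain_eve() -> Eve's loss after retraining against frozen Alice/Bob
    """
    for _ in range(max_loops):
        bob_loss, eve_loss = train_step()
        # Stop early once Bob decrypts well and Eve is close to guessing.
        if bob_loss < bob_thresh and eve_loss > eve_thresh:
            break
    else:
        return None  # defender never reached its targets

    # Give the eavesdropper several fresh starts against the fixed defender.
    eve_losses = []
    for _ in range(eve_resets):
        reset_eve()
        eve_losses.append(retrain_eve())
    return eve_losses


# Toy stand-ins so the control flow can be run end to end.
state = {"step": 0, "resets": 0}


def toy_train_step():
    state["step"] += 1
    # Bob's loss shrinks and Eve's loss grows as training proceeds.
    return 1.0 / state["step"], min(8.0, float(state["step"]))


def toy_reset_eve():
    state["resets"] += 1


def toy_retrain_eve():
    return 7.9  # pretend a fresh Eve still cannot beat guessing


result = train_then_attack(toy_train_step, toy_retrain_eve, toy_reset_eve,
                           max_loops=1000, bob_thresh=0.02, eve_thresh=7.7,
                           eve_resets=5)
print(state["step"], result)
```

The cryptosystem is only considered robust if every freshly trained Eve stays near chance, which is why the list of per-reset losses is returned rather than a single value.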

The Model

We'll step through the code line-by-line. Here's the link to the code: https://github.com/tensorflow/models/blob/master/research/adversarial_crypto/train_eval.py

Full model walkthrough is on the TensorFlow/Adversarial Crypto page.

The rundown is:

  • Create an AdversarialCrypto class that holds a training optimizer object for the Bob and Alice networks
  • Define a method that evaluates the networks as-is and prints the percent losses
  • Define a method that trains the network for a specified number of iterations, stopping early if the network reaches its target losses
  • Define a method that calls the training function (above), then re-trains Eve several more times from scratch

Adversarial Text

This trains a neural network model to detect the sentiment in IMDB text. This illustrates semi-supervised learning.

Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_text

Running

Running this model is slightly more complicated than running the adversarial crypto network.

The adversarial text network steps are as follows:

  • fetch data
  • generate vocab
  • generate training/validation/test data
  • pretrain language model
  • train classifier
  • evaluate classifier on test data

Get Vocabulary Data

Start by obtaining the data, which is an 80 MB tar file, and decompress it:

$ wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz -O /tmp/imdb.tar.gz

$ tar -xf /tmp/imdb.tar.gz -C /tmp

$ du -hs /tmp/aclImdb
487M	/tmp/aclImdb
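
The same fetch-and-unpack step can be done from Python with the standard library, which may be convenient when scripting the whole pipeline. This is a rough sketch; the function names <code>fetch_and_extract</code> and <code>directory_size_bytes</code> are hypothetical and are not part of the adversarial_text code.

```python
import os
import tarfile
import urllib.request

IMDB_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def fetch_and_extract(url, archive_path, dest_dir):
    """Download a tar archive (if not already present) and unpack it into dest_dir."""
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest_dir)


def directory_size_bytes(path):
    """Total size of regular files under path, roughly what `du -s` reports."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


# Example call (downloads about 80 MB, so it is left commented out):
# fetch_and_extract(IMDB_URL, "/tmp/imdb.tar.gz", "/tmp")
# print(directory_size_bytes("/tmp/aclImdb") / 1e6, "MB")
```

Skipping the download when the archive already exists makes the step safe to re-run.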

Build the Vocabulary

Use a Bazel job to build the vocabulary from the data:

$ IMDB_DATA_DIR=/tmp/imdb

$ bazel run data:gen_vocab -- \
    --output_dir=$IMDB_DATA_DIR \
    --dataset=imdb \
    --imdb_input_dir=/tmp/aclImdb \
    --lowercase=False

This uses a build rule called gen_vocab located in data/BUILD:

py_binary(
    name = "gen_vocab",
    srcs = ["gen_vocab.py"],
    deps = [
        ":data_utils",
        ":document_generators",
        # tensorflow dep,
    ],
)

This vocabulary-building step is, unfortunately, failing. See this GitHub issue (1917): https://github.com/tensorflow/models/issues/1917

Adversarial Image Network

Flags