From charlesreid1

Adversarial Neural Networks

Adversarial neural networks use an architecture consisting of two separate neural networks - one network attempts to learn how to accomplish a task, and another network attempts to differentiate between the output of the first network and the "real" output.

TensorFlow Examples of Adversarial Neural Networks

Adversarial Crypto

This adversarial crypto neural network attempts to learn how to protect communications using the adversarial architecture.

Link to paper: "Learning to Protect Communications with Adversarial Neural Cryptography": https://arxiv.org/abs/1610.06918

Link to code: https://github.com/tensorflow/models/tree/master/research/adversarial_crypto

It is part of the TensorFlow models repository (https://github.com/tensorflow/models/tree/master/research).

Running

To train the network:

$ python train_eval.py

The training procedure pits the "defender" networks (Alice and Bob, who share the key) against the "attacker" network (the eavesdropper Eve) until Alice and Bob are sufficiently well trained. It then resets Eve and retrains her from scratch, several times, to give the eavesdropper multiple opportunities to find weaknesses in the cryptosystem.

The Model

We'll step through the code line by line again. Here's the link to the code: https://github.com/tensorflow/models/blob/master/research/adversarial_crypto/train_eval.py

License

Obligatory license info:

# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

Some info about the network:

  • There are actually 3 neural networks involved: Alice, Bob, and Eve
  • Alice takes inputs in_m (message) and in_k (key) and outputs the ciphertext
  • Bob takes the ciphertext and in_k (key) as inputs and attempts to output the plaintext
  • Eve takes only the ciphertext as input (no key) and also attempts to output the plaintext
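To make the three roles concrete, here is a hand-coded, non-learned stand-in written in plain Python. It assumes a one-time-pad style XOR cipher in the -1/1 encoding the code uses; this is only an illustration of who sees what (Alice and Bob share the key, Eve does not), not what the trained networks actually learn. The helper names random_pm1, alice, bob, and eve are made up for this sketch.

```python
import random

TEXT_SIZE = 16
KEY_SIZE = 16

def random_pm1(n, rng):
    """n random 'bits', each encoded as -1.0 or 1.0 (like batch_of_random_bools)."""
    return [float(rng.randint(0, 1) * 2 - 1) for _ in range(n)]

def alice(message, key):
    # In the -1/1 encoding (0 -> -1, 1 -> 1), bitwise XOR is -(a * b).
    return [-(m * k) for m, k in zip(message, key)]

def bob(ciphertext, key):
    # XOR with the same key again undoes the encryption, since k * k == 1.
    return [-(c * k) for c, k in zip(ciphertext, key)]

def eve(ciphertext, rng):
    # No key: against a one-time pad, Eve can do no better than guessing.
    return random_pm1(len(ciphertext), rng)

rng = random.Random(42)
message = random_pm1(TEXT_SIZE, rng)
key = random_pm1(KEY_SIZE, rng)
ciphertext = alice(message, key)
print(bob(ciphertext, key) == message)  # True
```

The trained networks output continuous values in (-1, 1) rather than exact -1/1 values, which is why the losses later in the file measure "bits wrong" as a real number.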

The file starts with __future__ imports, which make Python 2 behave like Python 3 for imports, division, and printing, followed by the remaining imports:

# TensorFlow Python 3 compatibility
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import signal
import sys
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

Input Argument Flags and Parameters

Hyperparameter flags can be set on the command line:

flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate')
flags.DEFINE_integer('batch_size', 4096, 'Batch size')
FLAGS = flags.FLAGS

The tf.app.flags module does not seem to be documented in the official API docs, so its usage is not obvious. However, as one of the TensorFlow authors has stated, it is intended to make demos more convenient, and it is essentially a thin wrapper around argparse.

Also see TensorFlow/Command Line Args.
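Since tf.app.flags is essentially a wrapper around argparse, a roughly equivalent plain-argparse setup for the two flags above might look like the sketch below. The function name parse_flags is hypothetical; this is not how train_eval.py is written, only an illustration of what the flags amount to.

```python
import argparse

def parse_flags(argv=None):
    """Hypothetical argparse equivalent of the two tf.app.flags definitions."""
    parser = argparse.ArgumentParser(description='Adversarial crypto training')
    parser.add_argument('--learning_rate', type=float, default=0.0008,
                        help='Constant learning rate')
    parser.add_argument('--batch_size', type=int, default=4096,
                        help='Batch size')
    return parser.parse_args(argv)

# Passing an explicit list mimics: python train_eval.py --batch_size 512
FLAGS = parse_flags(['--batch_size', '512'])
print(FLAGS.learning_rate, FLAGS.batch_size)  # 0.0008 512
```

Any flag that is not given on the command line keeps its default, so FLAGS.learning_rate stays at 0.0008 here.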

More parameter definitions follow:

# Input and output configuration.
TEXT_SIZE = 16
KEY_SIZE = 16

# Training parameters.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2  # Train Eve 2x for every step of Alice/Bob
# Train until either max loops or Alice/Bob "good enough":
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02  # Exit when Bob loss < 0.02 and Eve > 7.7 bits
EVE_LOSS_THRESH = 7.7

# Logging and evaluation.
PRINT_EVERY = 200  # In training, log every 200 steps.
EVE_EXTRA_ROUNDS = 2000  # At end, train eve a bit more.
RETRAIN_EVE_ITERS = 10000  # Retrain eve up to ITERS*LOOPS times.
RETRAIN_EVE_LOOPS = 25  # With an evaluation each loop
NUMBER_OF_EVE_RESETS = 5  # And do this up to 5 times with a fresh eve.
# Use EVAL_BATCHES samples each time we check accuracy.
EVAL_BATCHES = 1
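The constants above imply a training schedule. The following plain-Python sketch shows one way to read that schedule. It rests on assumptions taken from the comments rather than from code shown on this page: each training loop runs ITERS_PER_ACTOR Alice/Bob steps and then ITERS_PER_ACTOR * EVE_MULTIPLIER Eve steps, the losses are checked every PRINT_EVERY loops, and each fresh Eve is retrained for RETRAIN_EVE_LOOPS rounds of RETRAIN_EVE_ITERS steps. The functions train_until_thresh and retrain_fresh_eves, and the callables passed to them, are hypothetical stand-ins for s.run() calls on the optimizer, loss, and reset ops; EVE_EXTRA_ROUNDS is left out.

```python
# Constants copied from train_eval.py.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02
EVE_LOSS_THRESH = 7.7
PRINT_EVERY = 200
RETRAIN_EVE_ITERS = 10000
RETRAIN_EVE_LOOPS = 25
NUMBER_OF_EVE_RESETS = 5


def train_until_thresh(bob_step, eve_step, evaluate,
                       max_loops=MAX_TRAINING_LOOPS, check_every=PRINT_EVERY):
    """Hypothetical skeleton: alternate Alice/Bob and Eve updates until the
    thresholds are met. Returns the loop index where they were met, else None."""
    for j in range(max_loops):
        for _ in range(ITERS_PER_ACTOR):
            bob_step()                       # one Alice/Bob optimizer step
        for _ in range(ITERS_PER_ACTOR * EVE_MULTIPLIER):
            eve_step()                       # two Eve optimizer steps
        if j % check_every == 0:
            bob_bits, eve_bits = evaluate()  # average bits wrong per message
            if bob_bits < BOB_LOSS_THRESH and eve_bits > EVE_LOSS_THRESH:
                return j
    return None


def retrain_fresh_eves(reset_eve, eve_step, evaluate,
                       resets=NUMBER_OF_EVE_RESETS,
                       loops=RETRAIN_EVE_LOOPS, iters=RETRAIN_EVE_ITERS):
    """Hypothetical skeleton: reset Eve and retrain her from scratch several
    times, evaluating once per loop. Returns the lowest Eve error seen."""
    best_eve_bits = float('inf')
    for _ in range(resets):
        reset_eve()                          # e.g. s.run(ac.reset_eve_vars)
        for _ in range(loops):
            for _ in range(iters):
                eve_step()
            _, eve_bits = evaluate()
            best_eve_bits = min(best_eve_bits, eve_bits)
    return best_eve_bits
```

With the default constants, each fresh Eve gets up to RETRAIN_EVE_ITERS * RETRAIN_EVE_LOOPS = 250,000 training steps, and this is repeated for NUMBER_OF_EVE_RESETS = 5 fresh Eves. Tracking the lowest Eve error is an addition of this sketch; it summarizes whether any fresh Eve found a weakness.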


Batch of Random Booleans

batch_of_random_bools() is a module-level function that creates a batch of random pseudo-boolean values, each stored as -1.0 or 1.0. It is used to create both the message that Alice encrypts and the key that Alice uses to encrypt the message and Bob uses to decrypt it.

AdversarialCrypto Class

The AdversarialCrypto class defines the three neural networks that make up the adversarial system. As part of the training and evaluation process in train_and_evaluate(), an instance of this class is created and passed to the evaluation function doeval() in the main body of the code.

What does this class do?

  • Creates the three networks for Alice, Bob, and Eve
  • Creates connections from Alice to Bob and Alice to Eve to pass the correct info to the correct networks
  • Defines the loss function for Eve and for Bob
  • Defines the optimizers that the networks should use
  • Provides an op (reset_eve_vars) that resets Eve's variables, so Eve can be retrained from scratch
class AdversarialCrypto(object):
    """Primary model implementation class for Adversarial Neural Crypto.
    This class contains the code for the model itself,
    and when created, plumbs the pathways from Alice to Bob and
    Eve, creates the optimizers and loss functions, etc.
    
    Attributes:
        eve_loss:  Eve's loss function.
        bob_loss:  Bob's loss function.  Different units from eve_loss.
        eve_optimizer:  A tf op that runs Eve's optimizer.
        bob_optimizer:  A tf op that runs Bob's optimizer.
        bob_reconstruction_loss:  Bob's message reconstruction loss,
          which is comparable to eve_loss.
        reset_eve_vars:  Execute this op to completely reset Eve.
    """

What does the constructor do?

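  • The constructor creates the Alice, Bob, and Eve models by calling the model() method with the right parameters
  • Creates a reset_eve_vars op that re-runs the initializers of all of Eve's variables
  • Creates a single Adam optimizer and uses its minimize() method to build one training op for Eve and one for Alice and Bob together
  • Sets up Eve's loss: the number of message bits Eve gets wrong, summed over the batch with tf.reduce_sum()
  • Sets up Bob's loss, which combines Bob's reconstruction error (also built with tf.reduce_sum()) with a penalty whenever Eve's error deviates from TEXT_SIZE / 2 = 8 bits, i.e. from random guessing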
  def __init__(self):
    in_m, in_k = self.get_message_and_key()
    encrypted = self.model('alice', in_m, in_k)
    decrypted = self.model('bob', encrypted, in_k)
    eve_out = self.model('eve', encrypted, None)

    self.reset_eve_vars = tf.group(
        *[w.initializer for w in tf.get_collection('eve')])

    optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)

    # Eve's goal is to decrypt the entire message:
    eve_bits_wrong = tf.reduce_sum(
        tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    self.eve_loss = tf.reduce_sum(eve_bits_wrong)
    self.eve_optimizer = optimizer.minimize(
        self.eve_loss, var_list=tf.get_collection('eve'))

    # Alice and Bob want to be accurate...
    self.bob_bits_wrong = tf.reduce_sum(
        tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    # ... and to not let Eve do better than guessing.
    self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong)
    bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong)
    # 7-9 bits wrong is OK too, so we squish the error function a bit.
    # Without doing this, we often tend to hang out at 0.25 / 7.5 error,
    # and it seems bad to have continued, high communication error.
    bob_eve_loss = tf.reduce_sum(
        tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2)

    # Rescale the losses to [0, 1] per example and combine.
    self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss)

    self.bob_optimizer = optimizer.minimize(
        self.bob_loss,
        var_list=(tf.get_collection('alice') + tf.get_collection('bob')))
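The loss arithmetic above can be checked by hand with a plain-Python transcription on nested lists (no TensorFlow). The helpers bits_wrong, eve_loss, and bob_loss are made-up names for this sketch; each follows the corresponding tf expression in the constructor: map outputs from [-1, 1] to [0, 1], sum the absolute differences over the bits of each example, then sum over the batch.

```python
TEXT_SIZE = 16

def bits_wrong(output, message):
    """Per-example bits wrong: map [-1, 1] to [0, 1], sum |difference| over bits."""
    return sum(abs((o + 1.0) / 2.0 - (m + 1.0) / 2.0)
               for o, m in zip(output, message))

def eve_loss(eve_out, messages):
    """Total bits Eve gets wrong over the whole batch."""
    return sum(bits_wrong(e, m) for e, m in zip(eve_out, messages))

def bob_loss(decrypted, eve_out, messages, text_size=TEXT_SIZE):
    """Bob's reconstruction error (rescaled per example) plus the Eve penalty."""
    reconstruction = sum(bits_wrong(d, m) for d, m in zip(decrypted, messages))
    # Penalize Eve's error for deviating from text_size / 2 in either direction.
    eve_term = sum(
        (text_size / 2.0 - bits_wrong(e, m)) ** 2 / (text_size / 2.0) ** 2
        for e, m in zip(eve_out, messages))
    return reconstruction / text_size + eve_term

msg = [[1.0] * 16]                       # a batch with one all-ones message
half = [[1.0] * 8 + [-1.0] * 8]          # an output with exactly 8 bits wrong
print(bob_loss(msg, half, msg))          # 0.0: Bob perfect, Eve at chance
print(bob_loss(msg, msg, msg))           # 1.0: Eve decrypts perfectly
print(bob_loss(msg, [[-1.0] * 16], msg)) # 1.0: Eve is wrong on every bit
```

The last case shows why the penalty is symmetric: an Eve that gets every bit wrong is as dangerous as one that gets every bit right, because flipping her output recovers the message.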

AdversarialCrypto Class - Creation of Neural Network Model

The actual creation of the models for Alice, Bob, and Eve happens in the call to model(). The method takes three arguments:

  • We pass in the name of the graph collection ('alice', 'bob', or 'eve') that the new model variables are added to
  • We pass in the input message (either the plaintext, to Alice, or the ciphertext, to Bob and Eve)
  • We pass in the key (optional); if no key is passed in, the input to the neural network is just the message

Here's the model method definition:

  def model(self, collection, message, key=None):
    """The model for Alice, Bob, and Eve.  If key=None, the first FC layer
    takes only the message as inputs.  Otherwise, it uses both the key
    and the message.
    Args:
      collection:  The graph keys collection to add new vars to.
      message:  The input message to process.
      key:  The input key (if any) to use.
    """

    if key is not None:
      combined_message = tf.concat(axis=1, values=[message, key])
    else:
      combined_message = message

If we pass in both a message and a key, we concatenate them along axis 1 (the feature axis) using tf.concat(). Otherwise, the only input is the message.

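The next step is to call tf.contrib.framework.arg_scope(). This context manager takes a list of layer functions (each of which must be decorated with @add_arg_scope) and supplies default keyword arguments to every call of those functions inside the with block.

Here, every fully_connected and conv2d layer created inside the block gets variables_collections=[collection], so its variables are added to the 'alice', 'bob', or 'eve' collection. That is what lets the constructor pass tf.get_collection('eve') to Eve's optimizer and reset Eve's variables independently of Alice's and Bob's: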

    # Ensure that all variables created are in the specified collection.
    with tf.contrib.framework.arg_scope(
        [tf.contrib.layers.fully_connected, tf.contrib.layers.conv2d],
        variables_collections=[collection]):
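The idea behind arg_scope can be illustrated with a toy, plain-Python re-implementation. This is not TensorFlow's code: add_arg_scope, arg_scope, fully_connected, and conv2d below are local stand-ins that only mimic scoped default keyword arguments, and the "layers" simply return the collections their variables would be added to.

```python
import contextlib
import functools

# Stack of scopes; each scope maps an undecorated function to its default kwargs.
_SCOPES = [{}]

def add_arg_scope(func):
    """Allow arg_scope() to supply default keyword arguments to func."""
    @functools.wraps(func)  # also sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        merged = dict(_SCOPES[-1].get(func, {}))
        merged.update(kwargs)  # explicitly passed kwargs override scoped defaults
        return func(*args, **merged)
    return wrapper

@contextlib.contextmanager
def arg_scope(funcs, **defaults):
    """Inside the with block, call each decorated function in funcs with defaults."""
    scope = {f: dict(kw) for f, kw in _SCOPES[-1].items()}  # inherit outer scopes
    for f in funcs:
        if not hasattr(f, '__wrapped__'):
            raise ValueError('%s is not decorated with @add_arg_scope' % f.__name__)
        scope.setdefault(f.__wrapped__, {}).update(defaults)
    _SCOPES.append(scope)
    try:
        yield
    finally:
        _SCOPES.pop()

@add_arg_scope
def fully_connected(inputs, num_outputs, variables_collections=None):
    # Stand-in layer: report which collections its variables would join.
    return variables_collections

@add_arg_scope
def conv2d(inputs, num_outputs, variables_collections=None):
    return variables_collections

with arg_scope([fully_connected, conv2d], variables_collections=['eve']):
    inside_fc = fully_connected('ciphertext', 32)
    inside_conv = conv2d('fc', 2)
outside = fully_connected('ciphertext', 32)
print(inside_fc, inside_conv, outside)  # ['eve'] ['eve'] None
```

The design choice is the same one the model relies on: the layer calls themselves never mention the collection, so the identical model() code can build Alice, Bob, or Eve just by changing the scope.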

Next, we create a fully connected layer. We pass in the message (and optionally the key) and give the layer TEXT_SIZE + KEY_SIZE outputs; note that this width is the same for all three networks, including Eve, who receives no key. The bias is initialized to all zeros, and no activation function is set, so the layer is purely linear:

    fc = tf.contrib.layers.fully_connected(
          combined_message,
          TEXT_SIZE + KEY_SIZE,
          biases_initializer=tf.constant_initializer(0.0),
          activation_fn=None)
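As a plain-Python sketch of these two steps (no TensorFlow; the helper names combine and fully_connected_linear and the tiny weight matrices are made up for illustration), concatenation appends each key row to its message row, and a linear layer with zero biases and no activation is just a matrix product. The Eve case shows that the output width is set by the weight matrix, not by the input width.

```python
def combine(message, key=None):
    """Row-wise concatenation, like tf.concat(axis=1, values=[message, key])."""
    if key is None:
        return [list(row) for row in message]
    return [list(m) + list(k) for m, k in zip(message, key)]

def fully_connected_linear(inputs, weights, biases):
    """Linear layer, no activation: out[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [
        [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
         for j in range(len(biases))]
        for x in inputs
    ]

msg = [[1.0, -1.0]]   # batch of one 2-bit message
key = [[1.0, 1.0]]    # batch of one 2-bit key

# Alice/Bob: 2 + 2 inputs -> 4 outputs (identity weights, zero biases).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(fully_connected_linear(combine(msg, key), identity, [0.0] * 4))
# [[1.0, -1.0, 1.0, 1.0]]

# Eve: only 2 inputs, but still 4 outputs.
w_eve = [[1.0, 0.0, 0.5, 0.0],
         [0.0, 1.0, 0.0, 0.5]]
print(fully_connected_linear(combine(msg), w_eve, [0.0] * 4))
# [[1.0, -1.0, 0.5, -0.5]]
```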

Next, we assemble the layers of the neural network model.

The model architecture we use is:

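(Fully Connected) -> (Expand Dims) -> (Conv2D) -> (Conv2D) -> (Conv2D) -> (Squeeze)

This performs a sequence of 1D convolutions: tf.expand_dims() adds a trailing channel dimension so the fully connected output can be treated as a one-channel signal, the three conv2d layers transform it (the first uses a stride of 2, so it halves the length), and tf.squeeze() removes the channel dimension again. The final tanh activation keeps each output in (-1, 1), matching the -1/1 encoding of the message bits.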

      fc = tf.contrib.layers.fully_connected(
          combined_message,
          TEXT_SIZE + KEY_SIZE,
          biases_initializer=tf.constant_initializer(0.0),
          activation_fn=None)

      # Perform a sequence of 1D convolutions (by expanding the message out to 2D
      # and then squeezing it back down).
      fc = tf.expand_dims(fc, 2)
      # 2,1 -> 1,2
      conv = tf.contrib.layers.conv2d(
          fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1, 2
      conv = tf.contrib.layers.conv2d(
          conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1, 1
      conv = tf.contrib.layers.conv2d(
          conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh)
      conv = tf.squeeze(conv, 2)
      return conv
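The tensor shapes through model() can be traced with a little arithmetic. The plain-Python sketch below assumes, as the code's own comment says, that each conv2d call acts as a 1D convolution on a [batch, length, channels] tensor, and uses the fact that with 'SAME' padding TensorFlow's output length is ceil(length / stride). The functions same_output_length and trace_shapes are hypothetical helpers, not part of train_eval.py.

```python
import math

TEXT_SIZE = 16
KEY_SIZE = 16

def same_output_length(length, stride):
    """Output length of a convolution with 'SAME' padding: ceil(length / stride)."""
    return int(math.ceil(length / float(stride)))

def trace_shapes(batch_size, text_size=TEXT_SIZE, key_size=KEY_SIZE):
    """Hypothetical shape trace for model(), treating each conv as a 1D convolution."""
    shapes = [('fully_connected', (batch_size, text_size + key_size))]
    length, channels = text_size + key_size, 1
    shapes.append(('expand_dims', (batch_size, length, channels)))
    # (num_outputs, stride) of the three conv2d calls: (2, 2), (2, 1), (1, 1).
    for name, num_outputs, stride in [('conv1', 2, 2), ('conv2', 2, 1), ('conv3', 1, 1)]:
        length, channels = same_output_length(length, stride), num_outputs
        shapes.append((name, (batch_size, length, channels)))
    shapes.append(('squeeze', (batch_size, length)))
    return shapes

for name, shape in trace_shapes(4096):
    print(name, shape)
# The last line printed is: squeeze (4096, 16)
```

The stride-2 first convolution is what brings the 32-wide fully connected output back down to one value per message bit.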

AdversarialCrypto Class - Creation of Message and Key

In the constructor, the input message and key are generated by the get_message_and_key() method, which in turn calls the module-level function batch_of_random_bools(). Neither is complicated: together they construct [batch_size, n] tensors of random -1/1 values.

Here is the method in the AdversarialCrypto class:

  def get_message_and_key(self):
    """Generate random pseudo-boolean key and message values."""

    batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[])

    in_m = batch_of_random_bools(batch_size, TEXT_SIZE)
    in_k = batch_of_random_bools(batch_size, KEY_SIZE)
    return in_m, in_k

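Because batch_size is created with tf.placeholder_with_default(), it equals FLAGS.batch_size unless a different value is fed in through feed_dict when the graph is run. Here is the batch_of_random_bools() function that get_message_and_key() calls: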

def batch_of_random_bools(batch_size, n):
    """Return a batch of random "boolean" numbers.
    Args:
      batch_size:  Batch size dimension of returned tensor.
      n:  number of entries per batch.
    Returns:
      A [batch_size, n] tensor of "boolean" numbers, where each number is
      represented as -1 or 1.
    """
    
    as_int = tf.random_uniform(
        [batch_size, n], minval=0, maxval=2, dtype=tf.int32)
    expanded_range = (as_int * 2) - 1
    return tf.cast(expanded_range, tf.float32)

This draws random integers uniformly from {0, 1} (maxval is exclusive), maps them to -1 and 1 with 2x - 1, and casts the result to float32. Here's a quick interactive IPython session to illustrate:

In [1]: import tensorflow as tf

In [2]: tf.InteractiveSession()
Out[2]: <tensorflow.python.client.session.InteractiveSession at 0x11462bba8>

In [3]: as_int = tf.random_uniform([10,2], minval=0, maxval=2, dtype=tf.int32)

In [4]: as_int.eval()
Out[4]:
array([[0, 0],
       [1, 1],
       [0, 0],
       [0, 1],
       [1, 0],
       [0, 0],
       [0, 1],
       [1, 1],
       [0, 0],
       [1, 0]], dtype=int32)

In [5]: expanded_range = (as_int*2)-1

In [6]: expanded_range.eval()
Out[6]:
array([[-1,  1],
       [ 1,  1],
       [-1,  1],
       [ 1, -1],
       [ 1,  1],
       [-1,  1],
       [ 1, -1],
       [-1, -1],
       [-1, -1],
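       [ 1, -1]], dtype=int32)

Note that the rows of Out[6] do not line up with those of Out[4] (for example, the first row [0, 0] would map to [-1, -1]). Each call to eval() runs the random op again, so the two outputs come from different random draws; fetching both tensors in a single session.run() call would show matching values.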


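For comparison, here is a pure-Python analogue of batch_of_random_bools() that returns both stages of the computation. It draws the 0/1 integers once and derives the -1/1 floats from that same draw, so the two results correspond element by element. The function name random_ints_and_bools is hypothetical, and Python's random module stands in for tf.random_uniform.

```python
import random

def random_ints_and_bools(batch_size, n, seed=None):
    """Return (as_int, expanded_range) for a batch of random 'booleans'.

    as_int holds 0/1 integers; expanded_range maps each one to -1.0 or 1.0
    with 2x - 1, using the same draw.
    """
    rng = random.Random(seed)
    # randint(0, 1) includes both endpoints, like minval=0, maxval=2 in TF.
    as_int = [[rng.randint(0, 1) for _ in range(n)] for _ in range(batch_size)]
    expanded_range = [[float(v * 2 - 1) for v in row] for row in as_int]
    return as_int, expanded_range

as_int, as_bools = random_ints_and_bools(10, 2, seed=0)
for ints, bools in zip(as_int, as_bools):
    print(ints, bools)
```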
Do Evaluation Method

We now come to the definition of the function that actually does the evaluation, doeval().

The function takes four arguments:

  • The TensorFlow session
  • The AdversarialCrypto class instance
  • The number of batches to evaluate
  • An iteration count, used only as a label in the log output
def doeval(s, ac, n, itercount):
    """
    Evaluate the current network on n batches of random examples.
    Args:
        s:  The current TensorFlow session
        ac: an instance of the AdversarialCrypto class
        n:  The number of iterations to run.
        itercount: Iteration count label for logging.
    Returns:
        Bob and Eve's loss, as a percent of bits incorrect.
    """

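The main role of doeval() is to run the networks and compute the resulting losses. All three networks live in a single TensorFlow graph, which the session s runs, so one s.run() call that fetches the two loss tensors executes whichever parts of Alice, Bob, and Eve those losses depend on. Because the messages and keys come from random ops in the graph, each s.run() call also evaluates a fresh batch of random examples.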

The accumulators bob_loss_accum and eve_loss_accum are local to doeval(): they start at zero on every call and sum the losses over the n batches evaluated in that call.

For each iteration of the loop for _ in xrange(n), we run the model and fetch Bob's and Eve's losses. Recall from above that bob_reconstruction_loss and eve_loss are each built from two calls to tf.reduce_sum(). The reduce_sum function sums tensor components, either along a given axis (reducing the tensor's rank by one) or over all axes if no axis is given. The first call sums the absolute bit errors along axis 1, the message axis, to get the number of wrong bits per example (a real number, since the outputs are continuous values in (-1, 1)); the second call sums those counts over the whole batch.

At each iteration, we add these per-batch totals, the number of incorrect bits summed over the whole batch, to running totals:

    bob_loss_accum = 0
    eve_loss_accum = 0
    for _ in xrange(n):
        bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss])
        bob_loss_accum += bl
        eve_loss_accum += el

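In the end, we divide each total by the number of examples evaluated, n * FLAGS.batch_size. Despite the variable names, the result is not a percentage: it is the average number of incorrect bits per message (between 0 and TEXT_SIZE = 16), the same units as the thresholds BOB_LOSS_THRESH = 0.02 and EVE_LOSS_THRESH = 7.7 bits defined above.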

    bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size)
    eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size)
    print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent))
    sys.stdout.flush()
    return bob_loss_percent, eve_loss_percent
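To make the units concrete, here is a plain-Python sketch of the same accumulate-then-divide computation, with hypothetical stand-ins for the networks: Bob always decrypts perfectly and Eve guesses at random (neither is what the trained networks do). doeval_sketch and its helpers are made up for illustration, and Python's random module replaces TensorFlow's random ops.

```python
import random

TEXT_SIZE = 16

def random_batch(rng, batch_size, n):
    """A [batch_size, n] batch of -1.0/1.0 values."""
    return [[float(rng.randint(0, 1) * 2 - 1) for _ in range(n)]
            for _ in range(batch_size)]

def bits_wrong(output, message):
    """Per-example bits wrong, as in the constructor's first reduce_sum."""
    return sum(abs((o + 1.0) / 2.0 - (m + 1.0) / 2.0)
               for o, m in zip(output, message))

def doeval_sketch(n, batch_size, seed=0):
    rng = random.Random(seed)
    bob_loss_accum = 0.0
    eve_loss_accum = 0.0
    for _ in range(n):
        messages = random_batch(rng, batch_size, TEXT_SIZE)  # fresh batch each run
        bob_out = messages                                    # stand-in: perfect Bob
        eve_out = random_batch(rng, batch_size, TEXT_SIZE)    # stand-in: guessing Eve
        # Per-batch totals: sum over bits, then over the batch.
        bob_loss_accum += sum(bits_wrong(b, m) for b, m in zip(bob_out, messages))
        eve_loss_accum += sum(bits_wrong(e, m) for e, m in zip(eve_out, messages))
    # Divide by the number of messages evaluated: average bits wrong per message.
    return bob_loss_accum / (n * batch_size), eve_loss_accum / (n * batch_size)

bob_avg, eve_avg = doeval_sketch(n=4, batch_size=512)
print('%.2f %.2f' % (bob_avg, eve_avg))
```

A guessing Eve averages close to TEXT_SIZE / 2 = 8 bits wrong per message, which is presumably why an Eve error above EVE_LOSS_THRESH = 7.7 bits is treated as Eve doing little better than chance.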

Train and Evaluate Method

Adversarial Text

Flags