Parsing TFRecords with the Tensorflow Dataset API

Update: Datasets are now part of the example in the Tensorflow library.

The Dataset API has become the new standard way of feeding data into Tensorflow. Moreover, there seem to be plans to deprecate queues and other input methods, unifying the way data is fed into models. The idea now is to (1) create a Dataset object (in this case a TFRecordDataset) and then (2) create an Iterator that extracts elements and feeds them into the model.
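The sketch below makes the two steps concrete without depending on Tensorflow itself. It is a minimal pure-Python reader for what a TFRecordDataset consumes under the hood: the TFRecord framing. Each record is stored as:

- a little-endian uint64 length,
- a masked CRC32C of that length,
- the payload,
- a masked CRC32C of the payload.

Here, reading the file stands in for the Dataset and the generator stands in for the Iterator. The function names are hypothetical and this is not the Tensorflow API. It also stops at raw bytes: decoding the tf.train.Example protobuf inside each payload is the job of Tensorflow's parsing ops (e.g. tf.parse_single_example in TF 1.x, mapped over the dataset).

```python
import struct


def _build_crc32c_table():
    """Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _build_crc32c_table()


def crc32c(data: bytes) -> int:
    """Plain CRC32C checksum of `data`."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """CRC32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_records(records) -> bytes:
    """Frame each bytes payload: length, CRC(length), payload, CRC(payload)."""
    out = bytearray()
    for data in records:
        header = struct.pack("<Q", len(data))
        out += header
        out += struct.pack("<I", masked_crc32c(header))
        out += data
        out += struct.pack("<I", masked_crc32c(data))
    return bytes(out)


def iter_records(blob: bytes):
    """The 'Iterator' step: yield payloads one by one, verifying both checksums."""
    offset = 0
    while offset < len(blob):
        if offset + 12 > len(blob):
            raise ValueError("truncated record header")
        header = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if length_crc != masked_crc32c(header):
            raise ValueError("length checksum mismatch")
        start, end = offset + 12, offset + 12 + length
        if end + 4 > len(blob):
            raise ValueError("truncated record payload")
        data = blob[start:end]
        (data_crc,) = struct.unpack("<I", blob[end:end + 4])
        if data_crc != masked_crc32c(data):
            raise ValueError("payload checksum mismatch")
        yield data
        offset = end + 4


def read_tfrecord_file(path):
    """The 'Dataset' step: open a file; iterating the result walks its records."""
    with open(path, "rb") as f:
        blob = f.read()
    return iter_records(blob)
```

A loop such as `for payload in read_tfrecord_file("train.tfrecords"): ...` would walk a file record by record (the file name is a placeholder). Checking CRCs in pure Python is slow, so this is for understanding or debugging small files, not for feeding a model.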

I’ve modified Tensorflow’s example on “how to read data” to reflect that change. I’ve submitted a PR to the Tensorflow repo; until it gets merged, take a look at the new code below. It is a lot easier to read; see for yourself:



Extract the Windows Product Key from a running Windows Machine

I always had the suspicion that Windows stores the product key in use somewhere. Today I learned that it does so in a very simple manner: as binary data (shown as hex) in a registry value called DigitalProductId. The catch is that it doesn’t use UTF-8, ASCII or another standard encoding, but rather some “home brew” scheme.

The registry location is:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\DigitalProductId

Searching the web, I came across a handy script, which I copied into a gist. It reads the registry value, decodes it and then displays the resulting product key in human-readable form (assuming product keys can be considered human-readable).
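The “home brew” part is a base-24 encoding. Below is a minimal sketch of the widely circulated decoding, assuming the classic (pre-Windows 8) layout:

- Bytes 52 to 66 of DigitalProductId form one little-endian integer.
- That integer is written out as 25 base-24 digits over the alphabet BCDFGHJKMPQRTVWXY2346789.
- The digits are grouped in fives.

Windows 8 and later reportedly use a modified layout (a flag in the key bytes and an extra “N” in the output). This sketch does not handle it, so treat its output on newer systems with suspicion. The function names are hypothetical. winreg is the standard-library registry module and only exists on Windows.

```python
KEY_CHARS = "BCDFGHJKMPQRTVWXY2346789"  # 24 symbols: no vowels, no look-alikes like 0/O or 1/I
KEY_OFFSET = 52   # start of the key bytes in DigitalProductId (classic layout)
KEY_LENGTH = 15   # 15 bytes = 120 bits, enough for 25 base-24 digits


def decode_product_key(digital_product_id: bytes) -> str:
    """Decode the classic (pre-Windows 8) product key encoding."""
    raw = digital_product_id[KEY_OFFSET:KEY_OFFSET + KEY_LENGTH]
    if len(raw) != KEY_LENGTH:
        raise ValueError("DigitalProductId is too short")
    # Byte 66 is the most significant, byte 52 the least significant.
    value = int.from_bytes(raw, "little")
    digits = []
    for _ in range(25):
        # Repeated division by 24 yields the digits, least significant first.
        value, remainder = divmod(value, 24)
        digits.append(KEY_CHARS[remainder])
    key = "".join(reversed(digits))
    return "-".join(key[i:i + 5] for i in range(0, 25, 5))


def read_digital_product_id() -> bytes:
    """Read the raw REG_BINARY value (Windows only)."""
    import winreg  # only available on Windows, hence imported lazily
    path = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion"
    # KEY_WOW64_64KEY avoids the 32-bit registry view when running 32-bit Python.
    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        value, _value_type = winreg.QueryValueEx(key, "DigitalProductId")
    return bytes(value)
```

On a Windows machine, `print(decode_product_key(read_digital_product_id()))` would print the key. Working on a Python int rather than mutating the bytes in place is equivalent to the byte-wise long division the usual scripts perform.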

pyØMQ bind / connect vs. pub / sub

In zmq one is told that it doesn’t matter which side of the communication “binds” a socket and which side “connects”; rather, the “stable” side should be the one that binds. However, for the publisher / subscriber (pub/sub) pattern it does matter, at least in pyzmq.

More precisely, which side should bind and which should connect depends on the order in which the subscriber and publisher are initialized.

Let’s look at the following 4 cases:

PUB: bind, SUB: connect
  First PUB, then SUB: works (case 1)
  First SUB, then PUB: works (case 2)
PUB: connect, SUB: bind
  First PUB, then SUB: works (case 3)
  First SUB, then PUB: doesn’t work without a fix (case 4)

Case 1

This case works. However, any messages the publisher sends while the subscriber is still connecting are lost. This is known as the slow-joiner-symptom.

Case 2

This case simply works. It is also the “preferred” way of setting up PUB / SUB with zmq.

Case 3

Now this case is a bit special, at least in pyzmq. One would expect the slow-joiner-symptom, similar to case 1. However, in pyzmq the messages are queued on the publisher’s side instead of being thrown away, until a subscriber binds to the address.

Once the subscriber binds to the address, the publisher delivers all the messages it has queued up, even those sent before the connection was established.
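The difference between cases 1 and 3 comes down to whether a queue exists on the publisher’s side before the subscriber shows up. The toy model below, in plain Python, mirrors the behaviour reported above. The class is hypothetical and for illustration only; it is not pyzmq’s API and not how libzmq is implemented.

- A publisher that bound drops messages while nobody is attached (case 1).
- A publisher that connected queues messages up to a high water mark, then flushes the queue once the subscriber binds (case 3).

```python
from collections import deque


class ToyPublisher:
    """Toy model of the PUB-side behaviour reported for cases 1 and 3 (not libzmq)."""

    def __init__(self, mode: str, hwm: int = 1000):
        if mode not in ("bind", "connect"):
            raise ValueError("mode must be 'bind' or 'connect'")
        self.mode = mode
        self.hwm = hwm            # high water mark: max messages held before a peer attaches
        self.pending = deque()    # queue that only the connecting side has up front
        self.subscriber = None    # inbox (a list) of the attached subscriber, if any

    def send(self, message):
        if self.subscriber is not None:
            self.subscriber.append(message)
        elif self.mode == "connect" and len(self.pending) < self.hwm:
            # Case 3: the connecting publisher already owns a queue, so messages wait.
            self.pending.append(message)
        # Otherwise (case 1, or the queue is full) the message is silently dropped.

    def attach(self, inbox: list):
        """A subscriber shows up; everything queued so far is flushed to it."""
        self.subscriber = inbox
        while self.pending:
            inbox.append(self.pending.popleft())
```

In pyzmq itself the limit corresponds to the zmq.SNDHWM socket option. Once it is reached, a PUB socket drops messages rather than blocking, which is why the high water mark matters in case 3.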

Case 4

This case is strange in the very sense of the word. When the publisher connects, it happily starts sending messages, since the address is already bound. However, the subscriber doesn’t receive anything. Yep, it’s as if the publisher doesn’t even exist.

However, if the subscriber polls at least once after the publisher has connected, all subsequent messages are delivered correctly. This holds even if the publisher has not sent anything yet.

Conclusion

While all 4 scenarios can be made to work, one has to be aware of their quirks to avoid pitfalls.

If the subscriber binds, one has to keep an eye on the high water mark on the publisher (case 3) and be aware that messages may be ignored until the subscriber tries to receive for the first time (case 4).

If the publisher binds, one has to be aware of the slow-joiner-symptom (case 1).

A Private Docker Registry with SSL on an Offline Docker Swarm

Part of my master’s thesis is to set up a Docker Swarm to parallelize reinforcement learning experiments. For this I needed a registry hosted by the swarm, because the swarm is unfortunately offline and I still have to distribute images across its nodes.

Many tutorials online show how to set up a registry with SSL certificates and authentication using nginx. However, I wanted something a little simpler. Further, I don’t have a domain name that I can set as the common name (CN), so I have to use the IP address for the certificate. The IP has to be added as a SAN (subject alternative name), something the usual tutorials don’t describe.

Note: “Also as of the Effective Date, the CA SHALL NOT issue a certificate with an Expiry Date later than 1 November 2015 with a subjectAlternativeName extension or Subject commonName field containing a Reserved IP Address or Internal Server Name.” – Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.0. Since publicly trusted CAs may no longer issue certificates for reserved (e.g. private) IP addresses, a self-signed certificate is necessary in this scenario.

The process breaks down into 3 simple steps:

  1. Create a self-signed SSL certificate with IP SAN
  2. Set up the registry service using the certificate
  3. Give the Nodes in the Swarm Access to the Certificate

Setup

For this post my “swarm” will be a single manager node running in a VM.

The last command gives the IP that has to be named in the certificate. I first set up everything I need on the host machine, then deploy it to the swarm. This is fancy talk for “make a folder with all the good stuff and scp its contents to the VM”: a poor man’s deploy, and as we know, all students are poor.

Create a self-signed SSL certificate with IP SAN

The IP of the node running my registry is 192.168.99.102; yours may be different. I wrote a custom openssl.cnf based on the example at /etc/ssl/openssl.cnf and placed it into ~/my_docker_registry_deploy_folder/
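The Python helper below is a hedged sketch of what such a config needs. It renders a minimal openssl.cnf whose last line carries the IP SAN, and it validates the IP with the standard ipaddress module.

Unlike an interactive run, it assumes `prompt = no` and also puts the IP into the CN, so openssl asks no questions. The section names follow OpenSSL's config conventions. The helper names are hypothetical.

```python
import ipaddress
from pathlib import Path


def render_openssl_cnf(ip: str) -> str:
    """Render a minimal OpenSSL config whose last line is the IP SAN."""
    ipaddress.ip_address(ip)  # raises ValueError if `ip` is not a valid IP address
    lines = [
        "[ req ]",
        "prompt = no",                                  # read the DN from this file, no questions
        "distinguished_name = req_distinguished_name",
        "x509_extensions = v3_ca",                      # extensions applied with `openssl req -x509`
        "",
        "[ req_distinguished_name ]",
        f"CN = {ip}",                                   # assumption: CN simply mirrors the IP
        "",
        "[ v3_ca ]",
        "subjectAltName = @alt_names",
        "",
        "[ alt_names ]",
        f"IP.1 = {ip}",                                 # the line that differs per setup
    ]
    return "\n".join(lines) + "\n"


def write_openssl_cnf(ip: str, folder: str = "~/my_docker_registry_deploy_folder") -> Path:
    """Write openssl.cnf into the deploy folder and return its path."""
    target = Path(folder).expanduser() / "openssl.cnf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_openssl_cnf(ip))
    return target
```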

Remember that the IP in the last line may differ in your case. With that I could generate a private key and the certificate:
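The sketch below covers that step under a few assumptions. The config sits in the deploy folder as openssl.cnf, and the key and certificate go into a certs/ subfolder. The certificate path matches the verification command used later; the key file name private.key is a placeholder. The flags do the following:

- `-x509` makes openssl req emit a self-signed certificate instead of a signing request.
- `-newkey rsa:4096` creates the key in the same call.
- `-nodes` leaves the key unencrypted, so the registry can read it without a passphrase.

```python
import subprocess
from pathlib import Path


def openssl_req_args(folder: str, days: int = 365) -> list:
    """Build an `openssl req` call producing a new RSA key and a self-signed certificate."""
    base = Path(folder).expanduser()
    return [
        "openssl", "req",
        "-x509",                  # self-signed certificate instead of a CSR
        "-newkey", "rsa:4096",    # generate a fresh 4096-bit RSA key
        "-nodes",                 # do not encrypt the private key with a passphrase
        "-keyout", str(base / "certs" / "private.key"),   # placeholder key file name
        "-out", str(base / "certs" / "certificate.crt"),
        "-days", str(days),
        "-config", str(base / "openssl.cnf"),
    ]


def generate_certificate(folder: str, days: int = 365) -> None:
    """Create the certs/ folder and run openssl (interactive if the config prompts)."""
    (Path(folder).expanduser() / "certs").mkdir(parents=True, exist_ok=True)
    subprocess.run(openssl_req_args(folder, days), check=True)
```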

When prompted to enter some information, I left everything blank. Usually the CN has to match the domain name, but in this case the SAN takes care of this. Quickly verify that the SAN is specified:

openssl x509 -in certs/certificate.crt -text -noout

The important line is:

X509v3 Subject Alternative Name:
IP Address:192.168.99.102
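To script this check, for example before copying the certificate to the nodes, a small sketch can run the same openssl command and pull out the IP entries. The helper names are hypothetical. The regular expression assumes openssl's text output format shown above: `IP Address:` followed by the address.

```python
import ipaddress
import re
import subprocess


def san_ips_from_text(x509_text: str) -> list:
    """Extract the 'IP Address:' SAN entries from `openssl x509 -text -noout` output."""
    matches = re.findall(r"IP Address:([0-9A-Fa-f.:]+)", x509_text)
    return [ipaddress.ip_address(m) for m in matches]


def certificate_san_ips(cert_path: str) -> list:
    """Run openssl on a certificate file and return the IPs listed in its SAN."""
    text = subprocess.run(
        ["openssl", "x509", "-in", cert_path, "-text", "-noout"],
        check=True, capture_output=True, text=True,
    ).stdout
    return san_ips_from_text(text)
```

For example, `ipaddress.ip_address("192.168.99.102") in certificate_san_ips("certs/certificate.crt")` should be true for the certificate generated above.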

That’s all for the certificates.

Set up the registry service using the certificate

To make the deployment easy, I wrote a small docker-compose file that starts the registry service:

This launches the registry as a service on port 443. A potential flaw is that this container isn’t constrained to a specific machine. That matters because volumes are not shared between nodes, so all stored images would be lost if the container migrates. In the toy example there is only one machine; however, with an actual swarm (that’s the point of this exercise, right?) one would introduce placement constraints or use a storage solution that migrates with the container.
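Below is a hedged sketch of such a stack definition, generated from Python. It follows the TLS setup from Docker's registry documentation, which uses the REGISTRY_HTTP_ADDR, REGISTRY_HTTP_TLS_CERTIFICATE and REGISTRY_HTTP_TLS_KEY environment variables. It mounts the certificate directory read-only and adds a placement constraint of the kind discussed above.

The following are assumptions, not the author's file:

- the host path of the certificates,
- the key file name,
- the `node.role == manager` constraint.

Since JSON is a subset of YAML, the dumped output should be usable as the compose file for `docker stack deploy`.

```python
import json


def registry_stack(cert_dir: str, port: int = 443) -> dict:
    """Compose (v3) description of a TLS-enabled registry service."""
    return {
        "version": "3",
        "services": {
            "registry": {
                "image": "registry:2",
                "ports": [f"{port}:443"],
                "environment": {
                    "REGISTRY_HTTP_ADDR": "0.0.0.0:443",
                    "REGISTRY_HTTP_TLS_CERTIFICATE": "/certs/certificate.crt",
                    "REGISTRY_HTTP_TLS_KEY": "/certs/private.key",  # placeholder key name
                },
                "volumes": [
                    f"{cert_dir}:/certs:ro",              # certificates, read-only
                    "registry-data:/var/lib/registry",    # image storage (node-local!)
                ],
                "deploy": {
                    # Pin the service so its node-local volume does not get left behind.
                    "placement": {"constraints": ["node.role == manager"]},
                },
            },
        },
        "volumes": {"registry-data": {}},
    }


def write_compose_file(path: str, cert_dir: str) -> None:
    """Dump the stack as JSON, which YAML parsers also accept."""
    with open(path, "w") as f:
        json.dump(registry_stack(cert_dir), f, indent=2)
```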

Time for deployment:

The certificate will be wiped on reboot if we leave it in the home directory, so I place it in a persistent location, which is also the location the container reads it from.
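The “poor man’s deploy” can be sketched as two commands:

1. Copy the folder to the VM with scp.
2. Over ssh, copy the certificates to a persistent directory and deploy the stack with `docker stack deploy -c`.

The user name `docker`, the remote folder, the persistent certificate path, the compose file name and the stack name below are all placeholders, not the author's values. Pick a directory that actually survives reboots on your VM.

```python
import shlex
import subprocess


def deploy_commands(folder, host, user="docker", remote_dir="registry_deploy",
                    persistent_certs="/path/to/persistent/certs", stack="registry"):
    """Return the scp and ssh commands for a copy-then-deploy workflow (placeholder defaults)."""
    target = f"{user}@{host}"
    remote_script = " && ".join([
        # Create the directory the registry container mounts its certificates from.
        f"sudo mkdir -p {shlex.quote(persistent_certs)}",
        # Copy certificate and key out of the home directory, which does not survive reboots.
        f"sudo cp {shlex.quote(remote_dir)}/certs/* {shlex.quote(persistent_certs)}/",
        # Start (or update) the registry service on the swarm.
        f"docker stack deploy -c {shlex.quote(remote_dir)}/docker-compose.yml {shlex.quote(stack)}",
    ])
    return [
        ["scp", "-r", folder, f"{target}:{remote_dir}"],
        ["ssh", target, remote_script],
    ]


def deploy(folder, host, dry_run=True, **kwargs):
    """Print the commands (default) or run them one after the other."""
    for cmd in deploy_commands(folder, host, **kwargs):
        if dry_run:
            print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        else:
            subprocess.run(cmd, check=True)
```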

Give the Nodes in the Swarm Access to the Certificate

The registry is now set up; however, when pushing or pulling, docker will report a self-signed certificate error, because it cannot verify the certificate. To fix this, each client that wants to interact with the registry needs a copy of the certificate. I installed the certificate on each client at

/etc/docker/certs.d/192.168.99.102/certificate.crt

Then I restarted the Docker daemon on each client to reload the certificates:

sudo service docker restart
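Docker treats files ending in .crt under /etc/docker/certs.d/<registry>/ as CA certificates for that registry. The directory name has to match the registry address exactly, including a port if the registry does not listen on 443. Below is a minimal sketch of the copy step on a client. The function names are hypothetical, and writing below /etc/docker needs root.

```python
import shutil
from pathlib import Path, PurePosixPath


def cert_destination(registry: str, certs_root: str = "/etc/docker/certs.d") -> PurePosixPath:
    """Path where the Docker daemon looks for a registry's CA certificate."""
    # `registry` is the address as used in image names, e.g. "192.168.99.102" or "host:5000".
    return PurePosixPath(certs_root) / registry / "certificate.crt"


def install_cert(cert_file: str, registry: str, certs_root: str = "/etc/docker/certs.d") -> Path:
    """Copy the registry certificate into place; restart the daemon afterwards."""
    dest = Path(cert_destination(registry, certs_root))
    dest.parent.mkdir(parents=True, exist_ok=True)  # requires root for /etc/docker
    shutil.copyfile(cert_file, dest)
    return dest
```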

That’s it! Now I can push to this registry just like any other registry.

docker tag registry:2 192.168.99.102/registry:2
docker push 192.168.99.102/registry:2
docker pull 192.168.99.102/registry:2
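Since the references above carry no port, Docker talks to the registry on the default HTTPS port 443. On any other port the reference would need a host:port prefix instead. Below is a small hypothetical helper that builds the reference and runs the same tag and push commands.

```python
import subprocess


def registry_ref(registry_host: str, repository: str, tag: str, port: int = 443) -> str:
    """Image reference for a private registry; the port is omitted when it is 443."""
    host = registry_host if port == 443 else f"{registry_host}:{port}"
    return f"{host}/{repository}:{tag}"


def push_local_image(local_image: str, registry_host: str, repository: str,
                     tag: str, port: int = 443) -> str:
    """Tag a local image for the private registry and push it; returns the reference."""
    ref = registry_ref(registry_host, repository, tag, port)
    subprocess.run(["docker", "tag", local_image, ref], check=True)
    subprocess.run(["docker", "push", ref], check=True)
    return ref
```

For example, `push_local_image("registry:2", "192.168.99.102", "registry", "2")` would run the tag and push commands shown above.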

Thanks for reading and happy coding!