Download Folders from a GitHub Repo using Python (… files, too)

At some point during your life as a programmer, you will end up with the problem of downloading a single folder from a GitHub repository. For me, it was because I had an auxiliary repo containing files relevant to some unit tests. To solve this problem, I found a very elegant (and super easy) solution, which I want to share here today. In other words, this is a short how-to on how to download/copy files and folders from GitHub using python.

The solution uses the awesome fsspec library. fsspec is a pythonic approach to filesystem management, and allows you to use python to access data in various kinds of locations: on your local machine, on all major cloud providers, and – most importantly – on GitHub. There are many more locations, so the library is worth checking out if you have the time.

Installation

To get started, install fsspec. (Chances are you already have it, because increasing parts of the pydata ecosystem use it internally.)

pip install fsspec

Copy a Folder

import fsspec
from pathlib import Path
# flat copy
destination = Path.home() / "test_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix())
# recursive copy
destination = Path.home() / "test_recursive_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix(), recursive=True)
Example of how to download a folder from GitHub (shallow or recursive).

The above snippet does the following: We first declare a destination (where to store the folder’s content). Then we use fsspec to turn the repo into a pythonic filesystem. Finally, we list all the files in the target folder of the repo (fs.ls(…)) and download them all using fs.get. Simple, elegant, and convenient. I love it!
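If you want to check what is in the repo first, or pull from a specific branch or tag, the same filesystem object can do that, too. Here is a small sketch; I am assuming the sha argument of the GitHub filesystem here (it selects a branch, tag, or commit), so double-check it against your fsspec version:

import fsspec

# point the filesystem at a specific branch, tag, or commit
# (the sha argument is an assumption; verify it for your fsspec version)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld", sha="master")
print(fs.ls(""))      # list the repository root
print(fs.ls("src/"))  # inspect the folder before downloading it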

Copy Files

Copying/Downloading individual files works the same way; however, this time the destination has to be a file.

import fsspec
from pathlib import Path
# download a single file
destination = Path.home() / "downloaded_readme.txt"
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get("README.txt", destination.as_posix())
Example of how to download a single file from GitHub.
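If you only need a file's contents in memory rather than on disk, the same filesystem object can read it directly. A minimal sketch:

import fsspec

fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
content = fs.cat("README.txt")  # returns the file's raw bytes
print(content.decode("utf-8"))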

And that is all there is to it. Happy coding!


Webcam Capture in Python (without OpenCV)

Did you know that there is a python library that allows you to capture both a webcam stream and single webcam images? Did you know that this works on every OS? This is what I want to share in this post: a tutorial on how to use imageio to access your webcam on Linux, Windows, or MacOS, from either a Python script or a Jupyter Notebook. No OpenCV needed 🙂

Before we begin, a caveat for Jupyter: while the notebook is displayed on your current machine and widgets run locally, your kernel (which runs the python code) may be hosted on a remote server, docker container, virtual machine, etc., depending on your setup. If this is the case for you, please note that the kernel can only access the remote machine’s webcam, not your local one.

Installation

pip install imageio[ffmpeg]

Get a single Image/Screenshot

import imageio as iio
import matplotlib.pyplot as plt
camera = iio.get_reader("<video0>")
screenshot = camera.get_data(0)
camera.close()
plt.imshow(screenshot)
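If you would rather save the screenshot to disk than display it, imageio can write it out as a regular image file. A small sketch (the filename is arbitrary):

import imageio as iio

camera = iio.get_reader("<video0>")
screenshot = camera.get_data(0)
camera.close()
iio.imwrite("screenshot.png", screenshot)  # write the frame as a PNG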

Get a Video as a Sequence of Frames

import imageio as iio
import matplotlib.pyplot as plt
import time
camera = iio.get_reader("<video0>")
meta = camera.get_meta_data()
delay = 1/meta["fps"]
for frame_counter in range(15):
    frame = camera.get_next_data()
    time.sleep(delay)
camera.close()
plt.imshow(frame)

Full example (record a short MP4)

import imageio as iio
import matplotlib.pyplot as plt
import time
camera = iio.get_reader("<video0>")
meta = camera.get_meta_data()
num_frames = 5 * int(meta["fps"])
delay = 1/meta["fps"]
buffer = list()
for frame_counter in range(num_frames):
    frame = camera.get_next_data()
    buffer.append(frame)
    time.sleep(delay)
camera.close()
iio.mimwrite("frames.mp4", buffer, macro_block_size=8, fps=meta["fps"])
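To double-check the recording, you can read the file back with imageio, too. A quick sketch (mimread loads all frames into memory, which is fine for a short clip like this one):

import imageio as iio
import matplotlib.pyplot as plt

frames = iio.mimread("frames.mp4")  # list of frames (numpy arrays)
plt.imshow(frames[0])
plt.show()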

That’s it.

If you happen to run into problems, you have three options to ask for help:

  1. Ask a question on StackOverflow and tag it with imageio (I monitor the tag)
  2. Comment on this post
  3. Create a New Issue / Bug Report on GitHub.

Thanks for reading and Happy Coding!

SSO for your App via Auth0 + Nginx + Docker + Vouch-Proxy

This post is a tutorial on “How to setup SSO via Auth0 using Nginx and Vouch-Proxy”. I couldn’t find an existing nifty blog post on this; so I ended up having to figure it out. Here, I want to document the steps so that others (also future me) may have an easier time setting this up.

If you don’t want a full tutorial and are just looking for an example, here is a link to the repository with the final config files: https://github.com/FirefoxMetzger/sso_example

The setup I am presenting here works on localhost, and is mainly aimed at local development. It is a Docker-based setup, so there are tons of existing tutorials for deployment. Another thing that you may want to look into is hardening (making things super secure). I left this part out (for the most part) to avoid distraction; I really just want to focus on getting SSO up and running.

Setup Auth0

The first step is to set up Auth0 and create a new tenant. Make sure to pick a region that is close to your physical location; this will affect the login speed, but also how the data you send to Auth0 will be handled (data laws).

Setup window for a new tenant (Oct 2020)

Currently (2020), this will create a default app and enable authentication via email/password and Google as a social login provider. We will use this default app. You can of course customize it, but I recommend first setting it up following this tutorial and then adding your customization afterward.

Next, we will navigate to the settings of the default app.

Navigate to the settings page.

There are a few useful items in the settings which we will need, but the first thing is to allow users of our app to log in and log out. For this, we need to tell Auth0 which URLs are okay to use as callbacks for both login (Allowed Callback URLs) and logout (Allowed Logout URLs).

Navigate to Application URIs. For Allowed Callback URLs add http://localhost/sso/auth, https://localhost/sso/auth, http://localhost:9090/auth . For allowed logout URLs add http://localhost, https://localhost, http://localhost/sso/validate, http://localhost:9090/validate .

Configuration of Login and Logout inside Auth0.

Make sure to hit save changes at the bottom of the page.

We will delete most of these URLs as we move along, and they mainly exist for testing (so that we can assemble this incrementally). The two HTTPS URLs are the final ones, that we will use when we are done. The URLs on port 9090 are for testing vouch-proxy, which by default runs on port 9090, and the remaining HTTP URLs are for testing nginx as a reverse proxy for vouch-proxy and your app.

While we are now done with the setup for Auth0, don’t leave the settings page yet. At the top of the page you can find the application’s domain, client ID, and client secret. We will need this info in the next steps, so keep it around.

Client ID and Client Secret location.

Setup Vouch-Proxy

Vouch-Proxy can almost run out of the box; all we need to do is add a config file. The config below follows the example for a generic OIDC provider, which you can find in the vouch-proxy repo, with some modifications I made to get it working with Auth0.

When you use this template, be sure to replace the Auth0 domain with your domain, replace the client ID with your client ID, and replace the client secret with your client secret.

# vouch config
# bare minimum to get vouch running with OpenID Connect (such as okta)
vouch:
  logLevel: debug
  testing: true
  listen: 0.0.0.0
  port: 9090
  allowAllUsers: true

  jwt:
    secret: Your-64-character-secret-key-here
    issuer: Vouch
    compress: false

  cookie:
    name: my-vouch-ct
    secure: false
    domain: localhost

  headers:
    jwt: X-Vouch-Token
    querystring: access_token
    redirect: X-Vouch-Requested-URI
    accesstoken: X-Vouch-IdP-AccessToken
    idtoken: X-Vouch-IdP-IdToken

  post_logout_redirect_uris:
    - https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}&returnTo=http://localhost/

oauth:
  # Generic OpenID Connect
  # including okta
  provider: oidc
  client_id: {client_id_from_auth0}
  client_secret: {client_secret_from_auth0}
  auth_url: https://your-project-name-here.eu.auth0.com/authorize
  token_url: https://your-project-name-here.eu.auth0.com/oauth/token
  user_info_url: https://your-project-name-here.eu.auth0.com/userinfo
  scopes:
    - openid
    - email
    - profile
  callback_url: http://localhost:9090/auth
Example config.yml for vouch proxy

To store this file, create a sub-folder named vouch in your project’s folder and, inside it, another one named config. The relative path (from the project root) to the file is ./vouch/config/config.yml. You can check the GitHub repo for reference.

Next, it is time to test if vouch-proxy can correctly communicate with Auth0. I promised a dockerized setup, so let’s create a docker-compose file. (We will expand this file later.)

version: '3'
services:
  vouch:
    image: voucher/vouch-proxy
    volumes:
      - ./vouch/config/config.yml:/config/config.yml
    ports:
      - 9090:9090
Initial Docker-Compose config file.

Save the file in the project’s root directory as docker-compose.yml. Then, navigate to the root directory and bring up the “stack” with a

docker-compose up

Once it is up and running, you can open your browser and navigate to localhost:9090. You should be greeted with a

404 page not found

This is expected, so don’t worry; it is merely a test of whether the server is alive (a 404 is a much better sign than “site can’t be reached”).

Next, we will test the login flow between vouch-proxy and Auth0. For this navigate to

http://localhost:9090/login?url=http://localhost:9090/validate

The url= parameter specifies the location that we want the user to return to after the login has completed. In this case, we navigate to /validate, which is the endpoint we will use throughout the app to validate the client’s access token.

Once you put that into your browser, you will be greeted by a simple HTML page telling you that you are being redirected to some address. This is vouch-proxy’s debug mode, which lets you check if your flow works correctly. Click the link that forwards to Auth0.

Vouch Proxy testing page.

This should present you with Auth0’s login form. Here we want to create a new user with username and password and authenticate ourselves.

Signup Page at Auth0.

Once you have an account and have accepted the permissions, you will be redirected to vouch-proxy and it will confirm that you are logged in with your chosen email.

Vouch-proxy indicating that the user is authorized.

Now the only thing left to test is logging the user out. There are multiple options: (1) you can log the user out of your app only (vouch-proxy), (2) log the user out of your app and Auth0, or (3) log the user out of your app, Auth0, and their social login provider (if they used one). We will not cover the third option here.

To use the first option navigate to

localhost:9090/logout

which will tell you that you have been logged out. At this point, you are still logged in at Auth0, so after logging back into the app ( http://localhost:9090/login?url=http://localhost:9090/validate ), you will not be asked to log in at Auth0.

To log yourself out of Auth0 in parallel with your app, you have to tell vouch-proxy to redirect the user to Auth0’s logout URL. You can read more about it here. To log out of both places, use the URL below.

http://localhost:9090/logout?url=https://dev-simple.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=http://localhost:9090/validate

Be sure to replace {client_id_from_auth0} with your client ID. Also, notice the percent encoding of the ampersand (%26), which, if left out, will break the logout procedure. If you log out with this link, and try to log in again, you will be asked to provide your username and password at Auth0 again.
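If you prefer not to hand-encode the nested URL, here is a short Python sketch that builds it with the standard library (the client ID and Auth0 domain are the placeholders used throughout this tutorial):

from urllib.parse import quote

client_id = "{client_id_from_auth0}"  # placeholder: your Auth0 client ID
auth0_logout = (
    "https://your-project-name-here.eu.auth0.com/v2/logout"
    f"?client_id={client_id}&returnTo=http://localhost:9090/validate"
)

# quote() leaves :, /, ? and = alone but turns the inner & into %26
logout_url = "http://localhost:9090/logout?url=" + quote(auth0_logout, safe=":/?=")
print(logout_url)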

Behind the scenes, there are two places where this callback needs to be authorized (otherwise it won’t happen). First, vouch-proxy needs to find the url= parameter in the list of post_logout_redirect_uris (check the config.yml). Second, the returnTo= parameter of the redirect needs to be added to the Allowed Logout URLs in Auth0’s config. We have done this in the previous section. If something breaks for you, make sure to check these locations (and the returned X-Vouch-Error header).

Vouch-Proxy is working and communicating with Auth0! Next, we will set up nginx as a reverse proxy sitting in front of vouch-proxy and our app.

Setup Nginx

The next step in the process is to set up an Nginx server that can act as a reverse proxy for our app and vouch-proxy. For this, create a new config file at ./nginx/conf.d/server.conf with the configuration below

server {
    listen 80;
    server_name localhost;

    location ^~ /sso/ {
        location /sso/validate {
            proxy_pass http://vouch:9090/validate;
            proxy_set_header Host $http_host;
            proxy_pass_request_body off;
        }

        location = /sso/logout {
            proxy_pass http://vouch:9090/logout?url=https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=http://localhost/;
            proxy_set_header Host $http_host;
        }

        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/;
    }

    # uncomment this to forward static content of vouch-proxy
    # used when running vouch-proxy with `testing: true`
    location /static/ {
        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/static/;
    }

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }
}
Initial nginx configuration.

Also, update the docker-compose.yml like so

version: '3'
services:
  vouch:
    image: voucher/vouch-proxy
    volumes:
      - ./vouch/config/config.yml:/config/config.yml
  nginx:
    image: nginx
    depends_on:
      - vouch
    volumes:
      - ./nginx/conf.d/:/etc/nginx/conf.d/
    ports:
      - 80:80
updated docker-compose.yml

and finally, update the vouch-proxy configuration to callback to the new location. For this, you only have to change the variable callback_url in the last line of the config file.

# vouch config
# bare minimum to get vouch running with OpenID Connect (such as okta)
vouch:
  logLevel: debug
  testing: true
  listen: 0.0.0.0
  port: 9090
  allowAllUsers: true

  jwt:
    secret: Your-64-character-secret-key-here
    issuer: Vouch
    compress: false

  cookie:
    name: my-vouch-ct
    secure: false
    domain: localhost

  headers:
    jwt: X-Vouch-Token
    querystring: access_token
    redirect: X-Vouch-Requested-URI
    accesstoken: X-Vouch-IdP-AccessToken
    idtoken: X-Vouch-IdP-IdToken

  post_logout_redirect_uris:
    - https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}&returnTo=http://localhost/

oauth:
  # Generic OpenID Connect
  # including okta
  provider: oidc
  client_id: {client_id_from_auth0}
  client_secret: {client_secret_from_auth0}
  auth_url: https://your-project-name-here.eu.auth0.com/authorize
  token_url: https://your-project-name-here.eu.auth0.com/oauth/token
  user_info_url: https://your-project-name-here.eu.auth0.com/userinfo
  scopes:
    - openid
    - email
    - profile
  callback_url: http://localhost/sso/auth
updated vouch-proxy config.yml

Now update the docker stack so that it uses the new configuration.

What has just happened? We have added another node (nginx) to our docker stack and added a configuration for a server at the default HTTP port. The first location block (^~ /sso/) acts as a reverse proxy for vouch-proxy. It has specializations for the /validate endpoint (no request body needed) and for the /logout endpoint (for convenience). All authorization calls will, hence, go to localhost/sso/.

The second location block (/static/) handles requests to vouch-proxy’s static files (the logo and CSS you see on the debug pages). This block is only needed while testing: true is set in the vouch-proxy config. Otherwise, the debugging website will not be shown and we can remove this block.

The third location block is where our app will live. For now, it is the default nginx website.

Let’s test this setup. First navigate to

localhost/

and make sure that nginx is up and running. Then navigate to

localhost/sso/

and make sure that nginx is correctly forwarding to vouch-proxy (you should see the familiar 404 page not found). Now, test the login by navigating to

localhost/sso/login?url=http://localhost/sso/validate

This should result in the same flow that you are familiar with from the previous section, except that the URL now contains localhost/sso/ instead of localhost:9090/.

To log out simply visit

localhost/sso/logout

Notice how you are also logged out of Auth0. Nginx adds the necessary parameters before passing the request on to vouch-proxy.

Secure your App

So far, we have set up nginx, vouch-proxy, and Auth0 in a neat docker stack and we have verified that everything is working. What we haven’t done yet is integrate the actual app.

First, let’s create a super basic app that nginx can serve. Create a new file at ./web/index.html and fill it with a simple button to view protected content

<!DOCTYPE html>
<html>
<head>
<title>Simple App</title>
</head>
<body>
<button onclick="location.href='/sso/login?url=http\:\/\/localhost/protected';" id="myButton" class="float-left submit-button" >View Protected Content</button>
</body>
</html>
homepage of the app

and also a page that requires login to view at ./web/protected/index.html

<!DOCTYPE html>
<html>
<head>
<title>Simple App</title>
</head>
<body>
<p>This content is protected, and can't be seen without being logged in.</p>
<button onclick="location.href='/sso/logout';" id="myButton" class="float-left submit-button" >Logout</button>
</body>
</html>
protected page in the app

Then, add the files to the nginx container by updating the docker-compose.yml

version: '3'
services:
  vouch:
    image: voucher/vouch-proxy
    volumes:
      - ./vouch/config/config.yml:/config/config.yml
  nginx:
    image: nginx
    depends_on:
      - vouch
    volumes:
      - ./nginx/conf.d/:/etc/nginx/conf.d/
      - ./web/:/usr/share/nginx/html/
    ports:
      - 80:80
updated docker-compose.yml

Restart/update the stack, and you can see your website at localhost/. When you click “View Protected Content”, you will see the page that should be protected (it isn’t protected yet). When you click “Logout” on the protected page, you will trigger the logout flow familiar from the previous sections.

To actually protect the content, we need to add a new location to nginx and protect it. This is done easily by updating the config file.

server {
    listen 80;
    server_name localhost;

    location ^~ /sso/ {
        location /sso/validate {
            proxy_pass http://vouch:9090/validate;
            proxy_set_header Host $http_host;
            proxy_pass_request_body off;
        }

        location = /sso/logout {
            proxy_pass http://vouch:9090/logout?url=https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=http://localhost/;
            proxy_set_header Host $http_host;
        }

        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/;
    }

    # uncomment this to forward static content of vouch-proxy
    # used when running vouch-proxy with `testing: true`
    location /static/ {
        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/static/;
    }

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }

    location /protected {
        auth_request /sso/validate;
        root /usr/share/nginx/html;
        index index.html;
        expires 0;
        add_header Cache-Control "no-cache, no-store, must-revalidate, max-age=0";
        add_header Pragma "no-cache";
    }
}
updated nginx server config

Now, when you click View Protected Content or manually navigate to localhost/protected, you will see the protected page (if you are logged in) or (if not) you will get a 401 Unauthorized error.
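You can verify this behavior from a short script as well. A sketch using the requests library (assuming the stack from this tutorial is running locally):

import requests

# without a vouch-proxy session cookie, the auth_request sub-request fails
response = requests.get("http://localhost/protected", allow_redirects=False)
print(response.status_code)  # expect 401 here (302 once the login redirect below is in place)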

Next, we can have nginx catch the error and, instead of raising it, redirect the user to the login procedure with a simple addition to the server.conf

server {
    listen 80;
    server_name localhost;

    location ^~ /sso/ {
        location /sso/validate {
            proxy_pass http://vouch:9090/validate;
            proxy_set_header Host $http_host;
            proxy_pass_request_body off;
        }

        location = /sso/logout {
            proxy_pass http://vouch:9090/logout?url=https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=http://localhost/;
            proxy_set_header Host $http_host;
        }

        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/;
    }

    # uncomment this to forward static content of vouch-proxy
    # used when running vouch-proxy with `testing: true`
    location /static/ {
        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/static/;
    }

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }

    location /protected {
        auth_request /sso/validate;
        root /usr/share/nginx/html;
        index index.html;
        expires 0;
        add_header Cache-Control "no-cache, no-store, must-revalidate, max-age=0";
        add_header Pragma "no-cache";
        error_page 401 = @prompt_login;
    }

    location @prompt_login {
        return 302 http://localhost/sso/login?url=$scheme://$http_host$request_uri;
    }
}
updated nginx server config

Now, the user will be asked to log in if they try to access the protected location, and only if the login succeeds will they be able to view the protected page.

Bonus: Add HTTPS via self-signed certificates

In 2020 servers should enforce https, and while it is not necessary for localhost development, it is very nice to have. Especially later, when you have a development and production version of the code.

Adding SSL is very easy (shameless self plug). First, generate a self-signed certificate for localhost (make sure to enter localhost as the common name):

docker run --rm -it -v$PWD:/certs firefoxmetzger/create_localhost_ssl

(Source: https://github.com/FirefoxMetzger/create_ssl)

This will place a certificate and a private key into your current working directory which you can move to ./cert/ . Also, if you don’t want to be warned about an untrusted certificate, you can consider adding it to your browser’s trusted certificates.

Next, we have to update the server.conf for nginx

server {
    listen 80;
    server_name _;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name localhost;

    ssl_certificate /certs/certificate.crt;
    ssl_certificate_key /certs/private.key;

    location ^~ /sso/ {
        location /sso/validate {
            proxy_pass http://vouch:9090/validate;
            proxy_set_header Host $http_host;
            proxy_pass_request_body off;
        }

        location = /sso/logout {
            proxy_pass http://vouch:9090/logout?url=https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=https://localhost/;
            proxy_set_header Host $http_host;
        }

        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/;
    }

    # uncomment this to forward static content of vouch-proxy
    # used when running vouch-proxy with `testing: true`
    location /static/ {
        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/static/;
    }

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }

    location /protected {
        auth_request /sso/validate;
        root /usr/share/nginx/html;
        index index.html;
        expires 0;
        add_header Cache-Control "no-cache, no-store, must-revalidate, max-age=0";
        add_header Pragma "no-cache";
        error_page 401 = @prompt_login;
    }

    location @prompt_login {
        return 302 https://localhost/sso/login?url=$scheme://$http_host$request_uri;
    }
}
updated server.conf

The new first server block will forward all HTTP requests to HTTPS. Then, we add the SSL certificate we have just generated and change nginx to listen to the standard HTTPS port. Finally, we change the protocol from HTTP to HTTPS for both redirects.

Next, update the callback_url in the vouch-proxy config, as well as the post_logout_redirect_uris.

# vouch config
# bare minimum to get vouch running with OpenID Connect (such as okta)
vouch:
  logLevel: debug
  testing: true
  listen: 0.0.0.0
  port: 9090
  allowAllUsers: true

  jwt:
    secret: Your-64-character-secret-key-here
    issuer: Vouch
    compress: false

  cookie:
    name: my-vouch-ct
    secure: false
    domain: localhost

  headers:
    jwt: X-Vouch-Token
    querystring: access_token
    redirect: X-Vouch-Requested-URI
    accesstoken: X-Vouch-IdP-AccessToken
    idtoken: X-Vouch-IdP-IdToken

  post_logout_redirect_uris:
    - https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}&returnTo=https://localhost/

oauth:
  # Generic OpenID Connect
  # including okta
  provider: oidc
  client_id: {client_id_from_auth0}
  client_secret: {client_secret_from_auth0}
  auth_url: https://your-project-name-here.eu.auth0.com/authorize
  token_url: https://your-project-name-here.eu.auth0.com/oauth/token
  user_info_url: https://your-project-name-here.eu.auth0.com/userinfo
  scopes:
    - openid
    - email
    - profile
  callback_url: https://localhost/sso/auth
updated config.yml

Last, but not least, make the certificates available to nginx by mounting the folder into the nginx container, and open port 443 to allow SSL connections.

version: '3'
services:
  vouch:
    image: voucher/vouch-proxy
    volumes:
      - ./vouch/config/config.yml:/config/config.yml
  nginx:
    image: nginx
    depends_on:
      - vouch
    volumes:
      - ./nginx/conf.d/:/etc/nginx/conf.d/
      - ./web/:/usr/share/nginx/html/
      - ./cert/:/certs
    ports:
      - 443:443
      - 80:80
updated docker-compose.yml

Now, when you update the docker stack, you should be able to navigate to https://localhost (potentially receive a warning that the certificate could not be verified), and browse your app encrypted.

Remove Debugging

The only thing left is to remove some of the config that we have introduced for debugging purposes.

First, in the settings for the tenant at Auth0, remove the unneeded allowed callback and logout URLs

Final Auth0 settings

Then, disable debug logs and testing mode in vouch-proxy.

# vouch config
# bare minimum to get vouch running with OpenID Connect (such as okta)
vouch:
  listen: 0.0.0.0
  port: 9090
  allowAllUsers: true

  jwt:
    secret: Your-64-character-secret-key-here
    issuer: Vouch
    compress: false

  cookie:
    name: my-vouch-ct
    secure: false
    domain: localhost

  headers:
    jwt: X-Vouch-Token
    querystring: access_token
    redirect: X-Vouch-Requested-URI
    accesstoken: X-Vouch-IdP-AccessToken
    idtoken: X-Vouch-IdP-IdToken

  post_logout_redirect_uris:
    - https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}&returnTo=https://localhost/

oauth:
  # Generic OpenID Connect
  # including okta
  provider: oidc
  client_id: {client_id_from_auth0}
  client_secret: {client_secret_from_auth0}
  auth_url: https://your-project-name-here.eu.auth0.com/authorize
  token_url: https://your-project-name-here.eu.auth0.com/oauth/token
  user_info_url: https://your-project-name-here.eu.auth0.com/userinfo
  scopes:
    - openid
    - email
    - profile
  callback_url: https://localhost/sso/auth
final config.yml for vouch-proxy

And update the nginx config by removing the /static/ route.

server {
    listen 80;
    server_name _;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name localhost;

    ssl_certificate /certs/certificate.crt;
    ssl_certificate_key /certs/private.key;

    location ^~ /sso/ {
        location /sso/validate {
            proxy_pass http://vouch:9090/validate;
            proxy_set_header Host $http_host;
            proxy_pass_request_body off;
        }

        location = /sso/logout {
            proxy_pass http://vouch:9090/logout?url=https://your-project-name-here.eu.auth0.com/v2/logout?client_id={client_id_from_auth0}%26returnTo=https://localhost/;
            proxy_set_header Host $http_host;
        }

        proxy_set_header Host $http_host;
        proxy_pass http://vouch:9090/;
    }

    # uncomment this to forward static content of vouch-proxy
    # used when running vouch-proxy with `testing: true`
    #location /static/ {
    #    proxy_set_header Host $http_host;
    #    proxy_pass http://vouch:9090/static/;
    #}

    location / {
        root /usr/share/nginx/html;
        index index.html;
    }

    location /protected {
        auth_request /sso/validate;
        root /usr/share/nginx/html;
        index index.html;
        expires 0;
        add_header Cache-Control "no-cache, no-store, must-revalidate, max-age=0";
        add_header Pragma "no-cache";
        error_page 401 = @prompt_login;
    }

    location @prompt_login {
        return 302 https://localhost/sso/login?url=$scheme://$http_host$request_uri;
    }
}
final server.conf for nginx

Done! Now you have a simple app that is secured with Auth0 and vouch-proxy. You can add an API to this in the same way we have added the /protected route. Simply add

auth_request /sso/validate;

to the route that proxies the API.

If you have any questions or comments, feel free to comment below.

Thanks for reading and Happy Coding!


Faster RNG for Reinforcement Learning in Python

When you start doing reinforcement learning, you will sooner or later come to the point where you need to generate random numbers. Initializing policy networks or Q-tables, choosing between exploration and exploitation, or selecting among equally good actions are a few examples. Numpy is very efficient at generating those random numbers, and most of the time (like 95%) this is all you need. However, there is one particular edge case where numpy is not the best solution, and that is exactly the case we encounter in RL a lot: generating single random numbers (e.g., to select an action epsilon-greedily).

Generating single random numbers in numpy is a bad idea, because every numpy call gets sent to the numpy engine and then back to python, which creates overhead that dominates runtime for single random numbers. In this case it is (much) more efficient to use the python standard library instead. However, if you can generate the random numbers in batches, numpy is significantly faster than the standard library again. Take a look at this simple profile:

import timeit
number = 10000
numpy_time = timeit.timeit("[np.random.rand() for _ in range(int(1e3))]", "import numpy as np", number=number)
random_time = timeit.timeit("[random.random() for _ in range(int(1e3))]", "import random", number=number)
numpy_batch_time = timeit.timeit("np.random.rand(int(1e3))", "import numpy as np", number=number)
print("Timings")
print("=======")
print(f"Numpy Single: {numpy_time:.3f}")
print(f"Random: {random_time:.3f}")
print(f"Numpy Batch: {numpy_batch_time:.3f}")
print("=======")
# =======
# Numpy Single: 3.245
# Random: 1.003
# Numpy Batch: 0.085
# =======
Comparison between ways to generate random numbers

So if you profile your code and notice that RNG adds up to a significant portion of your runtime, consider pre-generating the random numbers in numpy and saving them to a list. This solution sacrifices a bit of readability, but allows for much faster code. Here is an example that mimics the syntax of the python standard library:

import numpy as np

# pre-generate a large batch of random numbers and treat it as a stack
random_numbers = np.random.rand(int(2e8)).tolist()

def random():
    try:
        return random_numbers.pop()
    except IndexError:
        raise IndexError("Out of random numbers; generate more next time.")

def randint(a, b):
    return int(round((b - a) * random())) + a
Source Code for Random replacement.
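A quick usage sketch for this drop-in, assuming the snippet above is saved as fast_random.py (the module name and the epsilon-greedy example are mine, not part of the snippet):

from fast_random import random, randint  # hypothetical module name for the snippet above

epsilon = 0.1
if random() < epsilon:
    action = randint(0, 3)  # explore: one of four actions, chosen uniformly
else:
    action = 0  # exploit: placeholder for the greedy action
print(action)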

That’s it. Thanks for reading and Happy Coding!

Parallelism in Python Generators

Yesterday I stumbled upon a StackOverflow question that asked about implementing a Rosetta Code problem in parallel to speed it up. One easy way to do it is the snippet below, which is a modification of the python example on rosettacode.org.

from multiprocessing import Pool, cpu_count
from itertools import islice, count

def is_special(n, d):
    tofind = str(d) * d
    return tofind in str(d * n ** d)

def superd(d, N=10000):
    if d != int(d) or not 2 <= d <= 9:
        raise ValueError("argument must be integer from 2 to 9 inclusive")
    with Pool(cpu_count() - 2) as workers:
        for offset in count(0, N):
            worker_fn_args = zip(range(offset, offset + N), [d] * N)
            is_superd_batch = workers.starmap(is_special, worker_fn_args)
            yield from [n + offset for n in range(N) if is_superd_batch[n]]

if __name__ == '__main__':
    for d in range(2, 10):
        print(f"{d}:", ', '.join(str(n) for n in islice(superd(d), 10)))
Source code for parallelism in generators

When I posted this as an answer, the OP commented that they were looking for a solution using non-terminal sub-processes that yield these super-d values. I thought about it, but that version does not seem very practical. If the main process is interested in the results of the computation, then temporary concurrency will be the cleaner solution (like in this example). If the main thread isn't interested, e.g., if you are running an old-school threaded web-server that hands off incoming connections to a worker thread, then solutions with non-terminal sub-processes can make sense. In the latter case you are essentially "starting a program N times in parallel and shutting them down together". This certainly makes sense, but only if those programs don't need to communicate. Remember to KISS.

Thank you for reading and Happy Coding!

How to play custom animations during speech on a NAO (or Pepper)

I’ve been asked multiple times now how to sync animations and speech on a NAO – or Pepper for that matter; especially from Python.

The answer to that is, there are two options:

  1. The first one is to create the animation in Choreograph and then export it to a python script. You then create your usual handle to the text-to-speech module, but instead of calling the say method directly, e.g., `tts.say("Hello")`, you call it through the module's `post` method, e.g., `tts.post.say("Hello")`. This method exists for every function in the API and essentially just makes a non-blocking call. You can then call your animation while the robot is speaking (see the short sketch after this list).
  2. You create a custom animation in Choreograph, upload it to the robot, and call it through AnimatedSay or QiChat. Besides being (I think) the cleaner solution, it gives you finer-grained control over when in the sentence the animation starts and when it should stop. This is what I will describe in more detail below.
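To illustrate option 1, here is a rough sketch. The robot IP is a placeholder, and the commented-out call at the end stands in for whatever function Choreograph generated when you exported your animation (a hypothetical name):

from naoqi import ALProxy

robot_ip = "nao.local"  # placeholder: your robot's IP or hostname
tts = ALProxy("ALTextToSpeech", robot_ip, 9559)

# post.say returns immediately, so the animation can start while NAO is talking
tts.post.say("Hello! Nice to meet you!")

# play_my_animation(robot_ip)  # hypothetical: the function exported from Choreograph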

Step 1: Create the Animation


Fairly straight forward, and the same for both solutions. You use Choreograph to create a new Timeline box in which you create the animation that you would like. You then connect the timeline box to the input and output of the behavior and make sure it works as you’d expect when you press the green play button.

Step 2: Configure the Project and Upload it to the Robot

In this step, you configure the new animation to be deployed as an app on the robot.


Go to the properties of the project.


Then make sure to select a minimum naoqi version (for NAO 2.1, for Pepper 2.5), the supported models (usually any model of either NAO or Pepper respectively) and set the ID of the Application. We will use this when calling the animations, so choose something snappy, yet memorable. Finally, it is always nice to add a small Description.


Next, we need to reorganize the app a bit. Create a new folder and name it after your animation; again, we will use this name to call our behavior, so make sure it’s descriptive. Then move the behavior that contains your animation – by default called behavior1.xar – into the folder you just created, and rename it to behavior.xar .


Finally, connect to your robot and use the first button in the bottom right corner to upload the app you just created to your robot.

Step 3: Use ALAnimatedSpeech from Python

Note: If you don’t want NAO to use the random gestures it typically uses when speaking in animated speech, consider setting the BodyLanguageMode to disabled. You can still play animations, but it won’t automatically start any.

For existing animations – that come with the robot by default – you call the animation like this

"Hello! ^start(animations/Stand/Gestures/Hey_1) Nice to meet you!"

Now, animations is nothing but an app that is installed on the robot. You can even see it listed in the bottom right corner of Choreograph. Inside the app, there are folders for the different stable poses of NAO, like Stand or Sit, which are again divided into types of animations, e.g., Gestures, which you can see above. Inside these folders there is yet another folder named after the animation (Hey_1), inside of which is a behavior file called behavior.xar.

We have essentially recreated this structure in our own app and installed it right next to the animations app. So, we can call our own animations using the exact same logic:

"Hello! ^start(pacakge_name/animation_name) Nice to meet you!"

It also works with all the other aspects of the ALAnimatedSpeech module, so ^stop, ^wait, ^run, will work just as fine. You can also assign tags to your animations and then make it choose random animations for that tag group.
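From Python, calling the module could look roughly like this (a sketch; my_app/my_animation stands for the application ID and folder name you chose in step 2, and the configuration dict mirrors the BodyLanguageMode note above):

from naoqi import ALProxy

robot_ip = "nao.local"  # placeholder: your robot's IP or hostname
animated_speech = ALProxy("ALAnimatedSpeech", robot_ip, 9559)

# disable the random default gestures so only our animation plays
configuration = {"bodyLanguageMode": "disabled"}

animated_speech.say(
    "Hello! ^start(my_app/my_animation) Nice to meet you!",
    configuration,
)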

Finally, please be aware that the robot will return to its last specified pose after finishing an animation. Hence, if you want the robot to wait in a different position after the animation has finished, you will have to create a custom posture. I have some comments on that here: The hidden potential of NAO and Pepper – Custom Robot Postures in naoqi v2.4

I hope this will be useful to some of you. Please feel free to like, share, or leave a comment below.

Happy coding!

The hidden potential of NAO and Pepper – Custom Robot Postures in naoqi v2.4

Introduction

Our lab owns robots built by SoftBank that we use for experiments; we have a Pepper and some NAOs. At the moment, I'm working on a NAO.

NAO looking at a Tower of Hanoi

They are quite pretty robots. I mean, they can barely walk around, let alone navigate the environment, they can't do proper grasping, the built-in CPU is slow and hogged by the default modules, and streaming video from the robot for remote processing happens at about 5 FPS. So you can't really do any of the things you would expect to be able to do, but hey, they look really cool 😀

Okay, jokes aside, the manufacturing of the robots is actually pretty solid. Being able to get your hands on a biped for about 6000€ is great, and, despite some stability issues, it can walk – however, nobody really uses that feature in social robotics research. They also come with a huge sensor array that makes every smartphone jealous. Hardware-wise, both NAO and Pepper are good robots.

The thing that is lacking – by a landslide – is the software. The robots come with an API, but that API is proprietary – in itself, not a problem. The problem starts where the documentation ends. Documentation is shaky, disorganized, not very clear, and – for all the cool parts – nonexistent. In short, you don’t get to read the code and you don’t get good documentation to help you either; hence, if something breaks, you are blind and deaf somewhere in the forest of code and have to find the way out yourself.

Pepper can grasp, it can do navigation, and you can stream video data at a decent FPS – the same is true for NAO; it can do all the things I just complained about. You just have to write the code yourself.

This is what I will talk about in this post. I will not go into grasping or walking, but we will look into navigating the joint space more efficiently. That is, we will have a more in-depth look at ALRobotPosture, some of the hidden / undocumented functions, and how we can use this module for some pretty sick motion planning.

Note: Everything in this post works for both NAO and Pepper. For ease of reading, I will only reference the NAO, because – I think – that is the robot most people reading this will own.

ALRobotPosture, an Overview

If you own either a NAO or a Pepper, you have probably noticed that, when you turn it on (and autonomous life activates), it moves into a certain pose. For Pepper, it is always the same; for the NAO, it depends on whether it was turned on sitting or standing. This is RobotPosture in action. The same is true after we play an animation: once it finishes, NAO moves back into a specific pose, waiting for the next command.

This is the most visible action of the module: when no other movement task is running, it will move the NAO into a stable position. The other thing it does is transition between these stable poses. For example, when you want NAO to either sit down or stand up, it doesn't play an animation. It actually uses RobotPosture to navigate the joint space from one stable posture to another until it reaches the Sit or Stand posture respectively.

In essence, RobotPosture is a list of configurations – points in joint space – that serve as stable positions the robot can move into. These points are connected; there is a neighbor relationship between them. They are also attractive; hence, when no other motion is running, NAO will move into the closest posture (closeness being defined as closeness in joint space).

The interesting part is that movement between poses is not done as a straight line in joint space. That could be rather dangerous, since the robot would simply fall if it moved in a straight line from Stand to Sit. Instead, planning is done in the topological map (a directed graph) that is defined by the poses and their neighbors. NAO then moves in a (joint space) straight line from the current pose to a neighboring pose and goes through different poses until it reaches the final, desired pose.

I visualized the standard poses in Figure 1. Additionally, there is the USRLookAtTower pose, which is a custom posture I’ve added for a project I’m working on. You can also see it in the picture I chose for the beginning of the post. It looks a lot like normal sitting, but the head is tilted downwards. I will walk you through how I did that in the next section. I also color coded the sitting and standing postures, because they are the most used – but mainly because it looks nice 🙂 .

Figure 1: Visualization of the available postures on a NAO v4. The distances between nodes roughly correspond to the euclidean distance in joint space. Big nodes are targets for ALRobotPosture.goToPosture(), small nodes are used for the transition. Blue nodes belong to the family Sitting and red nodes belong to the family Standing.

The graph is laid out using force-directed graph drawing, where the force corresponds to the euclidean distance between nodes in joint space. However, I took a bit of liberty to prevent label overlap. As you can see, there is no direct connection between Sit and Stand; the robot would move through unstable territory. (We could, however, add such trajectories ourselves, creating a fast, dynamic stand up motion – e.g., for robot football.)

Another advantage of this approach is that it is very computationally efficient. Since we have an abstract map of how poses are connected, we can quickly figure out if a pose is reachable, and compute a path to that given pose.

Enough theory, show some code already! Okay … okay. Here is how to use the basics of the module:


from naoqi import ALBroker
from naoqi import ALProxy

# start a local broker that connects to the NAO
robot_ip = "your-ip-here"
myBroker = ALBroker("myBroker", "0.0.0.0", 0, robot_ip, 9559)

# get a handle to the module
posture_proxy = ALProxy("ALRobotPosture")
tts_proxy = ALProxy("ALTextToSpeech")

announcement = "I am in posture {posture}. It is part of {posture_family}."

# list current postures
postures = posture_proxy.getPostureList()
print(postures)

# round-trip through all available postures
for posture in postures:
    posture_proxy.goToPosture(posture, 1.0)
    posture_family = posture_proxy.getPostureFamily()
    # not needed, just for demonstration
    posture_name = posture_proxy.getPosture()
    tts_proxy.say(announcement.format(posture=posture_name,
                                      posture_family=posture_family))


The snippet will make the robot run through all the available poses and announce the pose’s name once there. This is about the best you can do with the official part of ALRobotPosture; not that much.

Hidden Features

There is a lot more functionality in the module. There just isn’t any documentation of it on the web. We can look at all the methods in a module via:


from naoqi import ALBroker
from naoqi import ALProxy
from pprint import pprint
# start a local broker that connects to the NAO
robot_ip = "your-ip-here"
myBroker = ALBroker("myBroker", "0.0.0.0", 0, robot_ip, 9559)
# get a handle to the module
posture_proxy = ALProxy("ALRobotPosture")
pprint(posture_proxy.getMethodList())

Alternatively, we can use qicli (with the parameter --hidden) to list all the functions in a similar fashion. Qicli is documented here.

Here we can find a few very promising functions:

_isRobotInPosture(string, float, float)

This function is similar to getPosture(). However, instead of returning the current posture, it returns a boolean that is true if the robot is in the given posture. The two floats are threshold values for the joint angles and stiffness, i.e., by how much the current pose is allowed to deviate from the defined pose for us to still consider them the same.

It returns a triple of (bool, [bool] * 26, [bool] * 2) on a NAO robot. The first boolean tells us if the pose has been reached overall, the second is a breakdown if the pose has been reached for each joint. Finally, the last array is the same for stiffness.

This function is useful if two poses are close together. In this case getRobotPosture() may not show the correct pose; however, we can still differentiate with _isRobotInPosture().
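As a small sketch of how such a call might look (the thresholds are illustrative values, and the return structure is the triple described above):

from naoqi import ALProxy

robot_ip = "your-ip-here"
posture_proxy = ALProxy("ALRobotPosture", robot_ip, 9559)

# is the robot (roughly) in the Sit posture? 0.3 tolerance on joint angles and stiffness
in_posture, joints_ok, stiffness_ok = posture_proxy._isRobotInPosture("Sit", 0.3, 0.3)
print(in_posture)   # overall verdict
print(joints_ok)    # per-joint breakdown (26 values on NAO)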

_loadPostureLibraryFromName(string)

Make your own network of postures, export it, use this to upload it to an army of NAOs, and dominate the world.

Given a serialized graph of poses, it will load it and replace the current posture graph. The string is the (relative) path to the file. It returns a boolean indicating if the loading has succeeded.

Important: The file path is relative to ~/.local/share/naoqi/robot_posture on the robot, so the posture file has to be stored in that directory on the robot.

_generateCartesianMap()

This is a strange one. While not immediately useful to us, it will re-generate a Cartesian map that the module uses internally to navigate between poses. You have to call this after loading a new posture library or adding individual postures. Otherwise the new postures won’t work!

_getIdFromName(string)

Pretty self-explanatory. Look up the id of a posture using its name. Takes the name of the posture and returns an integer that is the id.

_isReachable(int)

Takes a posture id and returns a boolean. True if the posture is reachable from the current robot pose.

_savePostureLibrary(string)

The string is the name that we want to save the file as and it will be saved under ~/.local/share/naoqi/robot_posture on the robot.

_addNeighbourToPosture(int, int, float)

Adds a directed edge to the graph pointing from the first posture (indexed by the first int) to the second posture. The third value is the cost of traversing this edge, which can be used for more sophisticated path planning.

_saveCurrentPostureWithName(int, string)

Saves the current pose as a new node with ID int and name string.

 

Custom Poses

Putting all these together, we can create custom poses as follows:

  1. Use the Animation Mode (or any other method) to move NAO into the desired pose
  2. _saveCurrentPostureWithName() to add the node to the graph
  3. _addNeighbourToPosture() to connect it to the graph (edges are directed! we have to add both ways)
  4. export the postures via _savePostureLibrary() (this will save the file in the correct place)
  5. In our code: import our custom poses using _loadPostureLibraryFromName()
  6. re-generate the cartesian map _generateCartesianMap()
  7. goToPosture()

Here is a code snippet that adds a custom posture called “myPosture”, exports the library, imports it, makes the robot sit down, and then go into “myPosture”.


from naoqi import ALBroker
from naoqi import ALProxy
from pprint import pprint
# start a local broker that connects to the NAO
robot_ip = "love"
myBroker = ALBroker("myBroker", "0.0.0.0", 0, robot_ip, 9559)
# get a handle to the module
posture_proxy = ALProxy("ALRobotPosture")
file_name = "firefoxmetzger-awesome-pose.pose"
posture_proxy._saveCurrentPostureWithName(9942, "myPosture")
custom_posture_id = 9942
stand_posture_id = posture_proxy._getIdFromName("Stand")
posture_proxy._addNeighbourToPosture(stand_posture_id, custom_posture_id, 1)
posture_proxy._addNeighbourToPosture(custom_posture_id, stand_posture_id, 1)
posture_proxy._savePostureLibrary(file_name)
posture_proxy._loadPostureLibraryFromName(file_name)
posture_proxy._generateCartesianMap()
posture_proxy.goToPosture("Sit", 0.5)
posture_proxy.goToPosture("myPosture", 0.5)

And just for good measure, a video showing what the robot does when running the snippet:

Naturally, you can be more fancy with this. I am particularly excited about the possibility to do dynamic movements, i.e., one-way trajectories. However, my supervisor will probably kill me if I actually dabble in this area, because the chances of breaking a NAO like this are … elevated.

I hope this article is useful. If you liked it, feel free to leave a like, comment, or follow this blog! I will keep posting tutorials in the area of robotics, AI, and social robotics research.

Happy coding!

Questback + MTurk — No more survey codes with ExternalQuestions

Introduction

A few weeks ago, I ran my first pilot study on Amazon Mechanical Turk (MTurk).

Essentially, MTurk is a platform that you can use to label data and get participants for studies (using money). There is of course more that can be done here, but from my current understanding, these seem to be the two main uses in academia.

The completion of a survey usually takes the following form:

  1. A worker / participant on MTurk accepts the work and is presented a link to the survey.
  2. The worker copies his unique, anonymized workerID into the survey (so that you can reference the data)
  3. The worker completes the survey and is presented a unique random code at the end
  4. The worker copies the code into a form on MTurk’s website.
  5. You match the IDs of workers that completed the survey with IDs in your database and reward workers for their time

You have used MTurk before? This sounds familiar? Yes – however, two parts in this chain are rather weak: (1) the manual copying of the workerID and (2) the generation, matching, and copying of the random code at the end. If either of these fails, you have to work out manually whether a worker has participated or not.

Principle

As you have already guessed, there is a better way to do this (why else would I be writing about it? :D). MTurk offers to host a so-called ExternalQuestion, which allows you to embed a custom website via an <iframe>. On top of that, it passes some meta information, such as the workerID, to the website; all you need to do is read that info and use the value as you see fit. Additionally, it allows the website to submit a form back to MTurk as proof of having completed the work, which we will do at the end of the survey.

In short, we integrate the survey directly into MTurk thereby getting rid of above pitfalls. It roughly follows these steps:

  1. Create the Survey on the survey tool (in this case Questback)
  2. Create a small adapter website on GitHub (may not be needed if you have a different survey tool)
    • this will accept and forward requests from Mturk
    • allow you to create an arbitrary preview of the survey
    • pipe the form back to MTurk when the survey is finished
  3. Add URL parameters to the survey
  4. forward people to the adapter website upon completion
  5. Host an external question pointing to the adapter website on GitHub

Small Adapter Website on GitHub

At first, I tried to link Questback and MTurk directly; then I discovered two limitations that make this impossible: (1) Questback only accepts URL parameters named "a=..&b=..&c=.." instead of full variable names, and (2) Questback cannot POST the results of a form; I could only find forwarding via GET.

Hence, I set up a small website hosted on GitHub to do the plumbing between both websites.


<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <title>Redirecting to the Survey …</title>
    <!--[if IE]>
        <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->
</head>

<body id="home">
    <img src="PreviewQuestionnaire.png" />
    <script>
        let survey_url = 'https://www.unipark.de/uc/robot_capabilities/'
        let plain_url = 'https://firefoxmetzger.github.io/robot_capabilities_glue/'
        let mturk_url = 'https://www.mturk.com/mturk/externalSubmit'
        // let mturk_url = 'https://workersandbox.mturk.com/mturk/externalSubmit'

        if (RegExp("returnToMTurk").test(window.location.href)) {
            console.log(window.location.href);
            var match = RegExp("assignmentId=(.*?)&").exec(window.location.href);
            var attention_response = RegExp("attentionCheckPassed=(.*?)&").exec(window.location.href);

            var form = document.createElement('form');
            document.body.appendChild(form);
            form.method = 'post';
            form.action = mturk_url + '?assignmentId=' + match[1] + '&attentionCheckPassed=' + attention_response[1];
            console.log(form.action)
            form.submit();
        } else if (RegExp("ASSIGNMENT_ID_NOT_AVAILABLE").test(window.location.href)) {
            // do nothing
        } else {
            let url = window.location.href;
            url = url.replace('assignmentId', 'a');
            url = url.replace('hitId', 'b');
            url = url.replace('turkSubmitTo', 'c');
            url = url.replace('workerId', 'd');
            url = url.replace(plain_url, survey_url);
            window.location.href = url;
        }
    </script>
</body>

</html>


Note: If you are working with the sandbox, you have to change the mturk_url appropriately.

This does 3 things, depending on how it is called:

  1. if the URL contains a parameter called “returnToMTurk” then we assume that the Questback is forwarding to this website via GET. In this case we take the payload (in this case attentionCheckPassed) and forward it to MTurk as a POST request.
  2. if the URL contains ASSIGNMENT_ID_NOT_AVAILABLE, the survey is being previewed. In this case we do nothing and show this website, which acts as the preview of the survey (e.g., displaying a screenshot of the HIT or some other relevant information about the task). Note that you want to avoid people submitting your survey at this stage, so showing them the raw survey may be counterproductive here.
  3. Otherwise the website is called from MTurk by a worker who wants to complete the survey; in this case we rename the URL parameters from their actual names into a, b, c, d and forward the request to the Questback survey.

Add URL parameters to the survey

In Questback this can be done in the survey properties > User-defined variables.


As mentioned before, the URL parameters will have names a,b,c, … and can be accessed in the survey tool via #p_0001#, #p_0002#, #p_0003#, … .

Forward People to the GitHub Adapter Upon Completion

This is easily done in the properties section of the final page under Questionnaire editor > Final Page > Properties > Redirect to Survey .

URL forwarding questback

The address should point to the GitHub adapter and the URL should include three parameters: (1) the assignmentID sent from MTurk, (2) at least one additional parameter to store in MTurk as the result of the task (this is really important), and (3) the "returnToMTurk" parameter used to tell the adapter what to do. An example URL could look like this

https://firefoxmetzger.github.io/robot_capabilities_glue/?assignmentId=#p_0001#&attentionCheckPassed=False&returnToMTurk=True

Note that the assignmentID is set to #p_0001#, which is the first parameter ("a") passed to the survey from MTurk. In the above example, attentionCheckPassed is a variable from the survey which we use to determine whether participants paid attention or just mindlessly filled out the survey. It is an aggregate of multiple questions and computed by the survey tool upon completion of the survey. It can later be used to automatically accept / reject / ban workers that have completed the assignment.

It is also important to note that MTurk expects the assignmentID and at least one additional parameter to be sent through the form's POST request. For some reason, the additional parameter is mentioned nowhere in the documentation but, instead, tacitly assumed.

Additionally, the checkboxes next to Automatically add ospe.php3 to URL and Add return ticket have to be disabled. You would use these if you were forwarding / returning to another ESF survey; this isn't the case here.

Host an External Question Pointing to the GitHub Adapter

All that is left is to actually host a task on MTurk. In this case an External Question.

Unfortunately, this is currently impossible to do through the web UI. Hence, we have to use the API; I decided to do it in Python by adapting a code snippet that I found on the web. It consists of two files: (1) config.py, which stores the credentials, and (2) create_hit.py which creates the actual hit.


AWS_ACCESS_KEY_ID = "YOUR KEY ID HERE"
AWS_SECRET_ACCESS_KEY = "YOUR SECRET KEY HERE"
USE_SANDBOX = False



import config
from boto.mturk.connection import MTurkConnection
from boto.mturk.question import ExternalQuestion
from boto.mturk.qualification import (Qualifications,
                                      PercentAssignmentsApprovedRequirement,
                                      NumberHitsApprovedRequirement, LocaleRequirement)
from boto.mturk.price import Price
import datetime

# ============================HELPER METHODS=======================
# Quick method to encode url parameters
def encode_get_parameters(baseurl, arg_dict):
    queryString = baseurl + "?"
    for indx, key in enumerate(arg_dict):
        queryString += str(key) + "=" + str(arg_dict[key])
        if indx < len(arg_dict) - 1:
            queryString += "&"
    return queryString

# ============================VARIABLES============================
# START AWS CONFIGURATION VARS
AWS_ACCESS_KEY_ID = config.AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY = config.AWS_SECRET_ACCESS_KEY
# END MAIN CONFIGURATION VARS

# START IMPORTANT HIT VARIABLES
sandbox = config.USE_SANDBOX
base_url = "https://firefoxmetzger.github.io/robot_capabilities_glue/"
params_to_encode = {}
assignments_per_hit = 25
payment_per_assignment = 0.8
# END IMPORTANT HIT VARIABLES

# START QUALIFICATION CONFIGURATION
qualifications = Qualifications()
qual_1 = PercentAssignmentsApprovedRequirement(
    comparator="GreaterThan",
    integer_value="95")
qualifications.add(qual_1)
qual_2 = NumberHitsApprovedRequirement(
    comparator="GreaterThan",
    integer_value="100")
qualifications.add(qual_2)
qual_3 = LocaleRequirement(
    comparator="In",
    locale=["US", "GB", "CA", "AU", "NZ"])
qualifications.add(qual_3)
# END QUALIFICATION CONFIGURATION

# START DECORATIVE HIT VARIABLES
hit_title = "Evaluate Robot Capabilities! (6 minutes / $0.8 USD)"
hit_description = "Rate a robot on a number of capabilities in a short survey (approximately 6 minutes)"
hit_keywords = ["university study", "robot", "survey", "short", "quick"]
duration_in_seconds = 20*60
frame_height = 1200
# END DECORATIVE HIT VARIABLES
# =================================================================

# Initialize boto connection based on sandbox.
if sandbox:
    AMAZON_HOST = "mechanicalturk.sandbox.amazonaws.com"
else:
    AMAZON_HOST = "mechanicalturk.amazonaws.com"

connection = MTurkConnection(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    host=AMAZON_HOST)

# Selecting which endpoint to pass as parameter
if sandbox:
    external_submit_endpoint = "https://workersandbox.mturk.com/mturk/externalSubmit"
else:
    external_submit_endpoint = "https://www.mturk.com/mturk/externalSubmit"

# params_to_encode['host'] = external_submit_endpoint
encoded_url = encode_get_parameters(base_url, params_to_encode)

create_hit_result = connection.create_hit(
    title=hit_title,
    description=hit_description,
    keywords=hit_keywords,
    duration=duration_in_seconds,
    lifetime=datetime.timedelta(days=8),
    max_assignments=assignments_per_hit,
    question=ExternalQuestion(encoded_url, frame_height),
    reward=Price(amount=payment_per_assignment),
    # Determines information returned by certain API methods.
    response_groups=('Minimal', 'HITDetail'),
    qualifications=qualifications)


That’s it. Running the code will create a HIT that will point to the website on GitHub, which itself points to our survey. The survey will point back to the GitHub website, which will point back to MTurk, going full circle. Neat!
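Once results come in, the same boto connection can be used to review and approve the submitted assignments. A rough sketch (assuming boto 2's MTurk API and the create_hit_result object from above):

# review and approve the submitted assignments of the HIT created above
hit_id = create_hit_result[0].HITId

for assignment in connection.get_assignments(hit_id):
    # assignment.answers contains the posted form fields, e.g. the
    # attentionCheckPassed value that the adapter page sends back
    connection.approve_assignment(assignment.AssignmentId)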

As always, I hope this is useful to some of you and feel free to drop a comment or reach out to me if you have questions.