Download Folders from a GitHub Repo using Python (… files, too)

At some point in your life as a programmer, you will need to download a single folder from a GitHub repository. For me, it was because I had an auxiliary repo containing files relevant to some unit tests. I found a very elegant (and super easy) solution, which I want to share here today. In other words, this is a short how-to on downloading/copying files and folders from GitHub using Python.

The solution uses the awesome fsspec library. fsspec is a pythonic approach to filesystem management that lets you use Python to access data in various kinds of locations: on your local machine, on all major cloud providers, and, most importantly, on GitHub. It supports many more locations, so the library is worth checking out if you have the time.

Installation

To get started, install fsspec. (Chances are you already have it, because a growing part of the PyData ecosystem uses it internally.)

pip install fsspec

Copy a Folder

import fsspec
from pathlib import Path

# flat copy: files directly inside src/, without descending into subfolders
destination = Path.home() / "test_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix())

# recursive copy: also descends into the subfolders of src/
destination = Path.home() / "test_recursive_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix(), recursive=True)
Example of how to download a folder from GitHub (shallow or recursive).

The above snippet does the following: We first declare a destination (where to store the folder’s content). Then we use fsspec to turn the repo into a pythonic filesystem. Finally, we list all the entries in the target folder of the repo (fs.ls(…)) and download them all using fs.get. Passing recursive=True makes fs.get descend into subfolders as well. Simple, elegant, and convenient. I love it!
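
If you need this in more than one place, the three steps can be wrapped in a small helper. The following is only a sketch: the name download_folder and its parameters are my own invention, not part of fsspec, and it assumes that fs is an fsspec-style filesystem object offering ls and get, like the one created above.

```python
from pathlib import Path


def download_folder(fs, remote_folder, destination, recursive=False):
    """Copy the entries of `remote_folder` on `fs` into the local `destination`.

    `download_folder` is a hypothetical helper, not an fsspec function.
    `fs` is assumed to be an fsspec-style filesystem exposing `ls` and `get`.
    Returns the list of remote paths that were requested.
    """
    destination = Path(destination)
    # Step 1: make sure the local target folder exists
    destination.mkdir(parents=True, exist_ok=True)
    # Step 2: list the folder; detail=False asks for plain path strings
    remote_paths = fs.ls(remote_folder, detail=False)
    # Step 3: download everything that was listed
    fs.get(remote_paths, destination.as_posix(), recursive=recursive)
    return remote_paths
```

With the filesystem from the snippet above, the call would be download_folder(fs, "src/", Path.home() / "test_folder_copy"). Taking the filesystem as an argument keeps the helper independent of GitHub, so in principle the same function should work with other fsspec backends.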

Copy Files

Copying/Downloading individual files works the same way; however, this time the destination has to be a file path rather than a folder.

import fsspec
from pathlib import Path
# download a single file
destination = Path.home() / "downloaded_readme.txt"
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get("README.txt", destination.as_posix())
Example of how to download a single file from GitHub.
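
One thing to watch out for: the example above saves straight into the home directory, which already exists. If the file should land in a folder that may not exist yet, you can create the parent folder first. Here is a minimal sketch of that idea. The helper download_file is a made-up name, not an fsspec function, and fs is again assumed to be an fsspec-style filesystem with a get method.

```python
from pathlib import Path


def download_file(fs, remote_path, destination):
    """Download one file from `fs` to the local file path `destination`.

    `download_file` is a hypothetical helper, not an fsspec function.
    Returns the local path as a `Path` object.
    """
    destination = Path(destination)
    # The destination is a file, so only its parent folder gets created
    destination.parent.mkdir(parents=True, exist_ok=True)
    fs.get(remote_path, destination.as_posix())
    return destination
```

For the repo used in this post, that would be download_file(fs, "README.txt", Path.home() / "some_folder" / "downloaded_readme.txt"), where some_folder is just a placeholder name.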

And that is all there is to it. Happy coding!
