Installing Python, Keras, and Tensorflow from source
I found myself in the interesting position recently of needing to compile Python from source. The reasoning behind this is complicated, but it boils down to a need to use Python with Tensorflow / Keras for some natural language processing AI, as Tensorflow.js isn't going to cut it for the next stage of my PhD.
The machine I'm currently targeting is Viper, my university's high-performance computer (HPC). Unfortunately, the version of Python on Viper is rather old, so I needed a newer one. Since I obviously don't have sudo permissions on Viper, I couldn't use the system package manager. Incredibly, pre-compiled Python binaries aren't distributed for Linux either, so I ended up compiling from source.
I am going to assume that you have a directory at $HOME/software in which we will be working. In there, there should be a number of subdirectories:
bin: For binaries, already added to your PATH
lib: For library files - we'll be configuring this correctly in this guide
repos: For git repositories we clone
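If you're starting from scratch, the layout above can be created in one go. This is a minimal sketch assuming the same $HOME/software location; mkdir -p does nothing for directories that already exist:

```shell
# Create the working directory and its subdirectories (no-op if they already exist)
mkdir -p "$HOME/software/bin" "$HOME/software/lib" "$HOME/software/repos"
```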
Make sure you have your snacks - this was a long ride to figure out and write - and it's an equally long ride to follow. I recommend reading this all the way through before actually executing anything to get an overall idea as to the process you'll be following and the assumptions I've made to keep this post a reasonable length.
Before we begin, we need some dependencies:
gcc - The compiler
git - For checking out the cpython git repository
readline - An optional dependency of cpython (presumably for the REPL)
On Viper, we can load these like so:
module load utilities/multi
module load gcc/10.2.0
module load readline/7.0
We also need to build openssl from source. From inside ~/software/repos, clone its git repo and build it:
git clone git://git.openssl.org/openssl.git; # Clone the git repo
cd openssl; # cd into it
git checkout OpenSSL_1_1_1-stable; # Checkout the latest stable branch (do git branch -a to list all branches; Python will complain at you during build if you choose the wrong one and tell you what versions it supports)
./config; # Configure openssl ready for compilation
make -j "$(nproc)" # Build openssl
With openssl compiled, we need to copy the resulting shared libraries to our ~/software/lib directory:
cp lib*.so* ~/software/lib;
# We're done, cd back to the parent directory
cd ..;
To finish up openssl, we need to set some environment variables so that the C compiler and linker can find it, but we'll talk about those after dealing with another dependency that Python requires.
libffi is another dependency of Python that's needed if you want to use Tensorflow. To start, go to the libffi GitHub releases page in your web browser and copy the URL of the tarball attached to the latest release (the libffi-X.Y.tar.gz file, not the auto-generated "Source code" archive).
Then, download it to the target system:
curl -OL URL_HERE
Note that we download the release tarball this way because otherwise we'd have to run the autogen.sh script, which requires yet more dependencies that you're unlikely to have installed.
Then extract it, delete the downloaded archive, and cd into the extracted directory:
tar -xzf libffi-3.3.tar.gz
rm libffi-3.3.tar.gz
cd libffi-3.3
Now, we can configure and compile it, telling it to install into ~/software:
./configure --prefix="$HOME/software"
make -j "$(nproc)"
Before we install it, we need to create a quick symbolic link:
ln -s lib "$HOME/software/lib64";
For some reason, libffi likes to install its libraries to the lib64 directory rather than our pre-existing lib directory, so pointing lib64 at lib makes sure they end up in the right place. With that done, we can install it:
make install
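To see why the link works, here is a small demonstration in a throwaway directory rather than the real ~/software. The file name example.so is made up; it just stands in for whatever libffi installs. Because the link's target is the relative name lib, it is resolved next to the link itself, so anything written into lib64 really lands in lib:

```shell
# Scratch directory standing in for ~/software, so nothing real is touched
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/lib"
# Same idea as above: the relative target "lib" is resolved next to the link
ln -s lib "$demo_dir/lib64"
# Simulate an installer writing into lib64 (example.so is a made-up file name)
touch "$demo_dir/lib64/example.so"
# The file actually lives in lib
ls -l "$demo_dir/lib"
```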
Updating the environment
Now that we've dealt with the dependencies, we need to update our environment so that the compiler and linker know where to find them. Do that like so:
export LDFLAGS="-L$HOME/software/lib $LDFLAGS";
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"
It is also advisable to add these settings to your ~/.bashrc, as you may need to come back and recompile a different version of Python in the future.
Personally, I have a file at ~/software/setup.sh which I load with source $HOME/software/setup.sh in my ~/.bashrc file to keep things neat and tidy.
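A setup.sh along those lines might look like the sketch below. The LDFLAGS and CPPFLAGS lines mirror the exports above; the LD_LIBRARY_PATH line is my own assumption rather than part of the steps above: it tells the dynamic loader to look in ~/software/lib at run time too, which may be needed on systems whose own libssl is too old.

```shell
# ~/software/setup.sh (sketch): load it from ~/.bashrc with
#   source "$HOME/software/setup.sh"

# Library search path for the linker (libssl, libcrypto, libffi)
export LDFLAGS="-L$HOME/software/lib $LDFLAGS"
# Header search path for the C preprocessor (libffi and openssl headers)
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"
# Assumption: also let the dynamic loader find the shared libraries in
# ~/software/lib when Python runs, not just when it is linked
export LD_LIBRARY_PATH="$HOME/software/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```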
Now that we have openssl and libffi compiled, we can turn our attention to Python. First, clone the cpython git repo and cd into it:
git clone https://github.com/python/cpython.git
cd cpython
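Then, check out the latest stable release. Sorting the tags by version (rather than alphabetically) and filtering out the alpha, beta, and release candidate tags gives us the newest stable tag:
git checkout "$(git tag --sort=version:refname | grep -ivE '[ab]|rc' | tail -n1)"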
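The version sort matters because plain git tag lists tags alphabetically, where v3.9.2 comes after v3.10.0. The sketch below runs the same filter over a small, hypothetical list of tag names (no repository needed), once alphabetically and once with sort -V, the GNU coreutils version sort, standing in for git's --sort=version:refname:

```shell
# Hypothetical tag names, in the plain alphabetical order `git tag` uses by default
sample_tags='v3.10.0
v3.10.0a1
v3.10.0rc1
v3.9.0
v3.9.10
v3.9.2'

# Drop alpha, beta and release-candidate tags, then take the last line
stable_alphabetical="$(printf '%s\n' "$sample_tags" | grep -ivE '[ab]|rc' | tail -n1)"
# Same filter, but with a version-aware sort first
stable_by_version="$(printf '%s\n' "$sample_tags" | grep -ivE '[ab]|rc' | sort -V | tail -n1)"

echo "alphabetical:  $stable_alphabetical"   # v3.9.2
echo "version-aware: $stable_by_version"     # v3.10.0
```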
Important: If your intention is to use Tensorflow, check the Tensorflow install page for supported Python versions. It's likely that it doesn't yet support the latest version of Python, so you might need to check out a different tag here. For some reason, new Python versions take a long time to propagate through the wider ecosystem.
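If you do need an older series, a helper along these lines can pick the newest final release within a single minor version. latest_stable_in_series is a hypothetical name of my own, not a git or Python command; it reads tag names on standard input so it can be tried without a repository:

```shell
# latest_stable_in_series (hypothetical helper): reads tag names on stdin and
# prints the newest final release in the given minor series, e.g. 3.8
latest_stable_in_series() {
    local series_regex
    # Escape the dots so "3.8" only matches a literal "3.8"
    series_regex="$(printf '%s' "$1" | sed 's/\./\\./g')"
    # Keep only final releases like v3.8.10 (no a1/b1/rc1 suffix), newest last
    grep -E "^v${series_regex}\.[0-9]+\$" | sort -V | tail -n1
}

# Inside the cpython repository you would run:
#   git checkout "$(git tag | latest_stable_in_series 3.8)"
printf '%s\n' v3.8.9 v3.8.10 v3.8.10rc1 v3.9.1 | latest_stable_in_series 3.8   # v3.8.10
```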
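Before we can start the compilation process, we need to configure it. We're going for performance, so we enable profile-guided optimization (--enable-optimizations) and link-time optimization (--with-lto) when running the configure script:
./configure --with-lto --enable-optimizations --with-openssl=/absolute/path/to/openssl_repo_dir
Replace /absolute/path/to/openssl_repo_dir with the absolute path to the openssl git repo we cloned earlier (with the layout used in this guide, that's $HOME/software/repos/openssl).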
Now, we're ready to compile Python. Do that like so:
make -j "$(nproc)"
This will take a while, but once it's done it should have built Python successfully. For a sanity check, we can also test it like so:
make -j "$(nproc)" test
The compiled Python binary should be called simply python, and be located in the root of the git repository. Now that we've compiled it, we need to make a few tweaks to ensure that our shell uses our newly compiled version by default rather than the older version from the host system. Personally, I keep my ~/bin folder under version control, so I install host-specific software to ~/software and put ~/software/bin in my PATH, ahead of the system directories.
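A minimal way to do that, assuming bash (for example in ~/.bashrc or the setup.sh file mentioned earlier), is to prepend the directory so that it is searched first:

```shell
# Prepend ~/software/bin so our own binaries shadow the system ones
export PATH="$HOME/software/bin:$PATH"
# Clear the shell's cache of command locations so the change takes effect
hash -r
```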
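With this in mind, we need to create some symbolic links in ~/software/bin that point to our new Python installation:
cd ~/software/bin
ln -s relative/path/to/python_binary python
ln -s relative/path/to/python_binary python3
ln -s relative/path/to/python_binary python3.9
Replace relative/path/to/python_binary with the path to the Python binary we compiled above, relative to ~/software/bin (for example ../repos/cpython/python if you cloned cpython into ~/software/repos). If you checked out a version other than 3.9, change the python3.9 link name to match.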
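To avoid hard-coding the version in the last link name, a hypothetical helper like link_python below (my own name, not part of Python) can ask the compiled binary for its major.minor version and create all three links from it. It is a sketch assuming the relative-path layout described above:

```shell
# link_python (hypothetical helper)
#   $1 = directory to create the links in (e.g. ~/software/bin)
#   $2 = path to the compiled binary, relative to $1 (e.g. ../repos/cpython/python)
link_python() {
    local bin_dir="$1" target="$2" version
    # Ask the compiled interpreter for its major.minor version, e.g. 3.9
    version="$(cd "$bin_dir" && "$target" -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
    for name in python python3 "python$version"; do
        # -f replaces an existing link, e.g. one left over from an older build
        ln -sf "$target" "$bin_dir/$name"
    done
}

# Usage with the layout from this guide:
#   link_python ~/software/bin ../repos/cpython/python
```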
To finish up the Python installation, we need to get pip, the Python package manager, up and running. We can do this using the built-in ensurepip module, which can bootstrap a pip installation for us:
python -m ensurepip --user
This bootstraps pip into our local user directory (~/.local). This is probably what you want: if you try to install it without --user, the shebang in the generated pip script incorrectly points to a system-wide Python that doesn't exist.
To sidestep the shebang problem entirely, add the following to your ~/.bash_aliases so that pip always runs through our new Python binary:
alias pip='python -m pip'
alias pip3='python -m pip'
The next stage is to use virtualenv to install the Python packages we want to use for our project locally. This is good practice, because it keeps each project's dependencies isolated, so they don't clash with different versions used by other projects.
Before we can use virtualenv though, we have to install it:
pip install virtualenv
Unfortunately, Python / pip is not very clever at detecting the actual Python installation location, so in order to actually use virtualenv we have to use a wrapper script: the shebang in the main ~/.local/bin/virtualenv entrypoint does not use /usr/bin/env to auto-detect the python binary location. Write the following wrapper to ~/software/bin/virtualenv (or any other location that's in your PATH ahead of ~/.local/bin):
# Write the wrapper script to disk
cat > ~/software/bin/virtualenv <<'EOF'
#!/usr/bin/env bash
exec python ~/.local/bin/virtualenv "$@"
EOF
# chmod it to make it executable
chmod +x ~/software/bin/virtualenv
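The "ahead of" part is what makes the wrapper win: when two directories on the PATH both contain a virtualenv, the shell runs the first one it finds, scanning from left to right. A self-contained demonstration using two throwaway directories in place of ~/software/bin and ~/.local/bin:

```shell
# Scratch directories standing in for ~/software/bin and ~/.local/bin
demo="$(mktemp -d)"
mkdir -p "$demo/software-bin" "$demo/local-bin"
printf '#!/bin/sh\necho wrapper\n' > "$demo/software-bin/virtualenv"
printf '#!/bin/sh\necho original\n' > "$demo/local-bin/virtualenv"
chmod +x "$demo/software-bin/virtualenv" "$demo/local-bin/virtualenv"

# The first match found while walking PATH from left to right is the one that runs
result="$(export PATH="$demo/software-bin:$demo/local-bin:$PATH"; virtualenv)"
echo "$result"   # wrapper
```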
Installing Keras and tensorflow-gpu
With all that out of the way, we can finally use virtualenv to install Keras and tensorflow-gpu. Let's create a new directory and a virtual environment inside it to install our packages in.
Now, we can install Tensorflow & Keras:
pip install tensorflow-gpu
It's worth noting here that Keras is a dependency of Tensorflow, so it gets installed along with it.
Tensorflow has a number of alternate package names you might want to install instead depending on your situation:
tensorflow: Stable tensorflow without GPU support - i.e. it runs on the CPU instead.
tf-nightly-gpu: Nightly tensorflow for the GPU. Useful if your version of Python is newer than the version of Python supported by Tensorflow
Once you're done in the virtual environment, exit it like this:
deactivate
Phew, that was a huge amount of work! Hopefully this sheds some light on the maddeningly complicated process of compiling Python from source. If you run into issues, you're welcome to comment below and I'll try to help you out - but you might be better off asking the Python community instead, as they've likely got more experience with Python than I have.
Sources and further reading