Running T5x on macOS

Background

If you are in the language model space, you cannot avoid 🤗 transformers and the other libraries from that ecosystem. In my experience, those libraries are top quality, they provide an invaluable service to the community, and the whole 🤗 story is a great example of how open source should work. Also, if you are into reading source code and want to get better with complex python codebases, the transformers GitHub repo is the place to learn best practices.

Yet 🤗 transformers is not the only library in the space, and others may suit specialized cases better. Prior to 🤗 transformers, you could stick to t2t, which seems to have been succeeded by trax these days; fairseq was also used to produce many papers and models. Several of the modern high-performing LMs, such as the flan-t5 family and ul2, were pretrained with t5x.

T5x is an all-in-one toolset to pretrain, fine-tune, evaluate and run inference for LMs of various architectures. As it is built on top of Jax for dealing with neural networks and tensors, both GPUs and TPUs are supported, with the latter being utilized very efficiently. It also uses seqio for handling and processing datasets. The dependency tree is pretty deep, though: it relies on tensorflow, gin-config, gcs-related libraries, the sentencepiece tokenizer and many, many others. The number of dependencies is positively correlated with the chance of a library not working smoothly on some platform, which is what happened here. The tensorflow-text package, which is used to wrap and process textual data as tensorflow tensors, has known installation issues on macos with apple silicon. Luckily, thanks to contributions from the open-source community, and especially sun1638650145, we can use tensorflow-text, and consequently T5x, on macs. The process is not as smooth as pip install, yet it is doable, and it lets one prototype, debug and even train small models without continuous access to an external machine.

In this post I provide a step-by-step guide, which is “works-on-my-machine” certified. Given how quickly things move with library versions and the ecosystem, the guide might need some adjustments for package versions, and maybe even dependencies, in the future. The way T5x is run might seem magical or overly complicated; there is a good reason for that, which I might cover in a later post.

I use conda python, but any python 3.10 installation should do (pyenv or brew, I guess). Also, this guide is currently confirmed to work only for the cpu version of Jax. I do not see any reason for it not to work with the apple silicon GPU, but I have some issues on my machine, and the cpu version works fine for light debugging and prototyping anyway. Tensorflow for macos will give us extra issues and warnings during the installation, but since we do not use tensorflow itself for training here, that is fine. If you plan to use tensorflow later, I pity you; check here, but I cannot say for sure whether everything will work inside this environment later.

Points to consider

  • The whole exercise takes about 9GB on disk, together with the venv, repositories and the example dataset.
  • I tend to use python -m pip ... everywhere, due to the pip softlink being frequently broken in the environments I work in. If you are confident that’s not the case and you’re not up for the extra typing, feel free to call pip directly.

Step-by-step guide

Init and activate your python environment

For the sake of the exercise, I named my T5x space root directory t5x_tftext_free.

  • mkdir t5x_tftext_free && cd t5x_tftext_free
  • python -m venv ./.venv
  • . ./.venv/bin/activate
  • python -m pip install --upgrade pip

Put libraries into your environment

Setting up tensorflow-text

We start with the package causing issues with T5x on macos.

  • Get the tensorflow-text wheel from the prebuilt releases
  • Install it into your environment from wherever the wheel was downloaded: python -m pip install -I ~/Downloads/tensorflow_text-2.12.1-cp310-cp310-macosx_11_0_arm64.whl

Setting up seqio

  • git clone https://github.com/google/seqio.git
  • Edit setup.py in the cloned repo: find tensorflow-text inside the install_requires list and comment it out.
  • Install the library with python -m pip install -e ./seqio (still assuming you are inside the space root directory).
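This “comment out a dependency” edit repeats for the t5x and t5 repos below, so it can also be scripted. Here is a minimal sketch; comment_out_dependency is my own hypothetical helper (not part of any of these projects), and it assumes the dependency appears as a quoted string on its own line inside install_requires:

```python
from pathlib import Path


def comment_out_dependency(setup_py: str, dependency: str) -> None:
    """Comment out any install_requires line that names `dependency`."""
    path = Path(setup_py)
    out_lines = []
    for line in path.read_text().splitlines(keepends=True):
        stripped = line.lstrip()
        # Only touch lines that look like a quoted requirement, e.g. "tensorflow-text",
        if (dependency in stripped
                and stripped[:1] in ("'", '"')
                and not stripped.startswith("#")):
            indent = line[: len(line) - len(stripped)]
            out_lines.append(indent + "# " + stripped)
        else:
            out_lines.append(line)
    path.write_text("".join(out_lines))
```

Usage for this step would be comment_out_dependency("./seqio/setup.py", "tensorflow-text"); the same call works for the later edits. Still, eyeball the resulting setup.py, since the layout of install_requires may change between versions.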

Setting up t5x

  • git clone https://github.com/google-research/t5x.git
  • Make two changes in ./t5x/setup.py:
    • Find seqio dependency and comment it out
    • Change tensorflow-cpu to tensorflow
  • Install it with python -m pip install -e ./t5x/

Setting up t5

The repository for the original T5 paper was based on mesh-tensorflow. Despite being deprecated, it is still partially used by T5x, as some of the text preprocessing tasks didn’t change. We’ll have to install it as a dependency.

  • git clone https://github.com/google-research/text-to-text-transfer-transformer.git
  • Edit text-to-text-transfer-transformer/setup.py and comment out seqio-nightly
  • python -m pip install -e ./text-to-text-transfer-transformer

Fixing tensorflow version

At this stage you’ll have tensorflow 2.13.X installed, which is incompatible with the tensorflow-text you got from the custom-built wheel in the beginning. Force the version back to 2.12.0 with python -m pip install tensorflow-macos==2.12.0.

After you do that, check the version by running

import tensorflow as tf
print(tf.__version__)

in the python shell.

Extra libraries that might come in handy

  • You might also want to install ipython for checking things on the go. Feel free to go with jupyter(lab) instead; I prefer ipython as it forces me to put anything I want to keep into a script file, rather than accumulate a pile of unmaintained and disconnected notebooks. So python -m pip install ipython
  • tqdm comes along with the other dependencies, but you might still want to ensure you have it, as things might change in the future (python -m pip install tqdm)
  • You’ll want to track the training process later, so set up tensorboard: python -m pip install tensorboard.

Get things running

Get a dataset for training

Get yourself a dataset for testing things out. We’ll use a popular machine translation dataset to simplify things, as there’s already a t5x model configuration prepared for it.

  • mkdir tfds_data
  • export TFDS_DATA_DIR=$(pwd)/tfds_data
    • Keep in mind that many of the t5x and t5x-related mechanisms (jax included) have a hard time dealing with relative paths, so put an absolute path here and everywhere below, and do not use relative references inside absolute paths either (e.g. /home/user/workspace/something/../something_else/ will cause problems)
  • tfds build wmt_t2t_translate --data_dir=$TFDS_DATA_DIR --max_examples_per_split 1000000 (the limit is both for speed and to avoid a “too many open files” error I hit without it)
    • This one will take a while. You’ll see some warnings and errors about missing google cloud credentials (if you don’t have them configured); that’s fine.

Set the model folder and configuration

Prepare a folder for your model. Feel free to use a different folder name, but this model is for testing only, so there’s no reason to keep it for later.

  • mkdir model_artifacts
  • export MODEL_DIR=$(pwd)/model_artifacts

Now to the model configuration:

  • Edit ./t5x/t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin: find include "t5x/examples/t5/t5_1_1/base.gin" and replace base.gin with tiny.gin to speed up the training run.
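After the edit, the include line in base_wmt_from_scratch.gin should read (same relative-to-repo path convention as the original line):

```
include "t5x/examples/t5/t5_1_1/tiny.gin"
```

Everything else in the gin file stays as-is; the include swap is enough, since the rest of the config only overrides parameters on top of the included model definition.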

Run the training

  • export T5X_DIR=$(pwd)/t5x
  • python ${T5X_DIR}/t5x/train.py --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin" --gin.MODEL_DIR=\"${MODEL_DIR}\" --tfds_data_dir=${TFDS_DATA_DIR}
  • Wait for a bit to see the training steps succeed

You should see output similar to this.

I0709 15:29:52.025439 8383831552 train.py:613] BEGIN Train loop.
I0709 15:29:52.025554 12803141632 logging_writer.py:48] [0] collection=train timing/train_iter_warmup=1.90735e-06
I0709 15:29:52.025690 8383831552 train.py:618] Training for 500 steps.
I0709 15:29:52.026439 8383831552 trainer.py:509] Training: step 0
I0709 15:30:02.658104 8383831552 trainer.py:509] Training: step 1
I0709 15:30:22.038341 8383831552 trainer.py:509] Training: step 3

If you see it, feel free to ctrl+c out of it and delete both the model and the tfds directory contents. Those were used only to confirm that the library works.
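Assuming the directory names used throughout this guide, the cleanup from the space root directory looks like this:

```shell
# Throw away the test checkpoints and dataset; keep the (now empty)
# directories in case you want to rerun the exercise.
rm -rf ./model_artifacts/* ./tfds_data/*
```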

Enjoy building your next great model!

Written on July 9, 2023