Kafka KRaft on macOS without Docker
Running Docker on Apple silicon is still somewhat of a struggle. Here are some notes on how to set up native Kafka with KRaft on macOS, Docker-free and suitable for development.
Data professional
If you work in the language model space, you cannot avoid 🤗 transformers and the other libraries from that ecosystem. In my experience, those libraries are top-quality, they provide an invaluable service to the community, and the whole 🤗 story is a great example of how open source should be done. Also, if you are into reading source code and want to get better at working with complex Python codebases, the transformers GitHub repo is the place to learn best practices.
Yet 🤗 transformers is not the only library in the space, and some others might be better suited to more specialized cases. Prior to 🤗 transformers, you could stick to t2t, which seems to have been succeeded by trax these days; fairseq was also used to produce many papers and models. Several of the modern high-performing LMs, such as the flan-t5 family and ul2, were pretrained with t5x.
A wrap-up of my experience over the past two days. I tried to automate the deployment of Kubeflow onto a self-built Kubernetes cluster with Ansible. A 5/7 experience, with partial success so far.
I generally like the idea of Kubeflow, which, in contrast to SageMaker, lets me work and test small things out without being constantly connected. That feels great on a train or a plane. I wanted to play with it more extensively, not only so I could share my data-science-related experience, but also to understand whether deploying it is easy enough to be handled by a product team without burdening dedicated in-company platform people.
Ansible has gotten much better since the last time I went through its docs to get a general feeling for how things work. The modules have become more idiomatic, and there's much less need to plug in raw commands. Some interfaces have become nicer (passing a list of packages to be installed with apt makes more sense than doing it through with_items). Asserts seemed like a great idea, but after some time I found them more distracting than helpful. Inventory plugins are really cool: no more need for separate magical scripts to get the list of hosts. Yet if you're not an AWS or GCP user, you might be unpleasantly surprised by the quality of those. The scaleway plugin works, but it feels unpolished and needs experimentation to make things go as expected. In the end I managed to create a flow of playbooks that creates a set of machines and then provisions them, which looks like one step away from pretty nice scaling automation.
Setting up Kubernetes to the point where it could run some pods was easy; kubeadm does all the work for you. I felt it was the only easy part about Kubernetes :)
I understand that most of this comes down to me not having read enough about some specific technology/library/component.