09.06.23
Many in the mainstream media have called the release of OpenAI's public research preview of ChatGPT the "iPhone moment" of AI.
With its release, text-generation models (or rather the public perception of them) went from clumsy chatbots pestering users
unprompted to a technology that had people rethinking billion-dollar industries, all in a matter of weeks.
Many agree that, much like the iPhone, this technology will change the way we interact with applications and information.
For one particular group, this analogy may be a painful reminder: the open-source community.
When Steve Jobs presented the first iPhone to the public in 2007, and throughout its later success, many
in the open-source community were dissatisfied with the trend. The community, much as its name suggests,
didn't like the closed nature of the iPhone.
Fast forward to today, and with the realization that almost nobody is running Linux on their phones these days, it's
safe to say that the open-source community lost the "mobile war", if it was ever fighting it. So how is the
open-source community faring in the "AI war"?
As of writing this blog post, Hugging Face, the leading platform for collaborating on and sharing open-source machine learning models,
hosts over 200,000 openly accessible models. Counting only Apache-2.0 and MIT licenses, you're left with over 40,000 open-source
models that anyone can use to build their own model or use directly in a commercial application. Of course, many of those models are
iterations, fine-tuned or quantized versions of a smaller group of foundation models. Still, this is an impressive number,
and the sheer pace at which the open-source community continues to innovate is incredible.
NLP models in particular are very popular at the moment. Just a couple of days ago, the Falcon models were released under a
permissive license by the Technology Innovation Institute (TII). The models were reportedly trained from scratch using less than half the
compute of Meta AI’s LLaMA models and can outperform even Meta’s flagship 65B model.
This shows that the agility and pace at which the community operates is one of the driving factors behind these successes.
Keeping up with the best and latest models seems almost impossible. Today, you can find open-source solutions for almost any problem a
machine learning model can solve. The continuation of this trend is of tremendous value for the public and especially for businesses,
since open-source resources have historically played an important role in powering critical infrastructure.
Open-source solutions allow for applications that exist solely on a company’s own managed infrastructure. Data protection
policies currently limit firms in their use of third-party machine learning models, even though the potential utility of such models
for a business is immense. This is exactly where open-source alternatives find their value: handling sensitive data in a contained
environment, without having to consider the risk of a third-party provider obtaining sensitive information.
At SKAD, we’re actively monitoring the advances of the open-source community to develop and deploy contained, state-of-the-art
machine learning solutions for businesses. Some open-source NLP models are already so advanced that they can provide solutions
for many common language processing problems out of the box, a capability that seemingly arises naturally in sufficiently large models.
However, since open-source models are easily fine-tuned, we can also provide your business with custom-tuned versions of
open-source models that tackle a specific problem you might have.
If you have any questions or inquiries about deploying open-source machine learning models in your organization,
feel free to reach out to our intelligent automation team!