Introducing the Groundbreaking Arrival of the ‘Fairly Trained’ AI Large Language Model

OpenAI has long argued that it would be impossible to train advanced AI models without using copyrighted materials. A new model called KL3M (Kelvin Legal Large Language Model) is challenging that assumption. Developed by startup 273 Ventures, KL3M has become the first Large Language Model (LLM) to receive a “Licensed Model (L) Certification” from the independent auditing organization Fairly Trained. The certification signifies that KL3M’s training data was obtained and used legally, either under contractual licensing agreements or because it sits in the public domain.

KL3M was trained on data collected by 273 Ventures from sources such as U.S. government document releases and old legal filings that are in the public domain. The startup vetted this data to ensure it did not infringe any copyrights. The resulting product, the Kelvin Legal DataPack, contains over 150 billion tokens and was released in August 2023. KL3M was then trained on a high-quality, curated English subset of this dataset.

Training produced two versions of the model: kl3m-170m, with 170 million parameters, and kl3m-1.7b, with 1.7 billion parameters. The smaller version can run on low-powered hardware such as a MacBook Air with an M1 chip, while the larger version requires more capable machines. 273 Ventures is also preparing to release a 3.7-billion-parameter variant of KL3M next month.
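As a rough illustration of what running a model of this size locally might look like, the sketch below uses the Hugging Face transformers library to load a small causal language model on CPU and generate a contract-style continuation. The model identifier is a hypothetical placeholder; the article does not specify how KL3M is distributed or what it is named in any model hub.

```python
# Minimal sketch: loading a small causal LM and generating text on a laptop.
# Assumes the Hugging Face `transformers` library; the model identifier is a
# hypothetical placeholder -- the real distribution path for KL3M is not
# given in the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "273-ventures/kl3m-170m"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "Confidentiality. Each party agrees to"
inputs = tokenizer(prompt, return_tensors="pt")

# A 170-million-parameter model fits comfortably in memory on laptop-class
# hardware, which is the point of the smaller KL3M variant.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```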

KL3M is primarily designed for law firms and the legal industry. It can assist with drafting and revising time entries and invoices, contract clauses, SEC filings, and patent applications. The model has also proven to have a broader reach than anticipated: because the law touches many areas of society, the source material that governments produce ends up teaching the model general concepts and language use.

In terms of performance, KL3M has shown promising results. The 1.7-billion-parameter model achieves lower perplexity, a measure of how poorly a model predicts the next token, than other leading models of its size when writing legal material and Wikipedia-style entries. It also fares better on toxicity measurements, producing fewer toxic outputs than other models in its class.
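For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood the model assigns to each token of a test text; lower values mean the model predicts the text better. The sketch below shows one common way to compute it with the transformers library. The model identifier is a generic stand-in, not a confirmed KL3M name, and the legal-sounding test sentence is invented for illustration.

```python
# Minimal sketch: computing perplexity of a causal LM on a piece of text.
# Perplexity = exp(average negative log-likelihood per token); lower is better.
# The model identifier and test sentence are placeholders for illustration.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in for whichever causal LM is being evaluated

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = (
    "This Agreement shall be governed by and construed in accordance with "
    "the laws of the State of Delaware."
)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy (negative log-likelihood) over the sequence.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = math.exp(outputs.loss.item())
print(f"Perplexity: {perplexity:.2f}")
```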

Pricing for KL3M has not been made public, but the model is already in use at several law firms. Interested parties can contact 273 Ventures for more information.

The arrival of KL3M and its Fairly Trained certification marks a significant development in the AI industry. It challenges the notion that copyrighted data is necessary for training advanced AI models and opens up new possibilities for creating AI models that respect data integrity and legal boundaries. As the fight over data scraping practices and copyright infringement continues, KL3M offers an alternative approach that prioritizes ethical and lawful AI development.
