Unless you’ve been living under a rock, you likely know that Llama 2, the foundational LLM from Meta, has been released under an open-source license. You can use Llama 2 for free on the HuggingChat UI, the HuggingFace equivalent of ChatGPT, or through the HuggingFace model directly.
The focus of today’s newsletter won’t be on Llama 2 directly, although here are some key highlights:
Llama 2 was trained on 40% more data than the original Llama and has double the context length (4,096 tokens).
Llama 2 comes in three sizes: 7b, 13b, and 70b.
The fine-tuned Llama 2 model, termed “Llama-2-chat,” was trained on publicly available instruction datasets and over 1 million human annotations.
Llama 2 outperforms other open-source LLMs on 11 external benchmarks.
Now, let's shift our focus to the 11 benchmarks mentioned above. These benchmarks are widely used to assess the performance of LLMs. You might be familiar with some of them, but what exactly do they measure? What do the tasks in these benchmarks look like? Let's explore two of the notable ones together.
MMLU (Massive Multitask Language Understanding)
MMLU measures knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and other areas, including law, ethics, mathematics, and history. Think of it as a multiple-choice SAT test.
Question:
Husband and Wife, walking on a country road, were frightened by a bull running loose on the road. They climbed over a fence to get onto the adjacent property, owned by Grower. After climbing over the fence, Husband and Wife damaged some of Grower's plants which were near the fence. The fence was posted with a large sign, "No Trespassing." Grower saw Husband and Wife and came toward them with his large watchdog on a long leash. The dog rushed at Wife. Grower had intended only to frighten Husband and Wife, but the leash broke, and before Grower could restrain the dog, the dog bit Wife. If Wife asserts a claim based on battery against Grower, will Wife prevail?
Choices:
A. Yes, because Grower intended that the dog frighten Wife.
B. Yes, because the breaking of the leash establishes liability under res ipsa loquitur.
C. No, because Wife made an unauthorized entry on Grower's land.
D. No, because Grower did not intend to cause any harmful contact with Wife.
Answer: A
Question:
Like schools in China, American schools begin in September after a long summer vacation. There are two terms in a school year. The first term is from September to January and the second is from February to June. Usually American children begin to go to school when they are five years old. Most students are seventeen or eighteen years old when they finish high school. But unlike middle school students in China, high school students in America take only four or five subjects each term. They usually go to the same classes every day and have homework for every class. After class they do all kinds of interesting things. After high school, many students go to college. They may go to a small or a large one. They usually have to pay a lot for their higher education. So lots of students work after school to make money for their studies. Many American students work after class, because they _.
Choices:
A. want to see interesting things
B. have to help the other people
C. want to make more friends
D. have to get money for their studies
Answer: D
Question:
"Reading makes a full man" (Bacon, 1597). Novels written by the writers like Jane Austen, Victor Hugo and Ernest Hemingway help us to know more about our history, culture and many other things. Jane Austen was one of the most well-known women writers of the world. She was born in England in 1775. Jane loved reading and writing. She wrote a number of famous novels in her life. Among them, Pride and Prejudice written in 1779 was the most popular. Victor Hugo, born in 1802 in France, was one of the best writers in the 19th century. The talent in writing and hard work brought great success to Hugo at an early age. His most popular novel, the Hunchback of Notre-Dame, was written in 1831. The book was so successful that it was quickly translated into many other languages across Europe. Ernest Hemingway, an outstanding American writer and reporter, was born in 1899. His life experience had a great influence on his writing style. Hemingway lived in France and Italy between the 1920s and 1950s. Most of his books such as The Sun Also Rises were written at that time. He won the Nobel Prize in 1954 mainly because of the novel The Old Man and the Sea. When was Jane Austen born?
Choices:
A. In 1775
B. In 1779
C. In 1597
D. In 1899
Answer: A
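To make the zero-shot and few-shot settings concrete, here is a minimal sketch of how a multiple-choice item like the ones above might be turned into a prompt and scored. The function names (format_question, build_prompt, accuracy) and the exact prompt layout are assumptions for illustration; this follows a common convention and is not necessarily the setup used to evaluate Llama 2.

```python
from typing import List, Sequence, Tuple

LETTERS = "ABCD"

# A solved example: (question, choices, answer letter).
Shot = Tuple[str, List[str], str]


def format_question(question: str, choices: List[str], answer: str = "") -> str:
    """Render one item; leave `answer` empty for the question being asked."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append(f"Answer: {answer}".rstrip())
    return "\n".join(lines)


def build_prompt(shots: Sequence[Shot], question: str, choices: List[str]) -> str:
    """Prepend solved examples (few-shot); pass an empty list for zero-shot."""
    blocks = [format_question(q, c, a) for q, c, a in shots]
    blocks.append(format_question(question, choices))
    return "\n\n".join(blocks)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    correct = sum(p.strip().upper() == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Zero-shot prompt for the third sample question above.
    print(build_prompt(
        [],
        "When was Jane Austen born?",
        ["In 1775", "In 1779", "In 1597", "In 1899"],
    ))
```

The model is expected to continue the prompt with a single letter, which is compared against the answer key. In a few-shot run, the same function is called with a handful of solved examples in `shots`.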
TriviaQA
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.
Question:
Who wrote the novel Evening Class?
Answer:
Maeve Binchy
Question:
What breed of dog did Columbo own?
Answer:
Basset Hound
Question:
Neil Armstrong was a pilot in which war?
Answer:
Korean
Question:
In which country is the Aswan Dam?
Answer:
Egypt
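Because the answers here are free text rather than a letter, scoring usually relies on a normalized exact match. The sketch below shows one common way to do that: lowercase the text, strip punctuation and articles, then compare against a list of accepted answers. The helper names and the specific normalization steps are assumptions for illustration, not the exact procedure used in the Llama 2 evaluation.

```python
import re
import string
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, accepted_answers: Iterable[str]) -> bool:
    """True if the normalized prediction equals any normalized accepted answer."""
    return normalize(prediction) in {normalize(a) for a in accepted_answers}


if __name__ == "__main__":
    # The alias list here is hypothetical; real datasets ship their own.
    print(exact_match("The Basset Hound.", ["Basset Hound"]))
    print(exact_match("Vietnam", ["Korean", "Korean War"]))
```

Accepting a list of answers matters because a question such as "Neil Armstrong was a pilot in which war?" can reasonably be answered "Korean" or "Korean War."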
Takeaway
I was surprised by how challenging the questions in these tasks are, which makes it all the more impressive how well models such as Llama 2 perform. Note that the examples from the two benchmarks weren't cherry-picked, so they should give a fair sense of each benchmark's difficulty.