Superfast AI 4/17/23
Meta's SAM, Databricks' Dolly 2.0, BloombergGPT and the future of ad-tech... superfast!
Hi everyone, today we’re covering Meta’s Segment Anything model, Databricks’ Dolly 2.0, BloombergGPT and questions about the future of ad-tech. Lots of fun stuff — let’s dive in!
🗞️ News
Meta’s Segment Anything Model (SAM)
Meta released its foundation image segmentation model called Segment Anything.
The ability to accurately identify and segment objects is important in computer vision applications, from self-driving cars to manufacturing to robot assistants. Along with the model release, Meta released SA-1B, a segmentation dataset with 1.1 billion masks. It is the largest of its kind, with 6x more images and 400x more masks than the second-largest dataset, Open Images V5. The masks were collected with a model-in-the-loop “data engine”, in which human annotators and the model iteratively improved each other, which hints at impressive scale on the data collection side. You can check out more about their data collection process in their blog post here.
Try out an interactive demo here or see a quick example below:
Databricks’ Dolly 2.0
Two weeks ago, Databricks fine-tuned a 6B parameter open-source model from EleutherAI (GPT-J) in 30 minutes on one GPU so that it follows instructions (i.e. it responds to instruction prompts like “Can you tell me what the capital of France is?” rather than only auto-completing prompts like “The capital of France is…”). And it did all of that on a budget of… $30. 🤯 This was inspired by Stanford’s Alpaca model, which I covered here last time. You can read more about Dolly 1.0 here.
So what’s the catch?
Users reached out to Databricks to try Dolly 1.0 on commercial use cases. Dolly, however, was trained on the Stanford Alpaca dataset, which was generated with OpenAI’s models, so any commercial use would violate OpenAI’s terms of use: OpenAI disallows using its outputs to build competing products, such as a commercial instruction-following model. So as interesting as the model was as a research project, it remains just that: unusable in a commercial setting.
So, this week, Databricks released Dolly 2.0, which is trained on a dataset of 15,000 instruction/response pairs written by Databricks’ own employees. This time, Databricks chose a 12B parameter EleutherAI Pythia model as the base model. The result is an open-source, instruction-following model, fine-tuned on human-generated data and licensed for both research and commercial use. You can read more about Dolly 2.0 here and learn more about how they produced their in-house dataset.
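To make the instruction-following vs. auto-completion distinction concrete, here is a minimal sketch of how one human-written instruction/response pair might be turned into a single training prompt. It assumes records shaped like the public databricks-dolly-15k release (`instruction`, `context` and `response` fields). The Alpaca-style template and the `format_example` helper are illustrative assumptions, not necessarily the exact template Databricks used.

```python
# Hypothetical Alpaca-style template; Databricks' actual template may differ.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_example(record: dict) -> str:
    """Turn one instruction/response record into a single training string.

    Assumes `record` has "instruction" and "response" keys and an optional
    "context" key (as in the databricks-dolly-15k dataset).
    """
    parts = [INTRO, "", "### Instruction:", record["instruction"]]
    context = record.get("context", "").strip()
    if context:
        # Some tasks (e.g. summarization) come with a reference passage.
        parts += ["", "### Input:", context]
    parts += ["", "### Response:", record["response"]]
    return "\n".join(parts)


# An auto-completion model only ever sees raw text to continue...
completion_prompt = "The capital of France is"

# ...whereas an instruction-tuned model is trained on explicit request/answer pairs.
example = {
    "instruction": "Can you tell me what the capital of France is?",
    "context": "",
    "response": "The capital of France is Paris.",
}
print(format_example(example))
```

At inference time the same template would be filled in with only the instruction (and context), leaving the response section empty for the model to complete.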
So what’s the use case?
Many customers have expressed hesitation about their proprietary data being used to train foundation models further. As a result, many customers are looking for ways to completely own their AI stack, from data, to foundation model, to fine-tuning. Open-source models offer the opportunity to do just that and build a privately-held vertical stack.
One open question: will open-source models perform as well as the heavily funded, heavily trained foundation models from companies like OpenAI and Anthropic? If yes, companies might prefer to own their whole AI stack. If not, companies might converge on high-performance foundation models instead. A few related questions:
How do small models compare in performance to large models?
For niche use-cases, will smaller/focused models win over larger, generally intelligent models (like PubMedGPT for medical uses — more on that below)?
BloombergGPT
Bloomberg recently announced its finance-specific language model, BloombergGPT. In short, BloombergGPT outperformed other key foundation models on practical financial tasks (see table 1 below), while performing competitively on general-purpose benchmarks like MMLU (see table 2 below).
Bloomberg’s team trained a 50B parameter model based on the BLOOM architecture on a dataset of roughly 700B tokens. To give you a sense of scale, other SOTA models are much larger, like the 540B PaLM or the 175B GPT-3, and GPT-4 is rumored to have around 1 trillion parameters (OpenAI has not disclosed its size). Only around half of the training dataset was finance-specific, with the rest being general purpose. Here’s a quick snapshot of the dataset:
BloombergGPT Dataset (708B tokens total)
Financial Datasets (363B tokens – 51.27% of training)
  Web (298B tokens – 42.01% of training)
  News (38B tokens – 5.31% of training)
  Filings (14B tokens – 2.04% of training)
  Press (9B tokens – 1.21% of training)
  Bloomberg (5B tokens – 0.70% of training)
Public Datasets (345B tokens – 48.73% of training)
  The Pile (184B tokens – 25.90% of training)
  C4 (138B tokens – 19.48% of training)
  Wikipedia (24B tokens – 3.35% of training)
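As a quick sanity check on this breakdown, the sketch below adds up the per-source percentages listed for Web, News, Filings, Press and Bloomberg (and for The Pile, C4 and Wikipedia), and compares the result with the raw token counts for the two groups (363B + 345B = 708B). It is plain bookkeeping on the numbers above, nothing more.

```python
# Percent-of-training figures for each source, as listed in the dataset breakdown.
financial = {"Web": 42.01, "News": 5.31, "Filings": 2.04, "Press": 1.21, "Bloomberg": 0.70}
public = {"The Pile": 25.90, "C4": 19.48, "Wikipedia": 3.35}

# Round to 2 decimals to absorb floating-point noise from the additions.
financial_share = round(sum(financial.values()), 2)
public_share = round(sum(public.values()), 2)

print(f"Financial: {financial_share}%")
print(f"Public:    {public_share}%")
print(f"Total:     {round(financial_share + public_share, 2)}%")

# Token counts (in billions) for the two groups give the same split.
financial_tokens, public_tokens = 363, 345
total_tokens = financial_tokens + public_tokens
print(f"{total_tokens}B tokens, {100 * financial_tokens / total_tokens:.2f}% financial")
```

Both routes put the financial share at about 51.3% of training tokens. The per-source token counts are rounded to the nearest billion, so summing those instead gives slightly different group totals.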
You can read their announcement here or find their Arxiv paper here.
The upshot
What will incumbent companies with huge, proprietary data lakes prefer — an in-house model that they own end-to-end or an API call to a larger, generally capable foundation model?
How much does either approach cost in the long run, and how will that affect executives’ decision-making and investment?
A 50B parameter model is smaller than SOTA foundation models (like PaLM at 540B) but larger than other domain-specific models (like PubMedGPT at 2.7B). Smaller models have the advantage of lower training and compute costs, as well as lower inference costs down the line (inference being the process of querying the model to generate useful output). But smaller models typically come with less generalizable knowledge, meaning worse performance on general tasks. That’s fine if you only ask a medical LM medical questions; it doesn’t work if you ask it to help you with marketing copy. Bloomberg explicitly describes this “mixed approach” in its paper, aiming for a generally capable model that also performs well on niche, finance-specific tasks:
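To put rough numbers on the inference-cost point, a common rule of thumb from the scaling-law literature is that a dense transformer’s forward pass costs about 2 × N floating-point operations per generated token, where N is the parameter count. The sketch below applies that approximation to the three model sizes mentioned here. It ignores attention overhead, context length, batching, hardware efficiency and pricing, so treat the ratios as order-of-magnitude only; the `inference_flops_per_token` and `relative_cost` helpers are hypothetical names for illustration.

```python
# Rule of thumb (an approximation): ~2 FLOPs per parameter per generated token
# for a dense decoder's forward pass.
MODELS = {
    "PubMedGPT": 2.7e9,
    "BloombergGPT": 50e9,
    "PaLM": 540e9,
}


def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs needed to generate one token."""
    return 2 * n_params


def relative_cost(model: str, baseline: str = "PubMedGPT") -> float:
    """How many times more compute per token `model` needs than `baseline`."""
    return inference_flops_per_token(MODELS[model]) / inference_flops_per_token(MODELS[baseline])


for name, n in MODELS.items():
    print(f"{name:>12}: {inference_flops_per_token(n):.1e} FLOPs/token, "
          f"{relative_cost(name):.0f}x PubMedGPT")
```

Under this approximation, per-token compute scales linearly with parameter count, so a 50B model costs roughly 19x a 2.7B model per token and a 540B model roughly 200x, before any differences in how well each answers.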
At Bloomberg… we set out to build a model that achieves best-in-class results on financial benchmarks, while also maintaining competitive performance on general-purpose LLM benchmarks.
One thing that I’d like to see is the performance of OpenAI’s GPT-3/GPT-4 or Anthropic’s Claude Next on Bloomberg’s Financial Tasks. Their performance on financial benchmark tests is noticeably missing from Bloomberg’s evaluation table above. And as noted last time, OpenAI’s GPT-4 is crushing benchmark tests across many domains. Hopefully we’ll learn more soon!
Product drops
ChatGPT plugins: in March, OpenAI announced plugin capabilities in ChatGPT, with Expedia, Wolfram Alpha and OpenTable among the initial partners. Users can now use plugins to plan a trip, book a dinner reservation or query Wolfram Alpha for help with math questions. Interestingly, none of these partnerships are paid. Instead, the vendors participate in a data exchange: OpenAI benefits from the additional functionality made possible via plugins, and the vendors get information on the types of queries people use ChatGPT for. Check out the announcement here.
Adobe Firefly: Adobe released a suite of creator tools that bring generative AI into its editing software. You can see a few of the advances here, including out-painting (0:10-0:13), generation (0:16-0:23), and (imo, most impressively!) image combination (1:12-1:18). Check out their announcement here.
📚 Concepts & Learning
ChatGPT vs Search Engine
How does ChatGPT compete with regular search engines? And how does the adoption of LLMs in search affect profit margins? First, it’s worth considering how search engines currently make money and how LLM chatbots could disrupt that mechanism.
At a high level, Google makes money by selling ads. One method is keyword advertising, where advertisers pay to place ads or sponsored links on the first page of results for particular search queries. For example, I just Googled “food delivery” and the first three links returned are ads (HelloFresh, Just Eat and Deliveroo), with another 3 sponsored links at the bottom of the first page. When I search “takeout near me”, I’m served two sponsored links at the top, followed by several links to lists of the top takeout spots near me. It’s also likely that those lists, published by popular sites like Time Out and Yelp, are sponsored too: the featured restaurants pay blogs and news publishers to write positive reviews about their business as a top-of-funnel mechanism. Ads, ads, ads all the way down… at least on the front page of Google results. On top of that, I have to decide which individual publisher I find trustworthy: Eater or Time Out? Yelp or UberEats?
In contrast, when I query ChatGPT for “takeout near me”, it offers me a short list of 6 options. 3 of those options are food delivery vendors, but surprisingly the other 3 are popular takeout restaurants, which are pretty good recommendations. The result: ChatGPT cuts through the noise of sponsored links, speeds up my search (it gives me 6 results immediately instead of making me click into a search result), and massively decreases my cognitive load in making my final takeout decision. I don’t have to evaluate search result links or the restaurant options on each list. Instead, I can ask ChatGPT a follow-up question about which takeout spot best fits my needs.
The question that comes out of this is: if more users move toward search chatbots and away from traditional search engines, how do Google and Microsoft reconcile the conflict that search chatbots pose to their current business model?
This question becomes more pressing given reports of ChatGPT’s success as a new customer acquisition channel:
ChatGPT the lead generator
It’s now possible to use ChatGPT and other AI-first search engines as lead generators.
I recently put this to the test while planning an upcoming trip to Switzerland. Instead of going the typical route and booking through Airbnb or VRBO, I wanted to explore some lesser-known short-term rental options.
Last time I visited Switzerland, I stumbled upon a hidden gem of an apartment on a small, niche rental site that wasn't listed on any of the larger platforms. So, I turned to ChatGPT for help this time round as well.
ChatGPT returned the typical, expected results: think Airbnb and VRBO. It also, I think mistakenly, returned companies that were no longer in operation, which I can only assume was due to its outdated training data.
But surprisingly, ChatGPT also returned a niche short-term rental site that I hadn’t heard of before, with some compelling options. Between the larger players (Airbnb/VRBO) and the niche rental site, I spent a few hours comparing the options, and in the end, I surprised myself by going with the niche site.
The questions I had were: how did ChatGPT learn about this niche site and know to recommend it? And why was this site nowhere near the first page of short-term rental results when I searched on Google?
For now, traditional search engines still provide ground truth, since LLMs hallucinate and return outdated information. But the vision is that plugins will solve some of LLMs’ issues (like a Maps plugin), or that LLMs will simply get much better and more knowledgeable, so that we no longer rely on traditional search engines to verify their claims. It seems to me the switch to search chatbots is limited by how trustworthy they are and how clearly they beat the alternatives.
What is the future of ad-driven search?
This brings me back to the question of what the future of ad-tech, search chatbots, and traditional search engines looks like. What happens when search chatbots draw attention and traffic away from traditional search engines because they deliver a better search product for those users? How do search chatbots learn what a good, unbiased recommendation is? Is it just through updated datasets? What datasets do LLMs learn from to acquire information about the “best-in-class” product? Isn’t it from traditional search engines? Or from biased news articles published by the companies who sponsor ads?
One bigger question I’m still mulling over: what about company-sponsored datasets? Will we see more of those in the future? Will datasets be the ultimate lever to encourage LLMs to recommend our products? How do we disentangle the business ethics there?
If you have any thoughts on any of this, reply to this email and drop me a line!
🎁 Miscellaneous
How does data quality affect generative output? Here are the outputs of two models: the first trained on 1B average-quality samples and the second on 1M high-quality samples. The images speak for themselves!
Check out this longer piece on how to use feedback to further train your generative models, including how we got from the prompt “lol” to an output of animated female faces:
Share Superfast AI → Get the List of Top 2023 AI Newsletters
I haven’t seen a good, up-to-date list of AI newsletters to follow. Even the “April 2023 updated” lists online are stale. Not a single one mentioned Ben’s Bites, which is the go-to in the startup community (with 80k+ subscribers, it should definitely be included as a notable option). Another surprising omission is The Neuron, with 40k+ subscribers, written by Silicon Valley and startup insider Pete Huang.
If you’d like my official guide to the best newsletters out there for AI enthusiasts, share this post with 1 AI-curious friend and I’ll email you the link! Let me know if I’m missing your favorite, too. Click below:
What did you think about this week’s newsletter? Send me a DM on Twitter @barralexandra
That’s it! I’ll be sending out newsletters more regularly from now on, so keep an eye out! Have a great day and see you next week! 👋