Introduction
It's 2024, and AI and LLMs have swept across the collective consciousness, so much so that it can be difficult to find a tech product that doesn't have AI in its name somewhere.
Whenever a software developer asks "I'm interested in working with AI, what are the best technologies that I should use?" the answer is more often than not PyTorch, NumPy, TensorFlow, LangChain, and various other libraries in the Python ecosystem. While not necessarily incorrect—after all, the adjective "best" can mean different things to different people—that answer oversimplifies the field of AI and unreasonably limits the available software stack choices.
I feel that one of the reasons other technology stacks (e.g. Ruby or PHP) aren't as popular for AI development as Python is the relative lack of technical foundations among developers. Many developers who have built their careers over the past few years as web application developers are now anxious about the challenges (and opportunities) that AI brings to the table. They start asking themselves:
- "Can Ruby be used for AI?"
- "How do I integrate an LLM chatbot into my PHP application?"
- "I'm a Ruby developer, do I need to be an expert at Python to use generative AI?"
- "I heard langchain is a necessary library for building apps with generative AI, do I need to use it?"
This series of articles attempts to address those questions, starting with answering the question: how do you encode and search for meaning? Later we'll have an example application that shows how you can build LLM-powered applications in Ruby.
What is Semantic Search or Vector Search?
The traditional approach to locating and identifying information is through keyword search. You're probably already very familiar with this type of search: whenever you enter a search query with "quotation marks" in Google for example, you get webpage results with content that match exactly the keywords in quotes.
Keyword search is such an important functionality that many databases offer built-in, capable, and fast indexing of documents, also known as full text search. For example, PostgreSQL has built-in full text search while SQLite has a first-party extension for that functionality.
Keyword search may be powerful, but it can also have some limitations. For example, a user interested in searching for articles related to politics may enter the word "politics" in the search field but then may miss out on search results for articles that talk about "voters" or "the electoral college." This may be because the word "politics" itself was not mentioned at all in those articles, even if the articles themselves were about politics.
Another example can be a user trying to find a "phone case" in an e-commerce shop. By only providing keyword search on the item listings, the user misses out on items that are described as "phone sleeve," "phone protector," "bumper case," etc.
In other words, users will sometimes have a search intent that's different from their search keywords, and providing simple keyword search functionality is insufficient to accommodate this intent.
One way to bridge this gap between intent and actual search queries would be incorporating semantic search. Semantic search allows a server to return search results that not only contain the specific keywords in a query, but also results for related concepts as well as variations on the keyword. Not only that, but since we're talking about search intent here, semantic search may also return results ranked according to their relevance (i.e. how close or far the results are from what the user was intending to look for).
Here is a table of the differences between semantic search and keyword search:
| Keyword Search | Semantic Search |
|---|---|
| Find documents that contain specific keywords or phrases. | Find documents that are relevant to a user's query, taking into account context and meaning. |
| Uses exact matching of keywords in the document content. | Uses natural language processing (NLP) techniques to understand the intent and meaning behind the query. |
| Simple string matching without any consideration for context or synonyms. | Complex NLP processing, including tokenization, stemming, lemmatization, and semantic analysis. |
| Documents that contain all keywords are ranked higher. | Documents that are most relevant to the query are ranked higher, based on factors such as relevance, authority, and user feedback. |
| Query is interpreted literally, without consideration for context or nuances. | Query is interpreted in a more nuanced way, taking into account context, intent, and user preferences. |
| Results are often limited to exact keyword matches, leading to low diversity. | Results are more diverse, as the search algorithm considers multiple factors beyond just keyword matching. |
That's all well and good, but while text indices (used for keyword search) are easy to wrap our heads around, semantic indexing is a bit more esoteric. With a text index, you are merely searching for exact occurrences of a particular string in a corpus of text. Semantic search isn't as straightforward; the word "bank" for example can have different meanings based on its context (e.g. financial bank or river bank) and those differences in nuance need to be differentiated.
In other words, how do we encode (and eventually search) for meaning?
What are Embeddings?
Educational YouTuber 3Blue1Brown has a great explanation on embeddings:
We won't dive into the different algorithms used to transform a piece of text into an embedding (Google's transformer paper "Attention Is All You Need," for example, is one of the more interesting proposals that changed the way neural networks encode language), but in short, embedding is the process of transforming data into a series of numbers that represent various "dimensionalities."
We don't know exactly what dimensionalities are actually being represented (the YouTube short above, for example, talks about "Italian-ness"), and there really isn't a way to pinpoint exactly which numbers correspond to a particular dimension; we can take a guess, but it's really up to the neural network's training and dataset to determine exactly how such concepts are encoded.
Suffice it to say, embeddings are the missing link we're looking for: an embedding is a way to represent data as a point in multidimensional space and, with it, its relation to other data points in that same space. By having such relations we are able to deduce "meaning" from the data points that we have. Data can be identified based on similarity metrics instead of exact matches, making it possible to understand data contextually.
For example, when training a model to understand text, words with similar meanings should have similar embeddings. So, if we train a model on a large corpus of text and then ask it to find the embedding for the word "politics", it will return a vector that is close to the embeddings of other words like "voters" or "electoral college."
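To get a feel for what "close" means numerically, here is a toy Ruby sketch comparing made-up three-dimensional vectors with cosine similarity. The vectors are invented for illustration; real embeddings have hundreds of dimensions and their values come from a trained model.

# Made-up toy vectors; a real embedding model would produce these for us.
politics = [0.81, 0.10, 0.42]
voters   = [0.78, 0.15, 0.40]
bananas  = [0.05, 0.90, 0.12]

# Cosine similarity: close to 1.0 when two vectors point in the same direction.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

puts cosine_similarity(politics, voters)  # high: related concepts
puts cosine_similarity(politics, bananas) # low: unrelated concepts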
What are Vector Databases?
Now that we have embeddings—a representation of "meaning" or "intent" as a sequence of numbers—we are also able to store such data in persistent storage.
Relational databases have traditionally been the go-to service for storing structured data, as they make it easy to encode and define relationships between entities. This structure also makes it easy to query data based on these relationships.
Embeddings, often stored as vectors—arrays of numbers that represent semantic or numerical properties of complex data—aren't structured, though. While you can store vectors as blobs or strings, doing so makes it difficult to query the data and retrieve useful information (e.g. how "near" a particular piece of data is to another).
Vector databases (as well as vector extensions for traditional relational databases) are specifically made to store vectors and allow for efficient similarity search and analysis. When a user queries the vector database for similarities, it uses techniques like approximate nearest neighbor (ANN) search to quickly find the most relevant results.
Implementing Semantic Search
We now have most of the fundamental concepts necessary to implement semantic search, as well as a theoretical workflow for storing and retrieving "meaning" from persistent storage (sketched in code right after this list):
- First, we transform data (be it text, images, or audio) into multidimensional embeddings represented as vectors.
- We then store these vectors in a vector database.
- When a user has a query, we transform that query into embeddings the same way we originally transformed the initial data into embeddings.
- With those embeddings, we perform a similarity search against the stored embeddings.
- We retrieve the data associated with that similar embedding, effectively retrieving data that is closest in "meaning" to the original query.
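To make that workflow concrete before we bring in real tools, here is a self-contained toy version in Ruby. The embed method below is only a stand-in and obviously doesn't capture meaning (real embeddings come from a model, which we choose later in this article), and the "vector database" is just an in-memory array, but the shape of the flow is the same.

# Stand-in embedding function: real embeddings come from a trained model.
def embed(text)
  [text.length.to_f, text.count("aeiou").to_f]
end

# L2 (Euclidean) distance between two vectors: smaller means "closer".
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

vector_db = [] # stand-in for a real vector database

# 1-2. Transform each document into an embedding and store it.
["an article about voters", "a recipe for banana bread"].each do |doc|
  vector_db << {document: doc, vector: embed(doc)}
end

# 3. Transform the user's query the same way.
query_vector = embed("politics and elections")

# 4-5. Similarity search, then retrieve the closest document.
closest = vector_db.min_by { |row| distance(row[:vector], query_vector) }
puts closest[:document]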
Assuming that we already have the data we want to store, the next step is to get the data into a vector database.
Choosing a Vector Database
There are a number of available vector databases (and extensions that transform ordinary databases into vector-capable ones) and they can roughly be divided into two categories:
- Dedicated databases that store unstructured data through document-based, key-value, column-oriented, or graph-based stores (aka NoSQL).
- Relational databases that support vector search (usually) through extensions.
Weaviate
Weaviate is an open-source, scalable vector database designed to provide efficient and powerful search capabilities for machine learning models. It allows developers to store, manage, and query large datasets of vectors, making it an ideal solution for AI applications that require complex data analysis, language understanding and generation, and pattern recognition.
It falls under the category of "NoSQL" and stores data as JSON objects in a collection. Unlike traditional SQL databases, you don't create queries for data but instead you "search" for them and the database "retrieves" data based on the filters and keywords you have specified.
It bills itself as an AI-first database and does keyword search (traditional text-based search using "token" frequency), vector search (similarity-based search using vector embeddings), and hybrid search (combines vector and keyword search results). It even supports Retrieval Augmented Generation as a native query, integrating itself with various generative model providers like AWS, Cohere, Google, OpenAI, and Ollama.
PostgreSQL and pgvector
PostgreSQL is a widely known relational database and is able to support multiple storage mechanisms through its extension system. Some popular ones are:
- PostGIS, which adds support for geographic objects, allowing it to be used as a backend spatial database for GIS (geographic information systems).
- hstore, which implements the hstore data type for storing sets of key/value pairs within a single database value (similar to how dedicated key-value store databases like Redis would behave).
- pgcrypto, which provides cryptographic functions for the database and allows encryption for specific columns.
pgvector is an open source extension that provides vector similarity search for Postgres. Since pgvector is an extension for an SQL database, it is unlike dedicated vector databases in that it is able to use plain SQL to retrieve data. For example, to get the nearest neighbors to a vector you'd do:
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
and getting the nearest neighbors to a row:
SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
SQLite and sqlite-vec
SQLite is a popular, open-source C library that provides a lightweight, disk-based relational database management system with no separate server process or configuration files. SQLite is used as the default database for many mobile phones and tablets, including Apple's iOS and Google's Android operating systems.
SQLite stores data in a single disk file, known as a "database" or "datafile," which may have one or more tables. It uses its own built-in implementation of SQL (Structured Query Language) for defining and manipulating database schemas, creating and querying tables, and performing various tasks such as managing indexes, transactions, and data integrity constraints.
Just like PostgreSQL, it has support for extensions that enable new functionality such as application-defined SQL functions, virtual file systems, and virtual tables.
sqlite-vec is one such extension that brings vector search capabilities to SQLite. It stores data in a virtual table, which means you're able to use plain SQL (albeit with some custom interface) and query data as if they were tables (even if the actual underlying data is not).
Here's an example from the GitHub repository:
.load ./vec0
create virtual table vec_examples using vec0(
sample_embedding float[8]
);
-- vectors can be provided as JSON or in a compact binary format
insert into vec_examples(rowid, sample_embedding)
values
(1, '[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]'),
(2, '[0.443, -0.501, 0.355, -0.771, 0.707, -0.708, -0.185, 0.362]'),
(3, '[0.716, -0.927, 0.134, 0.052, -0.669, 0.793, -0.634, -0.162]'),
(4, '[-0.710, 0.330, 0.656, 0.041, -0.990, 0.726, 0.385, -0.958]');
-- KNN style query
select
rowid,
distance
from vec_examples
where sample_embedding match '[0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]'
order by distance
limit 2;
/*
┌───────┬──────────────────┐
│ rowid │ distance │
├───────┼──────────────────┤
│ 2 │ 2.38687372207642 │
│ 1 │ 2.38978505134583 │
└───────┴──────────────────┘
*/
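Those distance values come from sqlite-vec's default L2 (Euclidean) distance metric, and you can reproduce them by hand. Here's a quick Ruby check that recomputes the distance for rowid 2 in the example above:

query = [0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]
row2  = [0.443, -0.501, 0.355, -0.771, 0.707, -0.708, -0.185, 0.362]

# L2 (Euclidean) distance: square root of the summed squared differences.
puts Math.sqrt(query.zip(row2).sum { |q, r| (q - r)**2 })
# => 2.3868... (matching the distance column for rowid 2 above)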
I'll be using SQLite and sqlite-vec as it makes the application setup a lot easier since:
- everything is stored in a file
- SQLite doesn't need an extra server to set up
- installation is well documented as sqlite libraries are available on many different platform targets
- they provide an official, first-party Ruby gem with examples (note that pgvector also has an official, first-party Ruby gem); see the short example below
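To give a taste of that gem outside of Rails, here's a minimal sketch using the sqlite3 and sqlite-vec gems directly, roughly mirroring the gem's documented usage; the table and column names are just for illustration.

require "sqlite3"
require "sqlite_vec" # provided by the sqlite-vec gem

db = SQLite3::Database.new(":memory:")

# Load the sqlite-vec extension into this connection.
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

db.execute("CREATE VIRTUAL TABLE demo_vectors USING vec0(embedding float[4])")
db.execute("INSERT INTO demo_vectors(rowid, embedding) VALUES (1, '[0.1, 0.2, 0.3, 0.4]')")
db.execute("INSERT INTO demo_vectors(rowid, embedding) VALUES (2, '[0.9, 0.8, 0.7, 0.6]')")

# KNN query: nearest stored vectors to the given query vector.
rows = db.execute(<<~SQL, ["[0.1, 0.2, 0.3, 0.5]"])
  SELECT rowid, distance FROM demo_vectors
  WHERE embedding MATCH ? ORDER BY distance LIMIT 2
SQL
rows.each { |rowid, distance| puts "#{rowid}: #{distance}" }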
Choosing an LLM Provider
Nowadays, AI/LLM application developers are spoiled for choice. There are so many cheap (and oftentimes generous to the point of being free) providers available for someone to start exploring AI/LLM functionality and build LLM-powered apps.
We'll look at a few of them for comparison and to find out which one would be the best to use for our example.
OpenAI
You've probably heard of ChatGPT, which took AI and LLMs out of research papers and into public discourse. It was developed by OpenAI, a leading research and deployment company focused on artificial intelligence, particularly large language models (LLMs). Founded in 2015 by prominent figures like Elon Musk and Sam Altman, OpenAI has a mission to ensure that artificial general intelligence benefits all of humanity.
OpenAI is renowned for developing groundbreaking LLMs like GPT-3 (Generative Pre-trained Transformer 3) and its successor, GPT-4. These models are capable of understanding and generating human-quality text, enabling a wide range of applications such as writing creative content, translating languages, summarizing information, and answering questions in a comprehensive and informative way. OpenAI also offers access to these powerful models through its API, allowing developers to integrate LLMs into their own applications and products.
We can think of OpenAI as a first mover in the AI/LLM space, and it enjoys advantages such as being top of mind when it comes to chatbots and large language models. Their prices are commensurate with this, though, and while they do push the technology forward to its limits, developers just starting out with AI may find the expense prohibitive and may want to try out cheaper alternatives.
Google Gemini
Hot on the heels of OpenAI is Google's Gemini (previously known as Bard). It is Google's ambitious AI model, designed to be a versatile and powerful tool capable of handling a wide range of tasks beyond the traditional realm of text generation. Unlike previous models focused on specific areas like language or images, Gemini aims to be multimodal, understanding and generating both text and visual information. This means it could potentially power applications like interactive storytelling, realistic image editing, and even creative coding.
Google envisions Gemini as a foundational AI, serving as the backbone for future products and services. Gemini itself is a proprietary model accessed through an API, though Google also publishes open-weight models in the related Gemma family that developers can inspect and fine-tune, fostering collaboration and innovation within the AI community. Gemini is still under active development, with Google gradually releasing more information about its capabilities and applications.
Early demonstrations showcase Gemini's potential in tasks like summarizing factual topics, translating languages, and generating different creative text formats. As development progresses, we can expect to see Gemini integrated into various Google products and services, potentially revolutionizing how we interact with technology and access information.
Google provides an API for Gemini through its Google AI for Developers site. It is somewhat cheaper than OpenAI's pricing (and there's even a free tier available), but there are also fewer models available, and the models that are available aren't as well known.
Anthropic
Anthropic is an AI research company that has gained significant recognition for its work on large language models (LLMs). Founded by former members of the OpenAI team, Anthropic's mission is to build reliable, interpretable, and steerable AI systems that benefit humanity. Unlike some competitors focused solely on performance, Anthropic emphasizes ethical considerations and responsible development practices.
Their flagship model, Claude, is a powerful LLM known for its ability to engage in natural conversations, generate creative content, and provide helpful summaries of factual topics. Claude also integrates with other tools and platforms, allowing for more versatile applications. Anthropic publishes extensive research on how its models are trained and aligned, though the models themselves are available only through its API and partner platforms.
Anthropic's focus on ethical AI is reflected in their research on alignment, aiming to ensure that LLMs remain aligned with human values and goals. They actively engage with policymakers and the public to foster responsible development and deployment of AI technologies. As a relatively new player in the LLM space, Anthropic is making waves with its commitment to both performance and ethical considerations, positioning itself as a key player in shaping the future of AI.
Anthropic's Claude API pricing seems to be on the more expensive side, even when compared to OpenAI and especially when compared to Google AI. Many people have noted, though, that Claude performs very well, especially in textual conversation, and that could very well be the reason for the higher prices.
Ollama
If we're looking at cost, then there really isn't anything that can beat hosting your own LLM on your own hardware. The only costs associated would be the initial investment required to get a decent CPU and GPU with enough RAM to be able to run the models. Even then, you don't need to have the latest and greatest machines to run local AI; with the release of smaller, quantized models you can even use the laptop you're doing development on (just don't expect too much in terms of token generation or response accuracy).
Ollama is an open-source tool for running large language models locally. It packages model weights, configuration, and a runtime behind a simple command-line interface and a local HTTP API, making it adaptable and user-friendly. With a single command you can pull a model and start generating text, answering questions, summarizing, or creating embeddings, which makes it a versatile tool for building custom applications powered by AI.
Because Ollama runs models on your own hardware, inference happens entirely on-device, making it suitable for a range of use cases, from small personal projects to internal prototypes. Models are distributed as ready-to-run packages through the Ollama library, and you can customize them (system prompts, parameters) without retraining.
The models available through Ollama are openly distributed, which promotes transparency and collaboration within the AI community. Researchers and developers can explore their inner workings, contribute to their development, and build upon their existing capabilities. You get access to models such as gemma2 from Google or llama from Meta and can build on top of their training and research to apply to your own LLM-powered applications.
What makes Ollama interesting for developers starting out with AI is that it tries to keep its API compatible with as much of OpenAI's API as possible. This means that you can use many available OpenAI libraries with Ollama as the endpoint, prototype your application flow with Ollama, and later in production (or validation) switch to OpenAI to get better results or to take advantage of more processing power. It allows you to focus on writing business logic without having to worry about whether you're going over budget with all the testing you've been doing.
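For instance, here's a minimal sketch using the ruby-openai gem (the same gem the example app below uses) pointed at a local Ollama instance; it assumes Ollama is running on its default port with the all-minilm model already pulled.

require "openai" # provided by the ruby-openai gem

# Ollama's OpenAI-compatible endpoint listens on http://localhost:11434 by default.
# Ollama ignores the access token, but passing a placeholder keeps the gem happy.
client = OpenAI::Client.new(uri_base: "http://localhost:11434", access_token: "ollama")

response = client.embeddings(
  parameters: {model: "all-minilm", input: "How do I encode and search for meaning?"}
)

embedding = response.dig("data", 0, "embedding")
puts embedding.length # all-minilm returns 384-dimensional vectors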
These two—pricing and OpenAI API compatibility—are the reasons why I'm choosing Ollama as the LLM provider for the example application.
Choosing an Embedding Model or a Language Model
An embedding model is a type of machine learning model used for mapping real-world objects or concepts into a lower-dimensional space (usually a vector) in such a way that semantically similar objects or concepts have a small distance between their vectors in the new space. This is particularly useful in natural language processing, computer vision, and recommendation systems to find relationships and patterns among the data.
We've talked about embeddings and vector search previously, but we hand-waved away how they are actually created. It's still a topic that's much too complicated to explain in this introductory article, but suffice to say that an embedding model determines how relationships between concepts are weighted; different embedding models will produce different weights and are likely trained on specific datasets to either be as broad and generic, or as narrow and specific with regard to a task (e.g. coding, translation, or information).
Ollama has a list of different embedding models, and the only really good way to choose a model is to try it out with some validation datasets and see which one best produces the results that you (or your users) expect.
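As a rough illustration of that kind of spot check, the sketch below embeds the same pair of related sentences with two candidate models from the Ollama library and compares their cosine similarities. all-minilm is the model used later in this article; nomic-embed-text is just an example of another embedding model you might have pulled locally.

require "openai" # ruby-openai gem, pointed at a local Ollama instance

client = OpenAI::Client.new(uri_base: "http://localhost:11434", access_token: "ollama")

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

def embed(client, model, text)
  client.embeddings(parameters: {model: model, input: text}).dig("data", 0, "embedding")
end

# Two sentences we expect a good model to place close together.
a = "Republicans call for privatization of the next election"
b = "an article about politics and voters"

["all-minilm", "nomic-embed-text"].each do |model|
  similarity = cosine(embed(client, model, a), embed(client, model, b))
  puts "#{model}: #{similarity.round(3)}"
end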
In my case, I've chosen all-minilm as the embedding model, which also happens to be available in the Ollama library.
Example: Building LLM Applications with Ruby on Rails
I've made a simple Ruby on Rails application that implements the concepts above. You can save this file as app.rb if you'd like to follow along.
It combines the services above—SQLite/sqlite-vec, Ollama, and all-minilm—in order to do semantic search over a database that contains embeddings of a dataset (in this case, news articles).
In the spirit of first principles thinking as applied to software development, we won't be using convenience libraries such as langchain.rb and instead try to implement as much as we can on our own.
However, while this might not be a step-by-step guide to building an LLM-powered Ruby application, you can use it as a jumping-off point and as inspiration for your own.
# frozen_string_literal: true
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "rails", "~> 8.0.0.rc1", require: "rails/all"
gem "sqlite3", ">= 2.1"
gem "sqlite-vec", require: "sqlite_vec"
gem "ruby-openai", require: "openai"
end
# all-minilm produces 384-dimensional embeddings
EMBEDDING_DIM = 384
EMBEDDING_MODEL = "all-minilm"
ENV["DATABASE_URL"] = "sqlite3:#{__FILE__}.sqlite3"
# Point this at your own Ollama instance (a default local install listens on http://localhost:11434)
CLIENT = OpenAI::Client.new(uri_base: "http://100.119.14.125:11434")
HTML_TEMPLATE = <<~HTML.strip
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>App</title>
</head>
<body>
%{body}
</body>
</html>
HTML
# Loads the sqlite-vec extension on every new SQLite connection so that
# vec0 virtual tables and the MATCH operator are available to ActiveRecord.
module SqliteVecInitializer
def configure_connection
super
db = @raw_connection
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)
end
end
require "active_record/connection_adapters/sqlite3_adapter"
ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(SqliteVecInitializer)
ActiveRecord::Base.establish_connection
ActiveRecord::Base.logger = Logger.new($stdout)
# Maps to the vec_items virtual table created by sqlite-vec; "embedding MATCH ?"
# performs a nearest-neighbor search and exposes a distance column for ordering.
class VecItem < ActiveRecord::Base
self.primary_key = "rowid"
def self.search(embedding, limit: 3)
where("embedding MATCH ?", embedding.to_s).order(distance: :asc).limit(
limit
)
end
end
# Holds the original text entries; search embeds the query via the client
# (Ollama here) and returns the items whose vectors are nearest to it.
class Item < ActiveRecord::Base
belongs_to :vec_item
def self.search(input, client:, model:, limit: 3)
query = client.embeddings(parameters: {model: model, input: input})
vec_items = VecItem.search(query.dig("data", 0, "embedding"), limit: limit)
where(vec_item: vec_items)
end
end
class App < Rails::Application
routes.draw do
get "/" => "home#search"
get "/search" => "home#search"
end
end
class ApplicationController < ActionController::Base
include Rails.application.routes.url_helpers
end
class HomeController < ApplicationController
def search
@query = params[:query]
if @query
@items = Item
.search(@query, client: CLIENT, model: EMBEDDING_MODEL, limit: 15)
end
body = <<~HTML.strip
<p><b>Query: <%= @query %></b></p>
<ul>
<% @items&.each do |item| %>
<li><pre style="white-space: pre-wrap; font-size: large">
<%= item.entry %>
</pre></li>
<% end %>
</ul>
<%= form_with url: "/search", method: :get do %>
<%= text_field_tag :query, nil, placeholder: "Search Query" %>
<%= submit_tag "Ask" %>
<% end %>
HTML
render inline: HTML_TEMPLATE % {body: body}
end
end
if $PROGRAM_NAME == __FILE__
require "rails/command"
require "rails/commands/server/server_command"
Rails.logger = Logger.new($stdout)
Rails::Server.new(app: App, Host: "0.0.0.0", Port: 3000).start
end
You can run it via ruby app.rb and then go to http://localhost:3000
You will be presented with a simple form with a query box where you can submit your query or prompt to the Rails application, which then retrieves search results based on how "similar" these queries are to the embedded data.
If you are following along, you might want to also get the same dataset that I've been using. Here's a setup program that seeds the SQLite database with some news data. The dataset is a scrape of a number of Onion articles, each with a headline and a body, hosted on Hugging Face. You can save it as setup.rb:
require "./app"
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "csv"
end
def reset_db
ActiveRecord::Schema.define do
drop_table :vec_items, if_exists: true
drop_table :items, if_exists: true
create_virtual_table :vec_items,
:vec0, ["embedding float[#{EMBEDDING_DIM}] distance_metric=L2"]
create_table :items do |t|
t.integer :vec_item_id
t.string :entry
t.timestamps
end
end
end
def seed_db(data, ai_client)
data.each.with_index do |row, index|
# each line of the dataset is "headline #~# body"
headline, body = row.split(" #~# ")
input = "headline: #{headline}\n\nbody: #{body}"
response = ai_client.embeddings(parameters: {model: EMBEDDING_MODEL, input: input})
embedding = response.dig("data", 0, "embedding")
VecItem.create(rowid: index, embedding: embedding.to_s)
Item.find_or_create_by(entry: input) { |obj| obj.vec_item_id = index }
end
end
reset_db
train = File.read("NewsWebScrape.txt").split("\n")
seed_db(train, CLIENT)
puts VecItem.count
When you run ruby setup.rb, it will parse through the NewsWebScrape.txt file and get the embeddings for each line via the all-minilm model hosted in Ollama. These embeddings are then stored as vectors via the sqlite-vec extension, with the distance metric set to L2, the Euclidean distance between two vectors.
Here's a screenshot of the application in action:
As you can see in the screenshot above, a query for the word "politics" can bring back search results that are related to politics but don't necessarily mention the keyword "politics" itself. For example, one of the search results is as follows:
headline: Republicans Call For Privatization Of Next Election
body: WASHINGTON, DC—Citing the "extreme inefficiency" of this month's U.S. presidential election, key Republicans called for future elections to be conducted by the private sector.
There is no keyword "politics" in the search result, but "election" and "Republican" (which arguably are very much related to "politics") do show up.
Conclusion
The availability of open-source models and tools such as Ollama has made AI usage and development more and more accessible. We've seen how we're able to connect to an Ollama instance via an OpenAI-compatible interface and generate embeddings for our documents.
We were also able to store these embeddings in a vector database with the help of SQLite and sqlite-vec, incorporate them into a Ruby on Rails application, and show that instead of just searching for keywords, we were able to search for meaning.
We've seen that with proper foundations and technical knowledge, we do not have to be limited to using what is popular and in vogue, and we can adapt our existing expertise to take advantage of AI tools and services when building LLM-powered applications. We made an application using Ruby via first principles, without depending on langchain or similar libraries that may be convenient but hide the complexity of the fundamentals.
Coming up in Part 2, we'll take things a step further and use the documents retrieved from the vector search as authoritative knowledge for contextual use by an LLM, serving as the first component in a RAG (Retrieval Augmented Generation) pipeline. We'll also be looking at building AI agents, as well as multimodal models that incorporate vision and sound in their results.
Resources
- https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
- https://huggingface.co/datasets/Biddls/Onion_News/tree/main
- https://www.cloudflare.com/learning/ai/what-is-vector-database/
- https://www.cloudflare.com/learning/ai/what-are-embeddings/
- https://www.youtube.com/watch?v=FJtFZwbvkI4 (3Blue1Brown)
- https://youtu.be/LPZh9BOjkQs?si=R_iO5fzbKXs3LiTs (3Blue1Brown)
- https://alexgarcia.xyz/sqlite-vec/ruby.html
- https://medium.com/@rushing_andrei/building-llm-powered-applications-in-ruby-6e16d8a17548 (Andrei Bondarev)