Fun With AI Embeddings in Go

Update 9th January, 2024: Changed the title to “Fun With AI Embeddings in Go”

Before the end of last year, I visited San Francisco (SF) for a few weeks. It felt great meeting some old friends and ex-colleagues face-to-face after a long hiatus. There is something incredibly refreshing about being in the same room with the folks you’ve spent so much time chatting to over the past few years on Zoom or Slack. Real-life connections remain undefeated, and I hope it stays that way.

I knew AI had taken the world by storm in the past year, but I was still taken aback by the sheer energy the folks I met in SF radiated. Crazy ideas were flying at me from literally everywhere. The nerd populace of the city felt rejuvenated compared to how I remembered it from my last visit a few years ago. The enthusiasm of the folks I met was very infectious! Not to mention that an impromptu OpenAI mini-conference, followed by a wild rave DJ-ed by Grimes, somehow randomly fell on my second day in the city!

Entirely coincidentally, GitHub Universe was also happening in SF at the same time, which made things even more fun for me because I got to catch up with some of my GitHub friends, too. Unsurprisingly, the main theme of Universe was generative AI. In GitHub’s case it was mildly disappointing, because there is still so much work GH should do to make the existing product better before throwing most of its weight behind GenAI. I mean, have you ever reviewed a large PR that turned into a mass conversation of 5 people? But I digress…

The motivation

At one of the dinners I had with my friends, we got to talking about non-tech things like TV shows, movies, and music. And naturally, at some point, the conversation turned to Taylor Swift. I know what you’re thinking, but no, this isn’t one of those posts by a middle-aged man piling on Taylor.

Anyway, I’m not a Swiftie and have nothing against the church of Taylor Swift, but I’ve always wondered how she managed to become such a conglomerate that people literally go to cinemas to watch [the replays of] her live shows, beating the box office numbers of some Hollywood movies. I’ve listened to a few of her songs in the past but none of them resonated with me, so…I remain curious.

The friends I was sharing a delicious dinner with that night, on the other hand, could probably be qualified as Swifties. At least to some extent, anyway. I mean, how do you react to someone summarising her lyrics as: “Taylor speaks the truth”? Sarcasm or not, this sentence stuck with me for whatever reason, though I didn’t expect it’d lead to some fun hacks in the weeks that followed…

Over the past year or so I’ve mostly listened to white noise on Spotify - no, really - though every once in a while I’d put on some of my favourite metal bands. Metal is not the only genre of music I like but I grew up listening to it and old habits and tastes die hard.

On the flight back from the west coast of the USA, I somewhat randomly started thinking about that dinner conversation with my friends. I became curious about how the lyrics in Taylor Swift songs compare to the lyrics of one of my favourite heavy metal bands. Yes, I could have spent my flight reading through every song I’d find on the internet and come up with some sort of comparison in my head that’d justify my saying: “My band is better than yours”. But that’s not how I roll, which is where the title of this blog post starts becoming a bit clearer, hopefully.

The task

Though one might argue that art is subjective and therefore hard to quantify, I figured I’d still try to compare the two artists. Having contracted the AI virus, the best way I could think of doing the comparison was by leveraging embeddings. LLMs have gotten pretty good at generating text recently, which makes text embeddings a pretty good tool for the task of comparing lyrics. Maybe. Probably. But you know…

Mathematics is the language of the universe

Though Python is [still] arguably the uncontested language of AI/ML – and I can’t see that changing anytime soon – there is a sizeable Go community out there which I figured could be interested in how to do some of this analysis in Go. Plus I enjoy working on hard problems, and doing some of this stuff in Go pretty much checked all the boxes for me.

NOTE: It’s much “easier” to accomplish what I’ll describe in this post in Python. Pick your battles, folks!

The embeddings

I rolled up my sleeves and charged forward. First I had to scrape the lyrics. I used the gocolly framework for this because I was already familiar with it. There is also geziyor which would’ve probably been a good tool for the job, too, and probably many others I’m not even aware of. Pick whatever tool gets the job done and move on – especially when it comes to tedious tasks like scraping the web!

With all the data scraped, the next task was to generate the lyrics embeddings. I wanted to make my life as easy as possible, so I opted to use the OpenAI Ada model. The most popular OpenAI Go module kinda melted my face a bit with how much redundant code it contained (it doesn’t use generics, so it contains a boatload of similar code) – I didn’t want to pull all that stuff into my hack. Besides, the module provides quite comprehensive OpenAI API coverage, and all I wanted was the embeddings.

So, idiot that I am, I hacked up a small embeddings Go module. Part of it was that I wanted something slim, but I also wanted to learn about these GenAI APIs. The module currently supports the OpenAI embeddings API, but also the Cohere AI and Google Vertex AI embeddings APIs.

Getting the embeddings for any text content with the new Go module is quite simple:

	c := openai.NewClient()

	embReq := &openai.EmbeddingRequest{
		Input:          input,
		Model:          openai.TextAdaV2,
		EncodingFormat: openai.EncodingFloat,
	}

	embs, err := c.Embed(context.Background(), embReq)
	if err != nil {
		log.Fatal(err)
	}

Now, I should clarify the methodology here a bit. I wanted to zoom in on the individual song lyrics but also on the albums as a whole. So first I generated embeddings for the individual lyrics of every song by each artist: the input variable in the code above would contain the lyrics of a whole song, which would be turned into a single embedding vector. When it came to comparing albums as wholes, I simply calculated the average embedding across all songs on each album. This was just to save some time, as this was mostly a fun hack rather than a rigorous study.
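The album-level averaging itself is trivial: sum the song vectors element-wise and divide by the number of songs. A minimal sketch (averageEmbedding is a hypothetical helper, not part of any module mentioned here):

```go
package main

import "fmt"

// averageEmbedding computes the element-wise mean of a set of
// equally sized embedding vectors, e.g. all songs on one album.
func averageEmbedding(vectors [][]float64) []float64 {
	if len(vectors) == 0 {
		return nil
	}
	avg := make([]float64, len(vectors[0]))
	for _, v := range vectors {
		for i, x := range v {
			avg[i] += x
		}
	}
	for i := range avg {
		avg[i] /= float64(len(vectors))
	}
	return avg
}

func main() {
	// Toy 3-element "embeddings" standing in for the real 1k+ ones.
	songs := [][]float64{
		{1, 2, 3},
		{3, 4, 5},
	}
	fmt.Println(averageEmbedding(songs)) // [2 3 4]
}
```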

An alternative would be chunking the lyrics of both the songs and albums and doing some clever things with the chunks before using the final embeddings. If you are into that, you can have a look at the text package in the above-mentioned Go module, which implements simple document splitters heavily inspired by the popular Langchain framework. The splitters are basically a Go rewrite of Langchain’s character and recursive character splitters. Using the text package, you could split the songs (or the concatenated songs of each album), calculate averages across the generated embeddings, and use those instead of the full song string as I did.
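If you just want a feel for what the splitting step does without pulling in the text package, a naive fixed-size splitter with overlap is only a few lines. Note this is a gross simplification – the real langchain-style splitters recursively split on separators like newlines and spaces rather than at fixed rune offsets – and chunkText is a made-up name:

```go
package main

import "fmt"

// chunkText splits s into chunks of at most size runes, with each
// chunk overlapping the previous one by overlap runes. A crude
// stand-in for recursive character splitters, which prefer natural
// boundaries (paragraphs, lines, words) over fixed offsets.
func chunkText(s string, size, overlap int) []string {
	runes := []rune(s)
	step := size - overlap
	if step <= 0 {
		step = size
	}
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	fmt.Println(chunkText("shake it off, shake it off", 10, 2))
}
```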

The projections

Now, we could come up with some wild frameworks for “comparing” the two artists, like whose songs are sadder: in that case, I’d just grab a bunch of sad sentences from the interwebzzz, generate embeddings for them, and then calculate the average distance from each song to these sentences. The smaller the average distance, the sadder the artist’s songs.

But I wanted something quick and dirty, something I wouldn’t need to break my head over too much. Ideally, it’d be something visual that’d make things more obvious from the get-go. I remembered that a while ago I played around with the awesome go-echarts Go module, which can generate different types of charts from Go. It’s a port of the fantastic Apache ECharts library; it generates an HTML file that bundles the JS library into it, which you can then open in your browser. Exactly what I needed!

But before I could do that, I had to massage the embeddings a bit. See, embeddings are wild beasts, i.e. large vectors whose size often exceeds 1k elements. How does one display a 1k-dimensional vector in a chart? How does one even imagine one, for that matter!

The solution here is projecting the high-dimensional data into lower dimension(s). This loses some context, but it generally does a pretty good job of making high-dimensional data possible to wrap your head around.

There are probably more options out there, but I opted to generate 2D and 3D projections using Principal Component Analysis (PCA) and t-SNE. I actually wrote a two-part blog post series many moons ago about the beauty behind PCA. If you are into practical applications, you can go to the second part and check out how PCA can be applied to very rudimentary digital image compression if you scroll to the bottom of the post.

In order to project high-dimensional data to lower dimensions in Go, I had to revisit some previously acquired knowledge of the wonderful gonum framework. It’s got all the pieces we need to get the PCA vectors so we can use them for generating 2D/3D projections. Specifically, we need the mat package for matrix operations and the stat package for the PCA implementation. As for t-SNE, there is actually a dedicated Go module which is referenced by the t-SNE project as an official Go implementation.

Now I had to piece these things together. Here’s a simple Go snippet that demonstrates how to generate the PCA projections in Go:

// Vector stores embeddings for each song.
type Vector struct {
	// Name is the name of the song.
	Name string `json:"name"`
	// Values are the embedding elements.
	Values []float64 `json:"vector"`
}

// Data is used to store artist albums.
type Data struct {
	// Name of the album.
	Name string `json:"name"`
	// Vectors stores embeddings for each song.
	Vectors []Vector `json:"embeddings"`
}

func getPCA(embs []Data, dim int) ([]Data, error) {
	pcas := make([]Data, 0, len(embs))

	for _, e := range embs {
		if len(e.Vectors) < 2 {
			// PCA over a single vector makes no sense; this check also
			// guards the e.Vectors[0] access below.
			log.Printf("skipping data %s due to low number of items: %d", e.Name, len(e.Vectors))
			continue
		}
		items := make([]Vector, 0, len(e.Vectors))
		// embMx: each row is a song whose dimension is the length of its embedding
		embMx := mat.NewDense(len(e.Vectors), len(e.Vectors[0].Values), nil)
		for i, t := range e.Vectors {
			embMx.SetRow(i, t.Values)
		}
		var pc stat.PC
		ok := pc.PrincipalComponents(embMx, nil)
		if !ok {
			log.Printf("failed pca for %s", e.Name)
			continue
		}
		var proj mat.Dense
		var vec mat.Dense
		pc.VectorsTo(&vec)
		proj.Mul(embMx, vec.Slice(0, len(e.Vectors[0].Values), 0, dim))

		for i := range e.Vectors {
			items = append(items, Vector{
				Name:   e.Vectors[i].Name,
				Values: proj.RawRowView(i),
			})
		}
		pcas = append(pcas, Data{
			Name:    e.Name,
			Vectors: items,
		})
	}

	return pcas, nil
}

We use this func to generate embedding projections for both 2D and 3D charts. Note that calculating PCA for a single vector does not make any sense so we skip those cases.

The t-SNE code looks quite similar:

func getTSNE(embs []Data, dim int) ([]Data, error) {
	tsnes := make([]Data, 0, len(embs))

	// t-SNE hyperparameters; these worked reasonably well for both the
	// 2D and 3D projections, but you may need to tune them for your data.
	perplexity, learningRate := float64(30), float64(200)

	for _, e := range embs {
		items := make([]Vector, 0, len(e.Vectors))
		// embMx: each row is a song whose dimension is the length of its embedding
		embMx := mat.NewDense(len(e.Vectors), len(e.Vectors[0].Values), nil)
		for i, t := range e.Vectors {
			embMx.SetRow(i, t.Values)
		}

		t := tsne.NewTSNE(dim, perplexity, learningRate, 3000, true)
		resMat := t.EmbedData(embMx, nil)
		d := mat.DenseCopyOf(resMat)

		for i := range e.Vectors {
			items = append(items, Vector{
				Name:   e.Vectors[i].Name,
				Values: d.RawRowView(i),
			})
		}
		tsnes = append(tsnes, Data{
			Name:    e.Name,
			Vectors: items,
		})
	}

	return tsnes, nil
}

I should point out that t-SNE has two hyperparameters, perplexity and learning rate, which means that if you are going to use it, you need to tune them a bit for your use case. You can learn more about them in this wonderful post.

Now that we have the projections calculated for all of our data it’s time to generate some charts.

The charts

As I mentioned above, I opted to use the go-echarts Go module, though another option would be the plot package from the gonum framework. It doesn’t generate HTML files, though, and the charts generated by the go-echarts module are just too beautiful to pass on, so I went with that.

There is a truckload of examples in the examples repository that I checked out beforehand. Specifically, I was after 2D and 3D charts; the scatter and scatter3D charts checked all the boxes for my requirements. They provide options for displaying labels and tooltips, which was a very nice cherry on the cake.

Here’s some sample code for the global chart config:

	chartOptions := []charts.GlobalOpts{
		charts.WithTitleOpts(opts.Title{
			Title:    title,
			Subtitle: "Lyrics Embeddings",
		}),
		charts.WithTooltipOpts(opts.Tooltip{
			Show:      true,
			Formatter: "{a}: {b}",
		}),
		charts.WithToolboxOpts(opts.Toolbox{
			Show:   true,
			Orient: "horizontal",
			Left:   "right",
			Feature: &opts.ToolBoxFeature{
				SaveAsImage: &opts.ToolBoxFeatureSaveAsImage{
					Show: true, Title: "Save as image"},
				Restore: &opts.ToolBoxFeatureRestore{
					Show: true, Title: "Reset"},
			}}),
	}

You then need to generate the data series which are then rendered into a chart. Here’s one way to do that:

	func add2DSeries(artist string, data []Data, chart *charts.Scatter) error {
		var chartData []opts.ScatterData
		for _, d := range data {
			for _, p := range d.Vectors {
				vals := make([]interface{}, len(p.Values))
				for i := range p.Values {
					vals[i] = p.Values[i]
				}
				chartData = append(chartData, opts.ScatterData{
					Name:   fmt.Sprintf("%s (%s)", p.Name, d.Name),
					Value:  vals,
					Symbol: "roundRect",
				})
			}
		}
		chart.AddSeries(artist, chartData)
		return nil
	}

	func add3DSeries(artist string, data []Data, chart *charts.Scatter3D, grad bool) error {
		var chartData []opts.Chart3DData
		// i indexes the albums so that each album gets its own colour.
		for i, d := range data {
			for _, p := range d.Vectors {
				vals := make([]interface{}, len(p.Values))
				for j := range p.Values {
					vals[j] = p.Values[j]
				}
				chartData = append(chartData, opts.Chart3DData{
					Name:      fmt.Sprintf("%s (%s)", p.Name, d.Name),
					Value:     vals,
					ItemStyle: &opts.ItemStyle{Color: colors[i]},
				})
			}
		}
		chart.AddSeries(artist, chartData)
		return nil
	}

	scatter := charts.NewScatter() // or charts.NewScatter3D()
	scatter.SetGlobalOptions(chartOptions...)

	if err := add2DSeries(swift, swiftPcas, scatter); err != nil {
		log.Fatal(err)
	}

	f, err := os.Create("chart.html")
	if err != nil {
		log.Fatal(err)
	}
	if err := scatter.Render(f); err != nil {
		log.Fatal(err)
	}

As I said earlier, this will spit out an HTML file you can then open in the browser. And with this, we’re pretty much done. We finally have something we can look at and see what we can make of it.

The Results

Let’s have a look at the 2D chart of the song lyrics projections to see if we can spot any discernible difference between the two artists.

NOTE: all the charts discussed in this post are interactive

From the first look at the PCA projections, it would appear that they’re pretty equally…eclectic? Perfect. Or not. It depends. PCA kinda takes my side, which is: whatever, both artists sing about more or less the same kinda things, and the contents of their lyrics are pretty diverse.

If we look at the t-SNE projections, things are a bit different. It’s worth mentioning that t-SNE is [generally] better at generating lower-dimensional projections (at least the 3D ones, anyway) because PCA assumes the data you are projecting is linear, which very roughly means that you can generate any vector (song lyrics) in the data by a linear combination of the vector basis (nerd alert!). Or, to put it in layman’s terms: any song’s lyrics are a rehash of other songs in the same data set – or something like that. Now, this is a very strong assumption, as you can imagine – at least for a lot of artists this usually isn’t the case (though I imagine there are some for which it wouldn’t surprise me if this held true).

t-SNE does not make this assumption. It uses a clever way to calculate the probability that two pieces of data are [somehow] related – this is quite different from PCA. I’d encourage you to read about it on its wiki page and the accompanying references, as the algorithm itself is pretty fascinating and has become pretty much the standard way to visualise high-dimensional data.

Because of these useful properties, from now on we’ll be using t-SNE when doing our analysis.

As mentioned earlier, t-SNE has a couple of hyperparameters, so you need to play around with them to get somewhat reasonable results. What “reasonable” means here is a bit hazy, but in our case we would be looking for patterns like, say, clustering of songs with similar content (context) or some such.

Now, what I find interesting in the t-SNE projection is how eclectic my fav band’s lyrics are: you can see how much more spread out the songs are in comparison to Taylor Swift’s songs. I have a theory about this, but I don’t dare to anger the Swifties, so I shall keep it to myself :-)

I think it’s fairly safe to assume from the 2D projections that the songs don’t tell us much about how different the lyrics are: maybe Masterplan “speak the truth” as well. Probably. Most definitely.

Let’s have a look at the albums, though. To recap what I did to generate the embeddings for each album: take the embeddings of all the individual songs on the album and calculate the average vector from them; use that vector as an embedding that represents the whole album. We are making the assumption that each album attempts to convey a somewhat consistent message throughout all the songs it contains – a reasonable assumption that gives us a rough approximation of the content of each album.

To make things more interesting, I also decided to pull in the embeddings of basic human emotions: happy, sad, surprised, angry, disgusted, and afraid. The way I went about it was somewhat similar to what I did with the album embeddings: I grabbed some (10-ish) sentences for each emotion from the internet, got embeddings for each of them, and once again averaged them into a single vector that “represents” the specific emotion.

Then I calculated projections for each emotion vector using PCA and t-SNE. I figured this could give me a very rough idea about the general sentiment of each album. Again, I shall stress that we are comparing averages with averages of lyrics of various sizes, so take this with a major pinch of salt! It will do just fine for this hack.
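As an aside, once you have the averaged emotion vectors, tagging an album with its closest emotion is just an argmax over cosine similarities. A sketch with made-up toy vectors (nearestEmotion is not part of any module mentioned in this post):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSim returns the cosine similarity of two equal-length vectors.
func cosineSim(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearestEmotion returns the emotion whose averaged embedding is
// most similar to the album embedding.
func nearestEmotion(album []float64, emotions map[string][]float64) string {
	best, bestSim := "", math.Inf(-1)
	for name, vec := range emotions {
		if sim := cosineSim(album, vec); sim > bestSim {
			best, bestSim = name, sim
		}
	}
	return best
}

func main() {
	// Toy 2D "embeddings" standing in for the averaged emotion vectors.
	emotions := map[string][]float64{
		"happy": {1, 0},
		"sad":   {0, 1},
	}
	fmt.Println(nearestEmotion([]float64{0.9, 0.2}, emotions)) // happy
}
```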

The t-SNE 2D projection suggests that Taylor Swift seems to write happier songs than Masterplan. Go figure! Also, it would appear that nobody sings about disgust – not even the metal band! How peculiar. Still, this does not seem quite what I’d expect, and there is not much I can conclude from it. It also goes to show how easy it is to fool oneself with very questionable analysis applied to wild approximations of raw data.

Now, if we zoom in on the 3D projection, things get more interesting. Whilst the 2D projections made us believe that some of the lyrics of both artists are quite similar, playing around with the 3D chart shows that that’s not the case for quite a few songs, which is totally not obvious from the 2D chart. This again goes to show how easy it is to get played by lower-dimensional data. There is a lesson hiding here along the lines of “look for models that are as simple as possible but not simpler” – 2D is simpler than 3D, but it can fool you if you are not careful enough.

It goes without saying that higher-dimensional data carries more context (information) and therefore should lead to better analysis and less flawed results. Another lesson here is, I suppose: always try to project the data into 3 dimensions before dropping one level below. The 2D view is not entirely useless, though, if you are aware of what’s happening. Also, sometimes you might not be able to display 3D charts, so it’s nice to have an option that lets you view the data at least in 2D.

The embeddings again

Embeddings are one of the most fascinating artifacts of AI/ML. They allow us to represent any piece of data as a numerical vector that captures the context of the data within the domain the embedding model was trained on. This is why fine-tuning your embedding models should theoretically yield better results for a given domain than simply reusing the “generic” embedding vectors. I’d wager that if we fine-tuned the OpenAI model on all of the lyrics and emotion-capturing data, we would get a much better idea about what’s going on in the lyrics of both artists. As an aside, one of my good friends just started a new company that lets you fine-tune models and some! Go check it out here.

Embeddings open an awful lot of opportunities that haven’t been explored entirely yet. From what I’ve seen, the predominant use case at the moment is using embeddings in unison with content-addressable stores, a.k.a. vector databases: you store a knowledge base in them, do a quick look-up over some content, and use the returned results as additional context when talking to LLMs. This increases the accuracy of RAG (retrieval-augmented generation) results and helps prevent LLM hallucinations.

This is cute, but it’s the most obvious use case, and it barely scratches the surface of what embeddings could be capable of! The mere fact that an embedding provides a numerical representation of a thing in some domain is a very powerful notion. You can assign numbers to any (not necessarily textual) content!

Why is that powerful? Anything that can be represented as a numerical vector can have a whole class of mathematical operations applied to it; some of them may feel like nonsense – what does adding a “car” vector to an “evil clown” vector result in? – others less so. Imagine recommendation engines that leverage the context of a piece of data within a domain (yes, I am familiar with collaborative filtering), classifying similar data through clustering, etc.

EMBEDDINGS GIVE YOU SUPERPOWERS!

And you can take it much further, e.g. if you think of each vector as a node in a network, you can analyse contextual relationships between the nodes (say, in our case, the lyrics) – the “single dimensionality” of networks has always been a huge problem in network analysis, and it actually recently led to the development of the theory of hypergraphs, which, once again, opens a whole other avenue towards understanding the world and developing better products.

It gets better still. The OpenAI CLIP model lets you generate multimodal embeddings, and in fact so does Vertex AI (FYI: support for these is built into the go-embeddings Go module used in this hack). This creates a link between images and text and lets you do all kindsa crazy things with image search or whatnot. I’ve seen someone blogging about splitting an image into patches of identified objects, then generating embeddings for each of them and storing them in a vector store for querying – that lets you search for content inside images, not just for the “average” image content per se!

One of the more interesting use cases I learnt about recently came from one of my best friends, Dan, and involves using embeddings in reverse engineering. Dan is the CTO of Teller, the best API provider for connecting your app to banking institutions. Teller does a lot of reverse engineering, which is some of the most fascinating work, but it can sometimes become rather tedious. Now, imagine you could build a knowledge base of frequently appearing pseudo-code snippets you’ve seen before whilst reversing other apps – not an unreasonable assumption given that a lot of companies reuse lots of libraries in their code – and then cross-reference/match them against a freshly decompiled program you get access to. This could vastly simplify and speed up your work. You could even take it further by building full ASTs using something like a graph DB, etc.

Conclusion

Whilst Python is and will remain the dominant language in the AI/ML space, there are plenty of Go modules that let you do some interesting things with data and vector embeddings.

Happily – or sadly, depending on whose side you are on – my quick Go hacks haven’t uncovered much difference between Taylor Swift and Masterplan lyrics. I am going to assume that if Taylor speaks the truth, then so does Masterplan, and will continue to listen to whatever comes my way.

In the future, I’d like to take this a bit further still. The recent announcement from Google introduced the Gemini project, which promises multimodal embeddings that include audio data. It’d be great to sample the songs and inspect their projections, and maybe see how much our taste in the actual music differs, instead of focusing on just a single dimension: lyrics.

If you have any fun ideas about how embeddings can be used, or simply want to share how you leveraged them in whatever use case you had, please let me know in the comments. See you in another post!

You can find [most of] the code used in this hack on GitHub.

