Using Cuelang With Go for LLM Data Extraction

I have been aware of Cuelang (CUE) pretty much since the early stages of its development. It always seemed to me that the language had the potential to solve a lot of problems in the ocean of YAML we find ourselves drowning in across the Cloud Native ecosystem.

CUE excels at validating data against strictly defined schemas and is equally capable of generating code for data models from them. These are wonderful features, though I hadn’t found the perfect application for them in any of the projects I had been working on. That changed recently with my increased involvement in projects utilizing Large Language Models (LLMs).

LLMs, whilst incredibly powerful, occasionally make up things that are just completely bogus. You always have to verify and validate the data you get back from them. I felt like this was a fun opportunity to take CUE for a spin by using it to validate LLM outputs.

Go is still my preferred programming language, so I naturally gravitated towards finding a way to use CUE with Go for this task. Besides, CUE itself is written in Go, so it provides some useful Go packages to work with. This post describes how I used CUE and Go for extracting and validating structured data from unstructured text using LLMs.

If you are interested in the final result(s) rather than learning about CUE and Go, just scroll to the bottom section, which shows a couple of Go programs you can use as inspiration in your LLM projects!

CUE and Go

I’m not going to provide any introduction to CUE here. The language tour does a great job of explaining its features with concrete examples. It also provides a playground, so you get to play with the language in the browser without installing any tools or programs on your computer. Go check it out!

The official site provides a few short guides about how you can use CUE with Go. One of the canonical use cases for CUE is data validation, so the guides naturally focus on validating Go structs against the schemas written in CUE.

One of the examples defines the following Person struct in CUE:

package example

#Person: {
	name?: string
	age?:  int & <=150
}

The Person struct has two fields: name and age. An instance of Person might have name and age values assigned to it. I put emphasis on might because both fields carry the ? marker, which makes them optional: you can omit either of them and the instance still passes validation as long as the types of the values you do specify comply with the schema. Well, almost! The age value has an extra constraint besides the type: if age is provided, its value must be less than or equal to 150. Anything else fails validation.

Here is a Go struct whose instances we can validate using the above CUE schema:

type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

If you try validating the following Person instance, you will see that the validation succeeds. Both the name and the age have been provided with the correct types, checked by the Go compiler, and the age satisfies the constraint in the CUE schema: 99 ≤ 150.

	person := Person{
		Name: "Charlie Cartwright",
		Age:  99,
	}

Given that name and age are defined in the schema as optional, you might think that the “empty” person value, i.e. the instance whose struct fields are initialized to the zero values of their types, should validate successfully:

// name is "", age is 0
p := Person{}

Alas, no. If you run the program you’ll notice that the validation fails. How could that be?! Go unfortunately does not have optionals, but CUE gets around that in a rather clever way (depending on how you look at it). Here is what the Go documentation for Encode says:

Encode traverses the value v recursively. If an encountered value implements the json.Marshaler interface and is not a nil pointer, Encode calls its MarshalJSON method to produce JSON and convert that to CUE instead

The encoding of each struct field can be customized by the format string stored under the “json” key in the struct field’s tag

The “omitempty” option specifies that the field should be omitted from the encoding if the field has an empty value, defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string.

So CUE is basically tapping into the functionality provided by the encoding/json package in the standard library. If a field has the omitempty option and holds an empty value, it is left out of the encoding and therefore passes validation as an absent optional field. We can take advantage of that in our code.
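To see what omitempty does to the encoding on its own, here is a minimal sketch that uses only the standard library. The Strict and Lenient type names and the toJSON helper are made up for this illustration; they mirror the two variants of the Person struct used in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Strict always emits both fields, even when they hold zero values.
type Strict struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// Lenient drops empty fields from the JSON encoding.
type Lenient struct {
	Name string `json:"name,omitempty"`
	Age  int    `json:"age,omitempty"`
}

// toJSON marshals v and returns the JSON as a string.
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(toJSON(Strict{}))  // {"name":"","age":0}
	fmt.Println(toJSON(Lenient{})) // {}
}
```

With omitempty the zero-valued fields never reach the encoded value, so from the schema's point of view they were simply not provided.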

In order to get the empty person to pass validation we simply need to make sure the fields are omitted in JSON by using the omitempty option on the json tag; the following will pass the validation:


	// NOTE: we explicitly mark both fields as optional by using `omitempty`
	type Person struct {
		Name string `json:"name,omitempty"`
		Age  int    `json:"age,omitempty"`
	}

...
...

	person := Person{}

	// convert person into CUE value
	personAsCUE := ctx.Encode(person)

	unified := schema.Unify(personAsCUE)
	if err := unified.Validate(); err != nil {
		fmt.Println("❌ Person: NOT ok")
		log.Fatal(err)
	}
	fmt.Println("✅ Person: ok")
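One caveat worth keeping in mind: with omitempty on a plain int, a legitimate value of 0 is dropped just like a missing one. A common Go workaround is to use pointer fields, where nil means "not set". The sketch below shows only the encoding side with the standard library; OptionalPerson and encode are hypothetical names, and I have not run this variant through CUE validation in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OptionalPerson is a hypothetical variant of Person with pointer fields,
// so that "not set" (nil) and "set to zero" can be told apart.
type OptionalPerson struct {
	Name *string `json:"name,omitempty"`
	Age  *int    `json:"age,omitempty"`
}

// encode marshals p and returns the JSON as a string.
func encode(p OptionalPerson) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0
	fmt.Println(encode(OptionalPerson{}))           // {}
	fmt.Println(encode(OptionalPerson{Age: &zero})) // {"age":0}
}
```

The price is extra nil checks wherever the struct is used, so whether it is worth it depends on how often zero is a meaningful value in your data.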

One other handy feature you get from CUE’s Go packages is OpenAPI schema generation from your models. Let’s have a look at how. We’ll first update the Person schema file with description comments. These are used in the OpenAPI object property schema descriptions when we generate them:

package example

// A Person
#Person: {
	// A person's name
	name?: string
	// A person's age
	age?: int & <=150
}

Here’s some sample code you can use to generate the OpenAPI schema from the above CUE schema:

package main

import (
	"bytes"
	_ "embed"
	"io"
	"log"
	"os"

	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/encoding/openapi"
)

type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

//go:embed schema.cue
var schemaFile string

func main() {
	ctx := cuecontext.New()
	schema := ctx.CompileString(schemaFile)
	info := struct {
		Title   string `json:"title"`
		Version string `json:"version"`
	}{"Person API", "v1"}
	resolveRefs := &openapi.Config{
		Info:             info,
		ExpandReferences: true,
	}

	b, err := openapi.Gen(schema, resolveRefs)
	if err != nil {
		log.Fatal(err)
	}

	io.Copy(os.Stdout, bytes.NewBuffer(b))
}

Notice that we are embedding the schema file into the program. We could just as easily have hardcoded it as a string literal and passed that to the CompileString function.

If you run the program above it will output the following JSON:

{
  "openapi": "3.0.0",
  "info": {
    "title": "Person API",
    "version": "v1"
  },
  "paths": {},
  "components": {
    "schemas": {
      "Person": {
        "description": "A Person",
        "type": "object",
        "properties": {
          "name": {
            "description": "A person's name",
            "type": "string"
          },
          "age": {
            "description": "A person's age",
            "type": "integer",
            "maximum": 150
          }
        }
      }
    }
  }
}

The description properties of each object field are read from the schema comments as I mentioned earlier. Notice how the generator nicely parsed the age constraint into the maximum property! This is very useful context that gets passed to an LLM which can help us point it in the right direction.

This is all nice and handy, but it requires either maintaining files that define validation constraints or hardcoding them into the source code as string literals like so:

const cuePerson = `
#Person: {
	name?: string
	age?: int & <=150
}
`

	ctx := cuecontext.New()
	// cuePerson is the string constant defined above
	schema := ctx.CompileString(cuePerson).LookupPath(cue.ParsePath("#Person"))

I think having the schema definition in a dedicated file provides certain advantages but wouldn’t it be nice if we could leverage the Go struct tags for defining and validating field constraints? It turns out we can! There is a rather “obscure” Go package in the CUE codebase that lets us tap into exactly that. It’s called cuego. The package is rather small and has great docs which explain how to use it pretty well.

Here’s an example straight from the package docs:

package main

import (
	"fmt"

	"cuelang.org/go/cuego"
)

type Sum struct {
	A int `cue:"C-B" json:",omitempty"`
	B int `cue:"C-A" json:",omitempty"`
	C int `cue:"A+B" json:",omitempty"`
}

func main() {
	// Validate returns nil when all constraints hold.
	fmt.Println(cuego.Validate(&Sum{A: 1, B: 5, C: 6}))
}

Each field in the Sum struct has a constraint defined on it which references other struct fields:

  • A must equal Sum.C - Sum.B (C-B); in our case, that’s 6 - 5 = 1; the value we set for A when we create the instance of Sum is 1, which satisfies the constraint
  • B must equal Sum.C - Sum.A (C-A); in our case, that’s 6 - 1 = 5; the value we set for B when we create the instance of Sum is 5, which satisfies the constraint
  • C must equal the sum of Sum.A and Sum.B (A+B); in our case, that’s 1 + 5 = 6; the value we set for C when we create the instance of Sum is 6, which satisfies the constraint
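To make the arithmetic concrete, here are the same three constraints spelled out in plain Go, without CUE. This is only an illustration of what the tags express, not how cuego evaluates them; the check function is a name I made up for this sketch.

```go
package main

import "fmt"

// Sum mirrors the struct above, minus the tags.
type Sum struct{ A, B, C int }

// check reports the first violated constraint, mirroring the three cue tags.
func check(s Sum) error {
	switch {
	case s.A != s.C-s.B:
		return fmt.Errorf("A: want C-B = %d, got %d", s.C-s.B, s.A)
	case s.B != s.C-s.A:
		return fmt.Errorf("B: want C-A = %d, got %d", s.C-s.A, s.B)
	case s.C != s.A+s.B:
		return fmt.Errorf("C: want A+B = %d, got %d", s.A+s.B, s.C)
	}
	return nil
}

func main() {
	fmt.Println(check(Sum{A: 1, B: 5, C: 6})) // constraints hold
	fmt.Println(check(Sum{A: 2, B: 3, C: 8})) // constraints violated
}
```

Note that the three constraints are algebraically the same statement (A + B = C) written from the point of view of each field, which is what lets CUE derive any one field from the other two.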

There is one other feature provided by the cuego package which is worth mentioning and that’s CUE completions.

Complete sets previously undefined values in x that can be uniquely determined from the constraints defined on the type of x such that validation passes, or returns an error, without modifying anything, if this is not possible.

What this means is CUE can fill in the missing values to satisfy the constraints automagically.

Once again, the example provided in the Go docs shows how this works:

package main

import (
	"fmt"
	"strings"

	"cuelang.org/go/cue/errors"
	"cuelang.org/go/cuego"
)

type Sum struct {
	A int `cue:"C-B" json:",omitempty"`
	B int `cue:"C-A" json:",omitempty"`
	C int `cue:"A+B" json:",omitempty"`
}

func main() {
	a := Sum{A: 1, B: 5}
	err := cuego.Complete(&a)
	fmt.Printf("completed: %#v (err: %v)\n", a, err)

	a = Sum{A: 2, C: 8}
	err = cuego.Complete(&a)
	fmt.Printf("completed: %#v (err: %v)\n", a, err)

	a = Sum{A: 2, B: 3, C: 8}
	err = cuego.Complete(&a)
	fmt.Println(errMsg(err))

}

// nicer error formatting
func errMsg(err error) string {
	a := []string{}
	for _, err := range errors.Errors(err) {
		a = append(a, err.Error())
	}
	s := strings.Join(a, "\n")
	if s == "" {
		return "nil"
	}
	return s
}

If you run the program you’ll get the following output:

completed: main.Sum{A:1, B:5, C:6} (err: <nil>)
completed: main.Sum{A:2, B:6, C:8} (err: <nil>)
2 errors in empty disjunction:
conflicting values null and {A:2,B:3,C:8} (mismatched types null and struct)
A: conflicting values 5 and 2

Notice how CUE was able to fill in the value of C in the first case so that it satisfies the constraints set on the Sum struct. The same goes for field B in the second case: CUE correctly set it to 6 to satisfy the constraints defined via the cue struct tags. Finally, the completion of the last case, Sum{A: 2, B: 3, C: 8}, fails because C is set to 8, which differs from what its constraint requires: 2 + 3 = 5.
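If it helps to see the idea without any CUE machinery, here is a hand-written sketch of completion for this one struct. It treats a zero value as "not set", fills in a single missing field and then re-checks the constraint. The complete function is hypothetical and hardcodes the A + B = C relationship; the real cuego.Complete derives this from the tags and is far more general.

```go
package main

import "fmt"

// Sum mirrors the struct above, minus the tags.
type Sum struct{ A, B, C int }

// complete fills in a single zero-valued field from the other two and then
// verifies A + B == C. On failure it returns the input unmodified.
func complete(s Sum) (Sum, error) {
	out := s
	switch {
	case s.A != 0 && s.B != 0 && s.C == 0:
		out.C = s.A + s.B
	case s.A != 0 && s.B == 0 && s.C != 0:
		out.B = s.C - s.A
	case s.A == 0 && s.B != 0 && s.C != 0:
		out.A = s.C - s.B
	}
	if out.A+out.B != out.C {
		return s, fmt.Errorf("constraints not satisfied: %d + %d != %d", out.A, out.B, out.C)
	}
	return out, nil
}

func main() {
	inputs := []Sum{{A: 1, B: 5}, {A: 2, C: 8}, {A: 2, B: 3, C: 8}}
	for _, in := range inputs {
		out, err := complete(in)
		fmt.Printf("%+v -> %+v (err: %v)\n", in, out, err)
	}
}
```

The three inputs are the same ones used in the cuego example: the first two get completed and the third is rejected.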

Once again, the omitempty option specified for the json tag is important. Say you drop the json:",omitempty" tag from field C; the first completion will now fail because C is no longer considered optional:

package main

import (
	"fmt"
	"strings"

	"cuelang.org/go/cue/errors"
	"cuelang.org/go/cuego"
)

type Sum struct {
	A int `cue:"C-B" json:",omitempty"`
	B int `cue:"C-A" json:",omitempty"`
	C int `cue:"A+B"`
}

func main() {
	// NOTE: this will FAIL because C can no longer be omitted
	a := Sum{A: 1, B: 5}
	err := cuego.Complete(&a)
	fmt.Printf("completed: %#v (err: %v)\n", a, err)

	a = Sum{A: 2, C: 8}
	err = cuego.Complete(&a)
	fmt.Printf("completed: %#v (err: %v)\n", a, err)

	a = Sum{A: 2, B: 3, C: 8}
	err = cuego.Complete(&a)
	fmt.Println(errMsg(err))

}
// errMsg func is omitted for brevity

So, we have a way to fill in the missing data via Completions as well as validating the struct fields via the cue struct tags. The last thing we need is to generate a JSON schema from Go structs, rather than from CUE schema files. This is where things get complicated. BIGLY!

I’ve tried figuring out how to generate the OpenAPI schema from Go structs annotated with cue struct tags, but I have failed miserably. Here is one approach I tried:

package main

import (
	"fmt"
	"log"
	"os"

	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/encoding/gocode/gocodec"
	"cuelang.org/go/encoding/openapi"
)

type Sum struct {
	A int `cue:"A,C-B" json:",omitempty"`
	B int `cue:"B,C-A" json:",omitempty"`
	C int `cue:"C,A+B" json:",omitempty"`
}

func main() {
	ctx := cuecontext.New()
	codec := gocodec.New(ctx, &gocodec.Config{})

	// Extract the Go struct into CUE
	schema, err := codec.ExtractType(&Sum{A: 2, B: 3, C: 8})
	if err != nil {
		log.Fatalf("error extracting CUE schema: %v", err)
	}

	// Print the CUE instance to check its content
	fmt.Printf("CUE Val:\n%v\n", schema)

	// Prepare OpenAPI configuration
	info := struct {
		Title   string `json:"title"`
		Version string `json:"version"`
	}{"Sum API", "v1"}

	c := &openapi.Config{
		Info:             info,
		ExpandReferences: true,
	}

	// Generate OpenAPI specification
	b, err := openapi.Gen(schema, c)
	if err != nil {
		log.Fatalf("error generating OpenAPI spec: %v", err)
	}

	_, _ = os.Stdout.Write(b)
}

This unfortunately produces an empty OpenAPI JSON, but what’s more interesting is that the CUE constraints specified via the cue tags are completely ignored when extracting the type. Here’s what the output produced by the program above looks like:

CUE Val:
*null | {
	A: int64
	B: int64
	C: int64
}
{"openapi":"3.0.0","info":{"title":"Sum API","version":"v1"},"paths":{},"components":{"schemas":{}}}%

Also, if I omit the initial option in the cue tag (the part before the comma), I get a null CUE value. That is, if we simply use this struct:

type Sum struct {
	A int `cue:"C-B" json:",omitempty"`
	B int `cue:"C-A" json:",omitempty"`
	C int `cue:"A+B" json:",omitempty"`
}

we will get the following output:

CUE Val:
null
{"openapi":"3.0.0","info":{"title":"Sum API","version":"v1"},"paths":{},"components":{"schemas":{}}}%

What is interesting is that if you use the cue CLI to generate the schema from the Sum struct, you get something like this:

$ cue get go ./...
$ cat sum_go_gen.cue
// Code generated by cue get go. DO NOT EDIT.

//cue:generate cue get go cuelgen/sum

package sum

#Sum: {
	"A"?: int & C-B
	"B"?: int & C-A
	"C"?: int & A+B
}

This seems much better! We have three optional fields with seemingly correct constraints defined for each of them. Alas, this schema is actually broken and fails validation by the very same cue CLI tool:

$ cue eval sum_go_gen.cue
#Sum.A: reference "C" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:8:14
#Sum.A: reference "B" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:8:16
#Sum.B: reference "C" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:9:14
#Sum.B: reference "A" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:9:16
#Sum.C: reference "A" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:10:14
#Sum.C: reference "B" not found:
    ./cue.mod/gen/cuelgen/sum/sum_go_gen.cue:10:16

I’ve asked in the GitHub discussions how to go about this, so far to no avail. For now, using the cue struct tags is more or less a no-go (pun intended), because we can’t generate a JSON schema from them to use in our LLM prompts when extracting data.

But all is not lost: there are ways around this, though they require extra dependencies. For now, we’ll stick with CUE schema files or hardcoded string literals. I’ll describe an alternative solution later in the post.

Putting it all together

To demonstrate how to extract structured data from unstructured text using LLMs and CUE, we’ll write a simple program in Go. We will first grab a couple of lines of text from which we want to extract the data. I’ll use a small snippet from a random football news site. We will pass this text to an LLM, asking it to return JSON containing specific information that should be present in the text. We will then validate the LLM output and optionally take some action if the validation fails: either prompt the model again with a follow-up prompt or fetch the missing or invalid data from somewhere else (the internet, a database, etc.) using a function call.

Here’s the sample text. We will pass it to our program via standard input:

Palmer opened the scoring in west London, thus becoming only the third player in Premier League
history to reach 30-plus goal involvements in a single season. He is only 22 years old.

We will use ollama for running the llama3 model locally. You could just as well use any other publicly available model that’s reasonably accurate. Obviously, OpenAI or Anthropic models would do pretty well here.

I’m interested in extracting the name of the footballer from the text as well as his age. I will use a CUE schema very similar to the one we discussed earlier:

package llm

// A Player
#Player: {
	// The player's name
	name?: string
	// The player's age
	age?: int & <=100
}

We will embed this schema file into the Go program, but we could just as well copy and paste it into a string literal; it’s up to you. We will then generate an OpenAPI schema from it and pass it to the LLM along with the following prompt:

system: You are an expert in extracting structured data from unstructured text. You must only respond in JSON format that MUST adhere to the following JSON schema: <SCHEMA>.
Do NOT invent the data that's missing, simply omit the missing data. Do NOT add any additional JSON fields that are not present in the given schema.
Make sure you return a valid instance of the JSON object, NOT the schema itself or any part of it!
user: Here is the text <TEXT>
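
The placeholders in the prompt can be filled in with a couple of small helpers. This is just one way to do it, sketched with the standard library only; systemTemplate, userTemplate, buildSystemPrompt and buildUserPrompt are names I made up for this illustration, and the full program in this post simply concatenates the strings inline.

```go
package main

import (
	"fmt"
	"strings"
)

// systemTemplate is the system prompt with a <SCHEMA> placeholder.
const systemTemplate = `You are an expert in extracting structured data from unstructured text. You must only respond in JSON format that MUST adhere to the following JSON schema: <SCHEMA>.
Do NOT invent the data that's missing, simply omit the missing data. Do NOT add any additional JSON fields that are not present in the given schema.
Make sure you return a valid instance of the JSON object, NOT the schema itself or any part of it!`

// userTemplate is the user prompt with a <TEXT> placeholder.
const userTemplate = "Here is the text <TEXT>"

// buildSystemPrompt substitutes the generated schema into the system prompt.
func buildSystemPrompt(schema string) string {
	return strings.ReplaceAll(systemTemplate, "<SCHEMA>", schema)
}

// buildUserPrompt substitutes the input text into the user prompt.
func buildUserPrompt(text string) string {
	return strings.ReplaceAll(userTemplate, "<TEXT>", text)
}

func main() {
	fmt.Println(buildSystemPrompt(`{"type":"object"}`))
	fmt.Println(buildUserPrompt("He is only 22 years old."))
}
```

Keeping the templates as constants makes it easier to tweak the wording without touching the code that talks to the model.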

Here’s the full Go program:

package main

import (
	"bufio"
	"context"
	_ "embed"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
	"cuelang.org/go/encoding/openapi"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

type Player struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

//go:embed schema.cue
var schemaFile string

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:latest"))
	if err != nil {
		log.Fatal(err)
	}

	cctx := cuecontext.New()

	schema := cctx.CompileString(schemaFile)
	info := struct {
		Title   string `json:"title"`
		Version string `json:"version"`
	}{"Football Players", "v1"}
	resolveRefs := &openapi.Config{
		Info:             info,
		ExpandReferences: true,
	}

	b, err := openapi.Gen(schema, resolveRefs)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("JSON SCHEMA:\n%s\n\n", b)

	scanner := bufio.NewScanner(os.Stdin)
	fmt.Println("Enter a line of text:")
	scanner.Scan()
	input := scanner.Text()
	fmt.Println()

	content := []llms.MessageContent{
		llms.TextParts(llms.ChatMessageTypeSystem, "You are an expert in extracting structured data from unstructured text. You must only respond in JSON format that MUST adhere to the following JSON schema:\n\n"+string(b)+"\n\n. Do NOT invent the data that's missing, simply omit the missing data. Do NOT add any additional JSON fields that are not present in the given schema. Make sure you return a valid instance of the JSON object, NOT the schema itself or any part of it!"),
		llms.TextParts(llms.ChatMessageTypeHuman, "Here is the text: "+input),
	}

	ctx := context.Background()
	// Print the label once; the callback below streams the raw chunks after it.
	fmt.Print("LLM response: ")
	completion, err := llm.GenerateContent(ctx, content, llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
		fmt.Print(string(chunk))
		return nil
	}))
	if err != nil {
		log.Fatal(err)
	}
	resp := completion.Choices[0].Content
	if len(resp) == 0 {
		log.Fatal("no content received")
	}
	var player Player
	if err := json.Unmarshal([]byte(resp), &player); err != nil {
		log.Fatalf("failed decoding response to JSON: %v", err)
	}
	fmt.Println("----")
	fmt.Println("Got player: ", player)

	// The path must match the definition name in schema.cue (#Player).
	playerSchema := cctx.CompileString(schemaFile).LookupPath(cue.ParsePath("#Player"))
	cuePlayer := cctx.Encode(player)

	v := playerSchema.Unify(cuePlayer)
	if err := v.Validate(); err != nil {
		// TODO: do something here instead of exiting
		log.Fatalf("❌ Player invalid: %v", err)
	}
	fmt.Println("✅ Player valid")
}

If you run it, you should see the following output:


JSON SCHEMA:
{"openapi":"3.0.0","info":{"title":"Football Players","version":"v1"},"paths":{},"components":{"schemas":{"player":{"description":"A Player","type":"object","properties":{"name":{"description":"The player's name","type":"string"},"age":{"description":"The player's age","type":"integer","maximum":100}}}}}}

Enter a line of text:
Palmer opened the scoring in west London, thus becoming only the third player in Premier League history to reach 30-plus goal involvements in a single season. He is only 22 years old.

LLM response: {"name":"Palmer","age":22}
Got player:  {Palmer 22}
✅ Player valid

And there we have it: we’ve extracted the Name and the Age data from unstructured text using the llama3 model!
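
The prompt asks the model not to add fields that are missing from the schema, but nothing in the program enforces that: json.Unmarshal silently ignores unknown fields. If you want to catch them, one option is the standard library's strict decoder. A minimal sketch follows; decodeStrict is a hypothetical helper and the "club" field is an invented example of an extra field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type Player struct {
	Name string `json:"name,omitempty"`
	Age  int    `json:"age,omitempty"`
}

// decodeStrict parses resp into a Player and rejects any JSON field
// that the Player struct does not declare.
func decodeStrict(resp string) (Player, error) {
	var p Player
	dec := json.NewDecoder(strings.NewReader(resp))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&p); err != nil {
		return Player{}, fmt.Errorf("decoding LLM response: %w", err)
	}
	return p, nil
}

func main() {
	fmt.Println(decodeStrict(`{"name":"Palmer","age":22}`))
	fmt.Println(decodeStrict(`{"name":"Palmer","club":"example FC"}`))
}
```

The second call fails with an unknown field error, which you can treat the same way as a failed CUE validation.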

Now try changing the constraint in the CUE schema to something like this (note the age constraint now requires a value greater than or equal to 100):

package llm

// A Player
#Player: {
	// The player's name
	name?: string
	// The player's age
	age?: int & >=100
}

If you now run the program again, you’ll get the following output:

JSON SCHEMA:
{"openapi":"3.0.0","info":{"title":"Football Players","version":"v1"},"paths":{},"components":{"schemas":{"player":{"description":"A Player","type":"object","properties":{"name":{"description":"The player's name","type":"string"},"age":{"description":"The player's age","type":"integer","minimum":100}}}}}}

Enter a line of text:
Palmer opened the scoring in west London, thus becoming only the third player in Premier League history to reach 30-plus goal involvements in a single season. He is only 22 years old.

{"name":"Palmer","age":22}
Got player:  {Palmer 22}
main.go:92: ❌ Player invalid: #player.age: invalid value 22 (out of bound >=100)
exit status 1

Awesome, validation works like a charm!!
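
The program above simply exits when validation fails (see the TODO). Here is a rough sketch of the follow-up prompt idea mentioned earlier: feed the validation error back to the model and try again. To keep it self-contained, generateFunc is a hypothetical stand-in for the LLM call and validate is a plain Go stand-in for the CUE check (age <= 100); in the real program you would plug in llm.GenerateContent and the CUE validation instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Player struct {
	Name string `json:"name,omitempty"`
	Age  int    `json:"age,omitempty"`
}

// generateFunc is a hypothetical stand-in for the call to the model.
type generateFunc func(prompt string) (string, error)

// validate is a plain Go stand-in for the CUE constraint age <= 100.
func validate(p Player) error {
	if p.Age > 100 {
		return fmt.Errorf("age: invalid value %d (out of bound <=100)", p.Age)
	}
	return nil
}

// extractWithRetry asks the model for a Player and, when decoding or
// validation fails, re-prompts with the error appended to the prompt.
func extractWithRetry(gen generateFunc, prompt string, maxAttempts int) (Player, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := gen(prompt)
		if err != nil {
			return Player{}, fmt.Errorf("calling model: %w", err)
		}
		var p Player
		if err := json.Unmarshal([]byte(resp), &p); err != nil {
			lastErr = fmt.Errorf("invalid JSON: %w", err)
		} else if err := validate(p); err != nil {
			lastErr = err
		} else {
			return p, nil
		}
		// Feed the failure back to the model in a follow-up prompt.
		prompt = fmt.Sprintf("%s\n\nYour previous answer was rejected: %v. Please correct it.", prompt, lastErr)
	}
	return Player{}, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	// A fake model: the first answer violates the constraint, the second is fine.
	answers := []string{`{"name":"Palmer","age":220}`, `{"name":"Palmer","age":22}`}
	calls := 0
	fake := func(string) (string, error) {
		a := answers[calls%len(answers)]
		calls++
		return a, nil
	}
	p, err := extractWithRetry(fake, "Here is the text: ...", 3)
	fmt.Printf("player: %+v, err: %v, calls: %d\n", p, err, calls)
}
```

Capping the number of attempts matters: a model that keeps returning the same invalid answer would otherwise loop forever.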

Now, before we conclude this post, I promised I’d show you an example of how we can get around the fact that the OpenAPI schema generation doesn’t seem to work as I’d expect when we try leveraging the cue struct tags instead of specifying the CUE schema via a file.

The solution is to use another Go module that lets you describe your JSON schema via struct tags. This can get a bit tedious if your definitions are complex, but for our use case (LLM prompting) we don’t need much cruft in the tags; we basically just need some way to nudge the LLM in the right direction about what data each field should contain.

We will use the jsonschema Go module to generate the JSON schema which we pass to the LLM. We will define the CUE validation constraints via the cue tags. The nice thing about this is that the code gets slightly simpler overall, though the struct tags span the whole screen and then some! Here’s the full code.

package main

import (
	"bufio"
	"context"
	_ "embed"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"cuelang.org/go/cuego"
	"github.com/invopop/jsonschema"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/ollama"
)

type Player struct {
	Name string `json:"name,omitempty" jsonschema:"description=The player's name"`
	Age  int    `cue:"<=100" json:"age,omitempty" jsonschema:"description=The player's age"`
}

func main() {
	llm, err := ollama.New(ollama.WithModel("llama3:latest"))
	if err != nil {
		log.Fatal(err)
	}
	s := jsonschema.Reflect(&Player{})
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		panic(err.Error())
	}

	fmt.Printf("JSON SCHEMA:\n%s\n\n", b)

	scanner := bufio.NewScanner(os.Stdin)
	fmt.Println("Enter a line of text:")
	scanner.Scan()
	input := scanner.Text()
	fmt.Println()

	content := []llms.MessageContent{
		llms.TextParts(llms.ChatMessageTypeSystem, "You are an expert in extracting structured data from unstructured text. You must only respond in JSON format that MUST adhere to the following JSON schema:\n\n"+string(b)+"\n\n. Do NOT invent the data that's missing, simply omit the missing data. Do NOT add any additional JSON fields that are not present in the given schema. Make sure you return a valid instance of the JSON object, NOT the schema itself or any part of it!"),
		llms.TextParts(llms.ChatMessageTypeHuman, "Here is the text: "+input),
	}

	ctx := context.Background()
	completion, err := llm.GenerateContent(ctx, content, llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
		fmt.Print(string(chunk))
		return nil
	}))
	if err != nil {
		log.Fatal(err)
	}
	resp := completion.Choices[0].Content
	if len(resp) == 0 {
		log.Fatal("no content received")
	}
	var player Player
	if err := json.Unmarshal([]byte(resp), &player); err != nil {
		log.Fatalf("failed decoding response to JSON: %v", err)
	}
	fmt.Println()
	fmt.Println("Got player: ", player)

	if err := cuego.Validate(&player); err != nil {
		log.Fatalf("❌ Player invalid: %v", err)
	}
	fmt.Println("✅ Player valid")
}

There are a couple of things to notice other than the cue struct tags specifying the constraint for age values.

  1. We no longer use the CUE Go libraries to generate the schema for the prompt; instead, we defer that to the jsonschema Go module
  2. We now use the cuego package discussed earlier to validate the data we’ve received from the LLM
  3. The schema generated by jsonschema is a JSON Schema document rather than an OpenAPI one, so it looks slightly different from the one generated by the CUE libraries, but in the grand scheme of things that’s not an issue

Here’s the output this program produces:

go run ./...
JSON SCHEMA:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/Player",
  "$defs": {
    "Player": {
      "properties": {
        "name": {
          "type": "string",
          "description": "The player's name"
        },
        "age": {
          "type": "integer"
        }
      },
      "additionalProperties": false,
      "type": "object",
      "required": [
        "name",
        "age"
      ]
    }
  }
}

Enter a line of text:
Palmer opened the scoring in west London, thus becoming only the third player in Premier League history to reach 30-plus goal involvements in a single season. He is only 22 years old.

{"name": "Palmer", "age": 22}
Got player:  {Palmer 22}
✅ Player valid

And there you have it. We’ve successfully extracted structured data from unstructured text again by taking a slightly different approach and avoiding the need to maintain CUE schema files!

Conclusion

This blog post was born out of a silly late-night conversation I had with one of my friends. I almost always end up nerd-sniping myself in these conversations. And it somehow almost always happens around midnight. There is rarely any way out of it other than building some prototype to verify my intuitions and theories.

I hope you learnt some new tricks by reading it — I certainly did by writing it. And if you haven’t then thanks for reading it all the way!

Feel free to leave a comment or drop me an email! Now, go extract and validate data produced by LLMs with Go and CUE! Until next time!
