To get started:
Slides: dh25.resolve.works
Transform your document collection into a graph visualisation
A way of getting better answers from a language model by using a knowledge graph
What is a graph?
And, what is RAG?
So first things first...
Nodes and edges.
Or: Entities and their relations.
R(etrieval) A(ugmented) G(eneration)
Help the language model with relevant information
1. Find relevant information that can be used to answer your question
2. Provide the language model with the retrieved information alongside the question
3. Get better answers
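In code, that loop is only a few lines. A minimal sketch, where retrieve() and llm() are hypothetical placeholders for your search step and your model call:

question = "What does the contract say about payment terms?"  # example question
context = retrieve(question)  # 1. hypothetical search over your documents
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # 2. context alongside the question
answer = llm(prompt)  # 3. hypothetical model call returning a grounded answer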
Sound familiar?
What information do you give to the model?
Deep Research searches the web
But GraphRAG makes use of "semantic search"
Semantic search is a way of finding semantically similar information
It works by converting texts into numerical representations that can be compared, usually vectors
These vectors are called embeddings
Because we're embedding text into multidimensional space
Don't worry, it's simpler than it sounds
A vector is a direction in multidimensional space
In other words, a list of numbers
This conversion is done with "embedding models" that are trained specifically for this task.
Let's try with all-MiniLM-L6-v2
from sentence_transformers import SentenceTransformer

# load a small, general-purpose embedding model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentence = "The sun is shining"
embedding = model.encode(sentence)  # a 384-dimensional vector
"The sun is shining" becomes...
5.42898139e-04  1.02551401e-01  8.36703405e-02  9.39253941e-02  2.07145736e-02 -1.12772360e-02  9.93586630e-02 -4.44610193e-02
3.98165248e-02 -2.13668533e-02  2.09684577e-02 -3.66757251e-02  7.11042061e-03  4.15845886e-02  1.01847239e-01  7.34366402e-02
1.43970475e-02  4.65401914e-03  9.70691442e-03 -4.16579135e-02 -2.30006799e-02 -2.27442160e-02 -2.73266062e-02 -1.22460043e-02
-8.25940259e-03  6.72213510e-02 -2.77701933e-02  2.63383351e-02 -6.12662174e-02 -1.37788551e-02  2.81139649e-02  5.81778679e-03
... and so on, for 384 dimensions in total
Great. But now that we have our numerical representation, how do we compare it to others?
Embedding models are trained so that vectors pointing in similar directions are semantically similar
We can find similar vectors by calculating cosine similarity, which measures the angle between them
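Concretely, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it compares direction only. A minimal numpy version:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = unrelated (orthogonal), -1.0 = opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))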
Let's see what that looks like
384 dimensions is hard to visualize though...
So let's imagine we just have 2 dimensions
So to come back to "The sun is shining"...
| Text | Cosine similarity |
|---|---|
| Bring an umbrella | 0.320 |
| The weather is great | 0.418 |
| It's daytime | 0.622 |
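These scores come straight from comparing embeddings. A sketch that reproduces the table (exact numbers may vary slightly between model versions):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
query = model.encode("The sun is shining")

for text in ["Bring an umbrella", "The weather is great", "It's daytime"]:
    score = util.cos_sim(query, model.encode(text)).item()
    print(f"{text}: {score:.3f}")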
Congratulations, you made it through the theory! 🎉
How are these technologies combined?
First, the dataset is preprocessed...
Entities and relations are extracted from the text
Embedded for semantic search
Added to the graph
After which communities are calculated
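A rough sketch of that indexing step, using networkx for the graph. Here documents and extract_entities_and_relations (an LLM-backed triple extractor) are assumptions for illustration, not the actual GraphRAG implementation:

import networkx as nx
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
graph = nx.Graph()

for doc in documents:  # your preprocessed text chunks (assumed)
    # hypothetical extractor returning (entity, relation, entity) triples
    for source, relation, target in extract_entities_and_relations(doc):
        graph.add_edge(source, target, relation=relation, source_text=doc)

# embed every entity so we can find it later via semantic search
entity_embeddings = {node: model.encode(node) for node in graph.nodes}

# group densely connected entities into communities (Louvain method)
communities = nx.community.louvain_communities(graph)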
Then, when querying the data...
We find similar entities through semantic search
Then we fetch related entities and communities
And the texts that reference them
Send all that together with the question to the model
---Role---
You are a helpful assistant responding to questions about data in the tables provided.
---Goal---
Generate a response that responds to the user's question, summarizing all information in the input data tables, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.
---Data tables---
{context_data}
And get better answers based on your data
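Put together, the query side looks roughly like this. find_similar_entities, fetch_context, PROMPT_TEMPLATE, and llm are hypothetical stand-ins for the real GraphRAG machinery; model, graph, and entity_embeddings come from the indexing sketch above:

question = "Who is mentioned in the documents?"  # example question
query_embedding = model.encode(question)
entities = find_similar_entities(query_embedding, entity_embeddings)  # semantic search
context_data = fetch_context(graph, entities)  # related entities, communities, source texts
prompt = PROMPT_TEMPLATE.format(context_data=context_data)  # the ---Role---/---Goal--- prompt above
answer = llm(prompt + "\n\n" + question)  # hypothetical model call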
But why is this cool?
Because we can visualize the graph
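Even a couple of lines of networkx and matplotlib give a first picture (a sketch, assuming the graph built in the indexing example above):

import matplotlib.pyplot as plt
import networkx as nx

# draw nodes, edges, and entity labels with a default layout
nx.draw_networkx(graph, node_size=300, font_size=8)
plt.axis("off")
plt.show()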
Language models ❤️ markdown
To get started:
Some example documents from Follow the Money's latest Sainsbury investigation
Thanks, Lise!
LinkedIn:
Slides: dh25.resolve.works
Thank you!