Knowledge Graph Query Languages Tutorial

Knowledge graph query languages enable structured retrieval of facts. SPARQL is the W3C standard for RDF graphs; Cypher is the property-graph query language created by Neo4j and opened to other vendors through the openCypher project. Mastering query languages is essential: LLMs augmented with structured retrieval answer 34% more complex questions than those using text search alone (KGQA Benchmark, 2026).

This article teaches both SPARQL and Cypher through real examples, enabling you to write queries that ground LLM reasoning.

SPARQL: Query Language for RDF Graphs

SPARQL (SPARQL Protocol and RDF Query Language) is the standard for querying RDF (Resource Description Framework) triple stores. An RDF triple has the form (subject, predicate, object), e.g., (Alice, works_for, Google).

Basic SPARQL Query Structure

A simple SPARQL query has four parts:

PREFIX ex: <http://example.org/>

SELECT ?person ?company
WHERE {
  ?person ex:works_for ?company .
  ?company ex:industry "Technology" .
}
LIMIT 10

Breaking it down:

  • PREFIX: Define shorthand namespaces.
  • SELECT: Variables to return (start with ?).
  • WHERE: Triple patterns to match.
  • LIMIT: Maximum results.
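To make the WHERE clause concrete, here is a minimal pure-Python sketch of how triple patterns bind variables and join on shared ones. It is an illustration only, not how a real SPARQL engine is implemented; the `match_pattern` and `run_patterns` helpers and the toy data are hypothetical.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("Alice", "works_for", "Google"),
    ("Bob", "works_for", "Acme"),
    ("Google", "industry", "Technology"),
    ("Acme", "industry", "Retail"),
}

def match_pattern(pattern, binding):
    """Yield extended bindings for one triple pattern ('?x' terms are variables)."""
    for triple in TRIPLES:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # constant term does not match this triple
        else:
            yield new

def run_patterns(patterns, limit=None):
    """Join all patterns on shared variables, like the body of a WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match_pattern(pattern, old)]
    return bindings[:limit]  # LIMIT caps the number of solutions

rows = run_patterns([
    ("?person", "works_for", "?company"),
    ("?company", "industry", "Technology"),
], limit=10)
print(rows)
```

Only Alice survives the second pattern, because `?company` is already bound to her employer when the industry pattern is checked.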

Examples: Single-Hop and Multi-Hop Queries

Single-hop (one relationship):

PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?company
WHERE {
  ?person foaf:name "Alice Johnson" .
  ?person ex:works_for ?company .
}

Multi-hop (two relationships):

PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?company ?city
WHERE {
  ?person foaf:name "Alice Johnson" .   # Step 1: Find Alice
  ?person ex:works_for ?company .       # Step 2: Find her company
  ?company ex:located_in ?city .        # Step 3: Find company location
}

Complex query with filters:

PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?salary
WHERE {
  ?person ex:works_for ex:Google .
  ?person ex:salary ?salary .
  ?person ex:hire_year ?year .
  FILTER (?year > 2020 && ?salary > 150000)
}
ORDER BY DESC(?salary)

SPARQL: Optional Matches and Aggregation

Optional relationships:

PREFIX ex: <http://example.org/>

SELECT ?person ?manager
WHERE {
  ?person ex:works_for ex:Google .
  OPTIONAL { ?person ex:reports_to ?manager . }
}
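OPTIONAL behaves like a left join: every person is kept, and the optional variable stays unbound when nothing matches. The plain-Python sketch below illustrates that behavior with hypothetical toy data, and it assumes at most one manager per person (a real engine would return one row per matching manager).

```python
# Hypothetical toy data: who works at Google, and who reports to whom.
works_at_google = ["Alice", "Bob", "Carol"]
reports_to = {"Alice": "Carol", "Bob": "Carol"}  # Carol has no manager

# With OPTIONAL: every person is kept; manager is None (unbound) if absent.
optional_rows = [(person, reports_to.get(person)) for person in works_at_google]

# Without OPTIONAL: people lacking a manager are dropped from the results.
required_rows = [(p, reports_to[p]) for p in works_at_google if p in reports_to]

print(optional_rows)
print(required_rows)
```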

Aggregation (count, sum, avg):

PREFIX ex: <http://example.org/>

SELECT ?company (COUNT(?person) AS ?employee_count)
WHERE {
  ?person ex:works_for ?company .
}
GROUP BY ?company
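As a rough mental model, GROUP BY with COUNT collapses the matched solutions into one row per group. The sketch below mimics that in plain Python over hypothetical (person, company) pairs; it is an analogy for the semantics, not engine code.

```python
from collections import Counter

# Hypothetical solutions of the WHERE clause: one (person, company) pair each.
solutions = [("Alice", "Google"), ("Bob", "Google"), ("Dana", "Acme")]

# GROUP BY ?company with COUNT(?person): one output row per distinct company.
employee_count = Counter(company for _person, company in solutions)

for company, count in employee_count.most_common():
    print(company, count)
```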

Cypher: Query Language for Neo4j Property Graphs

Cypher is Neo4j's intuitive graph query language. It represents patterns visually: (node)-[relationship]->(other).

Basic Cypher Query Structure

MATCH (p:Person {name: "Alice Johnson"})
RETURN p

Parts:

  • MATCH: Pattern to find in the graph.
  • RETURN: What to return.
  • WHERE: Additional filters (optional; written between MATCH and RETURN).

Cypher Examples

Single-hop:

MATCH (person:Person {name: "Alice Johnson"})-[:WORKS_FOR]->(company:Company)
RETURN person.name, company.name

Multi-hop:

MATCH (person:Person {name: "Alice Johnson"})
      -[:WORKS_FOR]->(company:Company)
      -[:LOCATED_IN]->(city:City)
RETURN person.name, company.name, city.name

With filters:

MATCH (person:Person)-[:WORKS_FOR]->(company:Company {name: "Google"})
WHERE person.hire_year > 2020 AND person.salary > 150000
RETURN person.name, person.salary
ORDER BY person.salary DESC

Optional relationships:

MATCH (person:Person)-[:WORKS_FOR]->(company:Company)
OPTIONAL MATCH (person)-[:REPORTS_TO]->(manager:Person)
RETURN person.name, manager.name

Aggregation:

MATCH (person:Person)-[:WORKS_FOR]->(company:Company)
// Cypher has no GROUP BY clause: non-aggregated return items
// (here company.name) act as the implicit grouping key.
RETURN company.name, COUNT(person) AS employee_count
ORDER BY employee_count DESC

Comparison: SPARQL vs. Cypher

Aspect | SPARQL | Cypher
Standard | W3C standard | Created by Neo4j; open specification via openCypher
Data model | RDF triples | Property graphs
Syntax | SQL-like | ASCII-art patterns
Readability | Verbose for complex queries | Intuitive visual patterns
Performance | Varies by database | Optimized for Neo4j
Portability | Runs on any RDF database | Primarily Neo4j; some other databases implement openCypher
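The data-model row is the difference that most affects how queries are written. As a rough sketch, the same facts can be folded from triples into a property graph: literal-valued triples become node properties and resource-valued triples become relationships. The `Lit` wrapper, `to_property_graph` helper, and sample data are hypothetical, and real conversions involve more decisions (labels, datatypes, multi-valued properties).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """Marks an object as a literal value rather than a resource (hypothetical helper)."""
    value: object

triples = [
    ("Alice", "name", Lit("Alice Johnson")),
    ("Alice", "salary", Lit(160000)),
    ("Alice", "works_for", "Google"),
    ("Google", "name", Lit("Google")),
]

def to_property_graph(triples):
    """Fold literal-valued triples into node properties; keep the rest as edges."""
    nodes, edges = {}, []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if isinstance(o, Lit):
            nodes[s][p] = o.value            # literal -> property on the subject node
        else:
            nodes.setdefault(o, {})
            edges.append((s, p.upper(), o))  # resource -> typed relationship
    return nodes, edges

nodes, edges = to_property_graph(triples)
print(nodes)
print(edges)
```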

Practical: Querying Knowledge Graphs from Python

Querying RDF with rdflib

from rdflib import Graph, Literal, Namespace, RDF

# Load or create an RDF graph
g = Graph()
EX = Namespace("http://example.org/")

# Add some triples (plain values must be wrapped in Literal)
alice = EX.Alice
google = EX.Google
g.add((alice, RDF.type, EX.Person))
g.add((alice, EX.name, Literal("Alice Johnson")))
g.add((alice, EX.works_for, google))
g.add((google, EX.name, Literal("Google")))

# Simple SPARQL query
query = """
PREFIX ex: <http://example.org/>
SELECT ?person ?company
WHERE {
  ?person ex:works_for ?company .
}
"""

results = g.query(query)
for row in results:
    print(f"{row.person} works for {row.company}")

Querying Neo4j with Python

from typing import Optional

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def run_query(query: str, params: Optional[dict] = None) -> list:
    # Open a short-lived session and return each record as a plain dict
    with driver.session() as session:
        result = session.run(query, params or {})
        return [dict(record) for record in result]

# Cypher query (the $company_name parameter is filled in by the driver)
query = """
MATCH (person:Person)-[:WORKS_FOR]->(company:Company)
WHERE company.name = $company_name
RETURN person.name, person.salary
ORDER BY person.salary DESC
"""

results = run_query(query, {"company_name": "Google"})
for record in results:
    print(f"{record['person.name']}: ${record['person.salary']}")

driver.close()

LLM-Enabled Query Generation

LLMs can translate natural language to graph queries. This task is a form of semantic parsing, often called text-to-query (or text-to-Cypher and text-to-SPARQL for a specific target language):

from anthropic import Anthropic

client = Anthropic()  # reads the API key from the environment

system_prompt = """You are an expert in converting natural language questions to Cypher queries.
Given a question, return ONLY the Cypher query, no explanation.

Schema:
- (:Person {name, hire_year, salary})
- (:Company {name, industry})
- (:City {name})
- (Person)-[:WORKS_FOR]->(Company)
- (Person)-[:REPORTS_TO]->(Person)
- (Company)-[:LOCATED_IN]->(City)
"""

def question_to_cypher(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        system=system_prompt,
        messages=[
            {"role": "user", "content": question}
        ]
    )
    return response.content[0].text

# Example
question = "Who are the highest-paid employees at Google?"
cypher = question_to_cypher(question)
print(f"Generated query:\n{cypher}")

# Example output (actual model output may vary):
# MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "Google"})
# RETURN p.name, p.salary
# ORDER BY p.salary DESC
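Even when asked to return only the query, a model may wrap its answer in a Markdown code fence. A small post-processing step can strip that wrapper before the query is executed. The `extract_cypher` helper below is a hypothetical sketch and is not part of any SDK.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

def extract_cypher(llm_text: str) -> str:
    """Return the query inside the first fenced block, or the stripped text if unfenced."""
    pattern = re.escape(FENCE) + r"[a-zA-Z]*[ \t]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, llm_text, flags=re.DOTALL)
    return (match.group(1) if match else llm_text).strip()

fenced = FENCE + "cypher\nMATCH (p:Person) RETURN p.name\n" + FENCE
print(extract_cypher(fenced))
print(extract_cypher("  MATCH (n) RETURN n \n"))
```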

Advanced Query Patterns

Finding Shortest Paths

MATCH path = shortestPath(
  (alice:Person {name: "Alice Johnson"})-[*]-(bob:Person {name: "Bob Chen"})
)
RETURN path
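Conceptually, an unweighted shortest path is the result of a breadth-first search that ignores edge direction. The sketch below shows that idea over a hypothetical in-memory adjacency map; it illustrates the concept and makes no claim about how Neo4j implements shortestPath internally.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency mapping.
GRAPH = {
    "Alice Johnson": ["Google"],
    "Google": ["Alice Johnson", "Carol Diaz", "Mountain View"],
    "Carol Diaz": ["Google", "Bob Chen"],
    "Bob Chen": ["Carol Diaz"],
    "Mountain View": ["Google"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is unreachable from start

print(shortest_path(GRAPH, "Alice Johnson", "Bob Chen"))
```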

Graph Pattern Detection

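Find companies where all employees earn over 100k, using an existential subquery (Neo4j 4.0 and later). Note that companies with no employees at all also match:

MATCH (c:Company)
WHERE NOT EXISTS {
  MATCH (person:Person)-[:WORKS_FOR]->(c)
  WHERE person.salary <= 100000
}
RETURN c.name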

Recursive Relationships (Management Chains)

Find the entire reporting chain for a person:

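MATCH (person:Person {name: "Alice Johnson"})
// Follow REPORTS_TO upward only (directed), one or more hops;
// each row is one manager somewhere in the chain.
OPTIONAL MATCH (person)-[:REPORTS_TO*1..]->(manager:Person)
RETURN person.name, manager.name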
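The traversal behind a variable-length REPORTS_TO pattern can be sketched in plain Python as repeatedly following one edge upward. The data and the `reporting_chain` helper are hypothetical, and the sketch assumes each person has at most one direct manager.

```python
# Hypothetical REPORTS_TO edges: employee -> direct manager.
REPORTS_TO = {
    "Alice Johnson": "Carol Diaz",
    "Carol Diaz": "Erin Park",
    "Erin Park": None,  # top of the chain
}

def reporting_chain(person, reports_to, max_hops=50):
    """Follow REPORTS_TO upward, stopping at the top, on a cycle, or after max_hops."""
    chain, seen = [], {person}
    current = reports_to.get(person)
    while current is not None and current not in seen and len(chain) < max_hops:
        chain.append(current)
        seen.add(current)
        current = reports_to.get(current)
    return chain

print(reporting_chain("Alice Johnson", REPORTS_TO))
```

The `seen` set guards against cyclic data, which an unbounded variable-length pattern would otherwise keep exploring.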

Key Takeaways

  • SPARQL queries RDF graphs using triple patterns; Cypher queries Neo4j property graphs using intuitive ASCII-art syntax.
  • Multi-hop queries enable reasoning that single-document retrieval cannot achieve.
  • Filters, aggregations, and optional matches provide expressive power for complex information needs.
  • LLMs can learn to generate queries from questions, enabling natural language interfaces to knowledge graphs.
  • Query performance depends on indexing and graph structure; profile and optimize slow queries.

Frequently Asked Questions

What's the query performance of knowledge graphs vs. vector databases?

Knowledge graph queries are deterministic: same query, same result. Latency ranges from 10 ms to several seconds depending on graph size and query complexity. Vector databases usually rely on approximate nearest-neighbor search, with typical latencies of 50–200 ms. For exact retrieval of known entities and relationships, graphs are typically faster; for fuzzy or semantic matching, vectors are typically faster.

How do I debug slow Cypher queries?

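Prefix the query with EXPLAIN to see the planned execution without running it, or with PROFILE to run it and see the actual rows and database hits per operator: PROFILE MATCH (n:Person) RETURN n. Indexes are crucial: create indexes on frequently filtered properties. Neo4j's Query Tuning guide provides detailed optimization strategies.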

Can I combine SPARQL and Cypher in one query?

No, they're separate languages. However, you can query both systems from Python and merge results programmatically. Alternatively, an API layer such as GraphQL (an API query language, not a graph database language) can expose a single interface whose resolvers call each backend.
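Merging programmatically usually means joining result rows on a shared identifier. The sketch below assumes both result sets have already been fetched and converted to lists of dicts; the rows and the `merge_rows` helper are hypothetical.

```python
# Hypothetical rows, shaped like results from a SPARQL client and a Neo4j client.
sparql_rows = [
    {"name": "Alice Johnson", "company": "Google"},
    {"name": "Bob Chen", "company": "Acme"},
]
cypher_rows = [
    {"name": "Alice Johnson", "salary": 160000},
    {"name": "Dana Lee", "salary": 120000},
]

def merge_rows(left, right, key):
    """Outer-join two lists of dict rows on a shared key; later values win on conflicts."""
    merged = {}
    for row in [*left, *right]:
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

combined = merge_rows(sparql_rows, cypher_rows, key="name")
for row in combined:
    print(row)
```

Joining on a display name is fragile in practice; a stable identifier shared by both stores is a safer key.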

How do I translate user questions into graph queries reliably?

Use few-shot prompting: provide 3–5 examples of questions and their corresponding queries in your system prompt. For high-stakes applications, validate LLM-generated queries by checking syntax and executing them in a sandbox before hitting the live graph.
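One way this advice might look in code: assemble the few-shot prompt from question and query pairs, and run a cheap read-only screen before any sandbox execution. Everything here is a hypothetical sketch. The example pairs, `build_system_prompt`, and `looks_read_only` are illustrative names, and the keyword screen is a crude heuristic that can reject safe queries (for example, a string literal containing "set"). It does not replace syntax checking or sandboxing.

```python
import re

# Hypothetical few-shot pairs; replace with examples that match your own schema.
FEW_SHOT = [
    ("Where does Alice Johnson work?",
     'MATCH (p:Person {name: "Alice Johnson"})-[:WORKS_FOR]->(c:Company) RETURN c.name'),
    ("Which companies are located in Austin?",
     'MATCH (c:Company)-[:LOCATED_IN]->(:City {name: "Austin"}) RETURN c.name'),
    ("Who reports to Carol Diaz?",
     'MATCH (p:Person)-[:REPORTS_TO]->(:Person {name: "Carol Diaz"}) RETURN p.name'),
]

def build_system_prompt(schema: str, examples=FEW_SHOT) -> str:
    """Assemble a system prompt: instructions, schema, then question/query pairs."""
    shots = "\n\n".join(f"Question: {q}\nCypher: {c}" for q, c in examples)
    return (
        "Convert questions to Cypher. Return only the query.\n\n"
        f"Schema:\n{schema}\n\n{shots}"
    )

# Clauses that modify data or call procedures; a keyword screen, not a parser.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)

def looks_read_only(cypher: str) -> bool:
    """Heuristic pre-check: reject queries containing write or procedure-call clauses."""
    return WRITE_CLAUSES.search(cypher) is None

print(looks_read_only("MATCH (p:Person) RETURN p.name"))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```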

What's the maximum query complexity a graph can handle?

A single query can traverse 50+ hops, but performance degrades rapidly. Most production queries are 3–5 hops. For deeper reasoning, decompose into multiple queries and combine results in your application logic.
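Decomposition can be as simple as running one short query per hop and carrying the intermediate results forward in application code. In the sketch below, `one_hop` stands in for a single database round trip; the edge table, the IN_STATE relationship, and both helper names are hypothetical.

```python
# Hypothetical edge table standing in for a graph database.
EDGES = {
    ("Alice Johnson", "WORKS_FOR"): ["Google"],
    ("Google", "LOCATED_IN"): ["Mountain View"],
    ("Mountain View", "IN_STATE"): ["California"],
}

def one_hop(nodes, relationship):
    """Stand-in for one small query: expand a set of nodes along one relationship type."""
    return {target for node in nodes for target in EDGES.get((node, relationship), [])}

def multi_step(start, relationships):
    """Run one short query per hop and carry the frontier forward in application code."""
    frontier = {start}
    for relationship in relationships:
        frontier = one_hop(frontier, relationship)
        if not frontier:
            break  # nothing left to expand; stop early
    return frontier

print(multi_step("Alice Johnson", ["WORKS_FOR", "LOCATED_IN", "IN_STATE"]))
```

Each step can be logged, cached, or filtered independently, which is harder to do inside one deep traversal.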

Further Reading