Entity Recognition from Text: NLP Tutorial
Entity Recognition, also called Named Entity Recognition (NER), is the NLP task of identifying and classifying named entities (people, organizations, locations, products) in unstructured text. It is a critical first step in building knowledge graphs: before you can link entities across documents or extract relations between them, you must locate them in raw text. Modern NER models achieve 92–96% accuracy using transformer-based deep learning.
As of 2026, production NER pipelines process over 100 billion documents annually, powering search engines, knowledge curation, and LLM augmentation (NLP Industry Report, 2026). Accurate entity extraction reduces downstream graph pollution by 40% and accelerates relation discovery.
How Named Entity Recognition Works
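NER is usually framed as a sequence labeling task. The model reads text token by token and assigns each token a tag, most commonly in the BIO scheme: B-X marks the first token of an entity of type X (e.g., B-PER for a person), I-X marks a later token inside the same entity (e.g., I-ORG), and O marks tokens outside any entity. For example, "Alice joined Google in 2023." is tagged as:

| Token | Alice | joined | Google | in | 2023 | . |
|---|---|---|---|---|---|---|
| Label | B-PER | O | B-ORG | O | O | O |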
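Turning a tag sequence back into entities is a small decoding step. The sketch below is a hypothetical helper (`bio_to_spans` is not a library function) that assumes the BIO scheme described above and word-level tokens:

```python
def bio_to_spans(tokens, labels):
    """Group BIO-tagged tokens into (entity_text, entity_type) tuples.

    Hypothetical helper. An I-X tag continues an open entity of type X;
    B-X (or a stray I-X) starts a new entity; O closes any open entity.
    """
    spans = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        tag_type = label[2:] if label != "O" else None
        if label.startswith("I-") and tag_type == current_type:
            current_tokens.append(token)  # extend the open entity
            continue
        # Anything else closes the open entity, if there is one
        if current_type is not None:
            spans.append((" ".join(current_tokens), current_type))
        if label == "O":
            current_tokens, current_type = [], None
        else:
            current_tokens, current_type = [token], tag_type
    if current_type is not None:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Alice", "joined", "Google", "in", "2023", "."]
labels = ["B-PER", "O", "B-ORG", "O", "O", "O"]
print(bio_to_spans(tokens, labels))
# [('Alice', 'PER'), ('Google', 'ORG')]
```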
Modern systems use transformer models like BERT or RoBERTa, which learn contextual embeddings. A token's label depends not on the word alone but on surrounding context. For example, "Washington" is a person name in "Washington said..." but a location in "Washington, DC."
Traditional vs. Transformer-Based NER
| Approach | Accuracy | Speed | Training Data | Production Use |
|---|---|---|---|---|
| Rule-based (regex) | 60–70% | Fast | None | Legacy systems |
| CRF (Conditional Random Fields) | 85–88% | Medium | 1K–10K labeled sentences | Cost-sensitive domains |
| LSTM + embeddings | 88–90% | Medium | 1K–5K labeled sentences | Pre-2020 production |
| Transformers (BERT, RoBERTa) | 92–96% | Slower | 1K–5K labeled sentences | Modern production |
| LLMs (prompt-based) | 91–95% | Variable | Zero-shot or few-shot | Emerging (2025+) |
For knowledge graph construction, transformer-based NER is now the standard because it handles domain-specific language, irregular entity formats, and contextual ambiguity.
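To make the first row of the table concrete, here is a minimal rule-based extractor in plain Python. The three patterns in `RULES` are hypothetical examples, not a recommended rule set. The sketch shows why regex NER is fast but brittle: every rule fires regardless of context, and anything without a hand-written pattern, such as the person in "Washington said...", is simply missed.

```python
import re

# Hypothetical hand-written rules; each regex maps straight to one label.
RULES = [
    (re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\.?"), "ORG"),  # company suffixes
    (re.compile(r"\$\d+(?:\.\d{2})?"), "MONEY"),               # dollar amounts
    (re.compile(r"\b(?:19|20)\d{2}\b"), "DATE"),               # four-digit years
]


def rule_based_ner(text):
    """Return (start, end, label) for every rule match, in text order."""
    entities = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            entities.append((match.start(), match.end(), label))
    return sorted(entities)


text = "Acme Corp. paid $250.00 for the license in 2023."
for start, end, label in rule_based_ner(text):
    print(text[start:end], label)
# Acme Corp. ORG
# $250.00 MONEY
# 2023 DATE

# No rule covers people, so context-dependent names are missed entirely
print(rule_based_ner("Washington said the deal closed."))  # []
```

Every new entity type or naming convention needs another pattern, which is why pure rule-based NER survives mainly in legacy systems or as a supplement to a statistical model (spaCy's `entity_ruler` component is one way to combine the two).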
Practical NER with spaCy
spaCy is one of the most widely used NER libraries in Python. It offers pre-trained pipelines and a simple API (download a model first, e.g., `python -m spacy download en_core_web_sm`):

```python
import spacy

# Load the small English pipeline (its NER component is trained on OntoNotes 5)
nlp = spacy.load("en_core_web_sm")

text = "Alice Johnson works at Google in Mountain View, California."
doc = nlp(text)

# Each entity is a Span with its text, label, and character offsets
for ent in doc.ents:
    print(f"Text: {ent.text}, Label: {ent.label_}, Span: ({ent.start_char}, {ent.end_char})")

# Output (exact predictions can vary between model versions):
# Text: Alice Johnson, Label: PERSON, Span: (0, 13)
# Text: Google, Label: ORG, Span: (23, 29)
# Text: Mountain View, Label: GPE, Span: (33, 46)
# Text: California, Label: GPE, Span: (48, 58)
```
spaCy's standard English models recognize the 18 OntoNotes entity types: PERSON, NORP (nationalities, religious or political groups), FAC (facilities), ORG, GPE (geopolitical entities), LOC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, and CARDINAL. For domain-specific tasks (e.g., chemical entities in biomedical text), you train a custom pipeline.
Training a Custom NER Model
When your domain has rare or specialized entities, you usually need to train a custom model. Here's a minimal example using the spaCy v3 training API:

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Create a blank English pipeline and add an NER component
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Add custom entity labels
ner.add_label("DRUG")
ner.add_label("SYMPTOM")
ner.add_label("GENE")

# Training data: list of (text, {"entities": [(start_char, end_char, label), ...]})
# Offsets are end-exclusive: text[start:end] must equal the entity string.
TRAIN_DATA = [
    ("Metformin reduces glucose levels.", {"entities": [(0, 9, "DRUG"), (18, 32, "SYMPTOM")]}),
    ("The BRCA1 gene increases cancer risk.", {"entities": [(4, 9, "GENE"), (25, 31, "SYMPTOM")]}),
]

# Convert the raw annotations into Example objects once
examples = [Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in TRAIN_DATA]

# Initialize the model weights from the examples; this returns an optimizer
optimizer = nlp.initialize(lambda: examples)

# Train for 10 epochs, shuffling the examples each epoch
for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    for batch in minibatch(examples, size=8):
        nlp.update(batch, sgd=optimizer, drop=0.5, losses=losses)
    print(f"Epoch {epoch}, Loss: {losses.get('ner', 0):.4f}")

# Test. Two training sentences are far too few for reliable predictions;
# real projects need hundreds of annotated examples.
doc = nlp("Metformin and BRCA1 screening.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```
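Hand-counting character offsets is error-prone, and a span that does not line up with the text teaches the model the wrong boundaries. A small hypothetical helper, `annotate` (not part of spaCy), can compute the offsets from the entity strings instead:

```python
def annotate(text, mentions):
    """Build a (text, {"entities": [...]}) training pair from (substring, label) pairs.

    Hypothetical helper. Mentions must be listed in the order they appear;
    each search starts where the previous match ended.
    """
    entities = []
    cursor = 0
    for substring, label in mentions:
        start = text.find(substring, cursor)
        if start == -1:
            raise ValueError(f"{substring!r} not found in {text!r}")
        end = start + len(substring)
        entities.append((start, end, label))
        cursor = end
    return (text, {"entities": entities})


TRAIN_DATA = [
    annotate("Metformin reduces glucose levels.",
             [("Metformin", "DRUG"), ("glucose levels", "SYMPTOM")]),
    annotate("The BRCA1 gene increases cancer risk.",
             [("BRCA1", "GENE"), ("cancer", "SYMPTOM")]),
]
print(TRAIN_DATA[1])
# ('The BRCA1 gene increases cancer risk.', {'entities': [(4, 9, 'GENE'), (25, 31, 'SYMPTOM')]})
```

Because the search moves forward from the previous match, a string that occurs twice maps to successive occurrences rather than both to the first one.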
Using Hugging Face Transformers for NER
For state-of-the-art accuracy, Hugging Face's transformers library provides pre-trained models:
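```python
from transformers import pipeline

# Load a BERT model fine-tuned for English NER on CoNLL-2003 (PER, ORG, LOC, MISC)
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER")

text = "Steve Jobs founded Apple in 1976. He was born in San Francisco."
results = ner_pipeline(text)

# Without aggregation, the pipeline returns one prediction per (sub)word token
# and omits tokens tagged O
for token in results:
    print(f"{token['word']}: {token['entity']} (score: {token['score']:.2f})")

# Output (illustrative; exact scores vary):
# Steve: B-PER (score: 0.99)
# Jobs: I-PER (score: 0.99)
# Apple: B-ORG (score: 0.98)
# San: B-LOC (score: 0.97)
# Francisco: I-LOC (score: 0.96)
```

BERT tokenizes text into WordPiece subwords, so a rare word can be split into several pieces (continuation pieces are prefixed with "##"), each with its own label; common words such as "Steve" stay whole. For knowledge graphs you need whole entities, so pass `aggregation_strategy="simple"` to `pipeline(...)`: it merges the pieces and returns one result per entity, with the entity type under the key `entity_group`.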
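In the Transformers library, this merging is controlled by the pipeline's `aggregation_strategy` argument ("none", "simple", "first", "average", or "max"). To show the idea, here is a simplified, hypothetical re-implementation in plain Python; it is not the library's exact algorithm. It assumes token-level dictionaries with `word`, `entity`, `score`, `start`, and `end` keys, as the non-aggregated pipeline returns when a fast tokenizer is used.

```python
def merge_subword_predictions(tokens, max_gap=1):
    """Merge token-level NER predictions into entity-level ones.

    Hypothetical, simplified sketch. A token extends the previous entity when
    it starts at most `max_gap` characters after it AND is either a WordPiece
    continuation ("##...") or an I- tag of the same type. Scores are averaged.
    """
    merged = []
    for tok in tokens:
        tag = tok["entity"]
        ent_type = tag.split("-", 1)[-1]  # "I-PER" -> "PER"
        prev = merged[-1] if merged else None
        adjacent = prev is not None and tok["start"] - prev["end"] <= max_gap
        if adjacent and (
            tok["word"].startswith("##")
            or (tag.startswith("I-") and prev["type"] == ent_type)
        ):
            prev["end"] = tok["end"]
            prev["scores"].append(tok["score"])
        else:
            merged.append({"type": ent_type, "start": tok["start"],
                           "end": tok["end"], "scores": [tok["score"]]})
    return [
        {"type": m["type"], "start": m["start"], "end": m["end"],
         "score": sum(m["scores"]) / len(m["scores"])}
        for m in merged
    ]


text = "Ada Lovelace visited Paris."
# Hypothetical token-level output; real splits depend on the model's vocabulary
raw = [
    {"word": "Ada", "entity": "B-PER", "score": 0.99, "start": 0, "end": 3},
    {"word": "Love", "entity": "I-PER", "score": 0.98, "start": 4, "end": 8},
    {"word": "##lace", "entity": "I-PER", "score": 0.97, "start": 8, "end": 12},
    {"word": "Paris", "entity": "B-LOC", "score": 0.99, "start": 21, "end": 26},
]
for ent in merge_subword_predictions(raw):
    print(text[ent["start"]:ent["end"]], ent["type"], f"{ent['score']:.2f}")
# Ada Lovelace PER 0.98
# Paris LOC 0.99
```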
Handling Challenges in Real-World NER
Challenge 1: Boundary Detection
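Some entity boundaries are ambiguous. Is "New York City" one entity or three? Pre-trained pipelines usually return it as a single GPE span, but custom models trained on little data sometimes split or truncate such spans. Post-processing rules can flag these cases for review:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("meetings in New York City")

# Flag multi-token locations so boundary errors can be reviewed
for ent in doc.ents:
    if ent.label_ == "GPE" and len(ent) > 1:  # len(Span) = number of tokens
        print(f"Multi-token location: {ent.text}")
```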
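When a model splits "New York City" into "New York" and "City", a simple repair rule is to merge neighboring spans that share a label and are separated only by whitespace. A minimal sketch over (start, end, label) character spans; `merge_adjacent_spans` is a hypothetical helper:

```python
def merge_adjacent_spans(text, entities):
    """Merge same-label (start, end, label) spans separated only by whitespace.

    Hypothetical post-processing rule for character-offset spans.
    """
    merged = []
    for start, end, label in sorted(entities):
        if merged:
            prev_start, prev_end, prev_label = merged[-1]
            gap = text[prev_end:start]  # empty if the spans touch or overlap
            if label == prev_label and gap.strip() == "":
                merged[-1] = (prev_start, max(prev_end, end), label)
                continue
        merged.append((start, end, label))
    return merged


text = "meetings in New York City"
split_spans = [(12, 20, "GPE"), (21, 25, "GPE")]  # "New York" + "City"
for start, end, label in merge_adjacent_spans(text, split_spans):
    print(text[start:end], label)
# New York City GPE
```

The rule is deliberately naive: two distinct entities of the same type separated only by a space would also be merged, so it is best limited to labels where that rarely happens.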
Challenge 2: Entity Overlap
In biomedical text, a span may have multiple valid labels (e.g., "insulin" can denote either the gene or the protein it encodes, so it could be tagged GENE or PROTEIN). Multi-label NER models exist but are rarer. For knowledge graphs, choose the most specific label or create a merged entity type.
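One way to implement "choose the most specific label" is to place the labels in a type hierarchy and keep the deepest candidate, falling back to a merged type when candidates tie. The hierarchy below is hypothetical; a real system would derive it from its own ontology.

```python
# Hypothetical type hierarchy (child -> parent); None marks the root.
PARENT = {
    "BIOMOLECULE": None,
    "GENE": "BIOMOLECULE",
    "GENE_PRODUCT": "BIOMOLECULE",
    "PROTEIN": "GENE_PRODUCT",
}


def depth(label):
    """Count ancestors above a label; labels outside the hierarchy get 0."""
    d = 0
    while PARENT.get(label) is not None:
        label = PARENT[label]
        d += 1
    return d


def resolve_label(candidates):
    """Keep the most specific (deepest) candidate; join labels that tie."""
    deepest = max(depth(label) for label in candidates)
    winners = sorted({label for label in candidates if depth(label) == deepest})
    return winners[0] if len(winners) == 1 else "|".join(winners)


print(resolve_label(["GENE", "PROTEIN"]))       # PROTEIN
print(resolve_label(["GENE", "GENE_PRODUCT"]))  # GENE|GENE_PRODUCT
```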
Challenge 3: Out-of-Domain Entities
A model trained on general-domain text such as OntoNotes (mostly news, broadcast, and web text) may misclassify scientific terms. Fine-tuning on domain data (100–500 examples) typically recovers 5–15% accuracy. Alternatively, use a larger model (BERT-large) or a domain-specific model (e.g., BioBERT fine-tuned on biomedical NER data).
Building an NER Pipeline for Knowledge Graphs
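A production pipeline wraps model inference with post-processing. The function below chooses between a fast spaCy model and a more accurate transformer, filters low-confidence predictions, and resolves overlapping spans:

```python
from functools import lru_cache

import spacy
from transformers import pipeline


@lru_cache(maxsize=None)
def load_spacy():
    # Load once and reuse; loading a model on every call is slow
    return spacy.load("en_core_web_sm")


@lru_cache(maxsize=None)
def load_hf():
    # Use a checkpoint fine-tuned for NER; a plain checkpoint such as
    # bert-base-uncased has no trained NER head
    return pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")


def extract_entities(text, use_transformer=False, min_score=0.85):
    """
    Extract entities using spaCy (fast) or Hugging Face (accurate).
    Returns (start_char, end_char, label, score) tuples.
    Note: the two models use different label sets (OntoNotes vs. CoNLL-2003).
    """
    if use_transformer:
        results = load_hf()(text)
        entities = [
            (r["start"], r["end"], r["entity_group"], float(r["score"]))
            for r in results
        ]
    else:
        doc = load_spacy()(text)
        # spaCy's default NER exposes no per-entity confidence, so use 1.0
        entities = [
            (ent.start_char, ent.end_char, ent.label_, 1.0)
            for ent in doc.ents
        ]

    # Filter low-confidence entities
    entities = [e for e in entities if e[3] >= min_score]

    # Resolve overlaps: sort by start (longest first), keep non-overlapping spans
    entities = sorted(entities, key=lambda x: (x[0], -x[1]))
    filtered = []
    for start, end, label, score in entities:
        if not any(start < f[1] and end > f[0] for f in filtered):
            filtered.append((start, end, label, score))
    return filtered


text = "Alice works at Google. She studied at Stanford."
entities = extract_entities(text, use_transformer=False)
for start, end, label, score in entities:
    print(f"{text[start:end]} ({label}, {score:.2f})")
```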
Evaluation Metrics for NER
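NER is not evaluated with simple token-level accuracy: most tokens are tagged O, so a model that predicts O everywhere would still score well. The standard metric is F1-score on exact entity matches, where a predicted entity counts as correct only if both its boundaries and its type match the ground truth: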
- Precision: Of entities predicted, how many are correct?
- Recall: Of entities in the ground truth, how many did we find?
- F1: Harmonic mean of precision and recall.
For knowledge graph construction, high recall is crucial (missing entities = incomplete graphs), but high precision matters too (spurious entities = noise). Aim for F1 > 0.90 on your domain data.
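Exact-match scoring is straightforward once gold and predicted entities are expressed as (start, end, label) spans. A minimal sketch; `span_prf` is a hypothetical helper, and libraries such as seqeval compute equivalent scores from BIO tag sequences:

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Spans for "Alice Johnson works at Google in Mountain View, California."
gold = [(0, 13, "PERSON"), (23, 29, "ORG"), (33, 46, "GPE"), (48, 58, "GPE")]
predicted = [(0, 13, "PERSON"), (23, 29, "ORG"), (33, 41, "GPE")]  # "Mountain" only
p, r, f = span_prf(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
# P=0.67 R=0.50 F1=0.57
```

The truncated "Mountain" prediction counts against the model twice: once as a false positive and once as a missed gold entity.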
Key Takeaways
- NER identifies and classifies named entities in text; it's the foundation of knowledge graph construction.
- Transformer-based models (BERT, RoBERTa) achieve 92–96% accuracy and are the production standard.
- spaCy offers pre-trained models for common tasks; Hugging Face provides state-of-the-art fine-tuning flexibility.
- Fine-tuning on 100–500 domain examples recovers accuracy when shifting to specialized text.
- Production pipelines combine models, filter low-confidence predictions, and remove overlapping entities.
Frequently Asked Questions
What is the difference between NER and entity linking?
NER identifies entity mentions in text (e.g., "John" is a PERSON). Entity linking connects those mentions to a knowledge base (e.g., "John Smith" mentions link to a specific Wikipedia page for John Smith). NER is fast; linking adds a second step that resolves ambiguity using context.
Can I use ChatGPT or Claude for NER instead of fine-tuned models?
Yes, LLMs can perform NER via prompt-based few-shot learning. However, they are slower and more expensive than local models. Use LLMs when you need reasoning (e.g., "is this a disease or symptom?") and local models for high-throughput extraction.
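A common safeguard when using an LLM for NER is to request structured output and then verify it against the source text, since models can return mentions that do not appear verbatim. The sketch below makes no API call; `build_prompt`, `parse_llm_entities`, and the JSON format are hypothetical, and the response string stands in for whatever your LLM returns:

```python
import json


def build_prompt(text, labels):
    """Assemble an illustrative zero-shot NER prompt (wording is hypothetical)."""
    return (
        f"Extract named entities of types {', '.join(labels)} from the text.\n"
        'Reply only with JSON: [{"text": "...", "label": "..."}]\n'
        f"Text: {text}"
    )


def parse_llm_entities(text, response, labels):
    """Convert an LLM's JSON reply into (start, end, label) spans.

    Drops items with unknown labels and mentions that do not occur verbatim
    in the input text, a cheap guard against hallucinated entities.
    """
    spans = []
    for item in json.loads(response):
        if not isinstance(item, dict):
            continue
        mention, label = item.get("text", ""), item.get("label")
        if label not in labels or not mention:
            continue
        start = text.find(mention)  # first occurrence only
        if start != -1:
            spans.append((start, start + len(mention), label))
    return spans


labels = ["PER", "ORG", "LOC"]
text = "Steve Jobs founded Apple in 1976."
prompt = build_prompt(text, labels)  # send this to the LLM of your choice
# Hypothetical reply: "Cupertino" is not in the text, so it is dropped
response = ('[{"text": "Steve Jobs", "label": "PER"}, '
            '{"text": "Apple", "label": "ORG"}, '
            '{"text": "Cupertino", "label": "LOC"}]')
print(parse_llm_entities(text, response, labels))
# [(0, 10, 'PER'), (19, 24, 'ORG')]
```

Because `str.find` returns the first occurrence, a mention that appears several times in the text needs extra disambiguation.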
How do I handle entities that span sentence boundaries?
Many NER pipelines process long documents sentence by sentence or in fixed-length chunks. If an entity, or the context needed to classify it, crosses a chunk boundary, use longer context windows (e.g., 512 tokens) or a sliding window with overlapping chunks. Transformer models have fixed context limits (usually 512 or 1024 tokens; 512 for BERT and RoBERTa), so large documents require careful chunking.
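A sliding window takes only a few lines. This hypothetical helper works on whitespace tokens for readability; with a transformer you would apply the same idea to subword tokens and the model's real length limit:

```python
def sliding_windows(tokens, window=512, overlap=128):
    """Yield (offset, chunk) pairs of overlapping windows over a token list.

    Hypothetical helper: consecutive windows share `overlap` tokens, so an
    entity cut off at one window's edge can appear whole in the next.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while True:
        yield start, tokens[start:start + window]
        if start + window >= len(tokens):
            break
        start += step


tokens = "Alice met Bob in New York City last week".split()
for offset, chunk in sliding_windows(tokens, window=4, overlap=2):
    print(offset, chunk)
# 0 ['Alice', 'met', 'Bob', 'in']
# 2 ['Bob', 'in', 'New', 'York']
# 4 ['New', 'York', 'City', 'last']
# 6 ['City', 'last', 'week']
```

The window at offset 2 cuts "New York City" short, while the one at offset 4 contains it whole. Because windows overlap, the same entity can be found more than once; add each window's offset to its local positions and deduplicate the resulting global spans.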
What is the cost of NER at scale?
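Inference cost depends on the model and document length. spaCy models on CPU: ~1–10 ms per document. GPU-accelerated transformers: ~5–50 ms per document. For 1 million documents at 5–50 ms each, pure inference comes to roughly 1.4–14 GPU-hours; longer documents, larger models, and pipeline overhead can push the total toward 100 GPU-hours. Cost: roughly USD 50–500 on cloud GPU infrastructure, depending on instance pricing.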
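The GPU-hour estimate is simple arithmetic: number of documents times per-document latency. A minimal sketch, assuming strictly sequential, one-document-at-a-time inference (batching usually lowers wall-clock time, while long documents raise per-document latency); the hourly rate is a value you supply from your provider's pricing:

```python
def gpu_hours(num_docs, ms_per_doc):
    """GPU-hours for sequential inference at a fixed per-document latency."""
    return num_docs * ms_per_doc / 1000 / 3600


def cost_usd(hours, usd_per_gpu_hour):
    """Total cost; the hourly rate comes from your cloud provider's pricing."""
    return hours * usd_per_gpu_hour


for ms in (5, 50):
    print(f"{ms} ms/doc -> {gpu_hours(1_000_000, ms):.1f} GPU-hours")
# 5 ms/doc -> 1.4 GPU-hours
# 50 ms/doc -> 13.9 GPU-hours
```

Multiplying the result by your instance's hourly rate with `cost_usd` gives the budget for a given workload.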
Do I need annotated data to train a custom NER model?
Yes. For domain-specific entities, annotate 200–1000 examples (sentences in which each entity span is marked with its label). Tools like Prodigy or Label Studio accelerate annotation. However, transfer learning from large pre-trained models (BERT) reduces the annotation burden significantly.