
Named Entity Recognition with Hugging Face Transformers

Author: Venkata Sudhakar

Named Entity Recognition (NER) identifies and classifies named entities in text, such as product names, brand names, locations, and organisations. ShopMax India uses NER to automatically extract product mentions, city names, and brand references from customer support tickets and order notes, so tickets can be routed and catalogued faster without manual tagging.

Hugging Face provides NER pipelines using token classification models fine-tuned on datasets like CoNLL-2003. The ner pipeline returns a list of entities with their type (PER, ORG, LOC, MISC), character positions, and confidence scores. Setting aggregation_strategy='simple' merges sub-word tokens into complete entity spans for cleaner output.
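To make the returned fields concrete, here is a minimal sketch of reading one aggregated result. It assumes the dictionary keys that the pipeline emits with aggregation_strategy='simple' (entity_group, score, word, start, end). The helper name entity_spans and the sample record are hypothetical and hand-written for illustration; they are not real model output.

```python
def entity_spans(text: str, entities: list[dict], min_score: float = 0.0) -> list[tuple[str, str]]:
    """Return (type, surface text) pairs for entities at or above min_score.

    The surface text is sliced from the original string using the character
    offsets, rather than taken from the decoded 'word' field.
    """
    return [
        (e["entity_group"], text[e["start"]:e["end"]])
        for e in entities
        if e["score"] >= min_score
    ]


text = "Delivery to Mumbai is delayed."

# Hand-written stand-in for one aggregated pipeline result (not real model output).
sample = [{"entity_group": "LOC", "score": 0.99, "word": "Mumbai", "start": 12, "end": 18}]

print(entity_spans(text, sample))                   # [('LOC', 'Mumbai')]
print(entity_spans(text, sample, min_score=0.995))  # []
```

Slicing with start and end is a design choice: the word field comes from decoding tokens, so with some tokenizers it may not match the original casing or spacing exactly.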

The example below extracts named entities from ShopMax India customer support messages, identifying product names, locations, and organisation references automatically.
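A minimal sketch of such an example follows. The model is not named in this article, so the sketch assumes dslim/bert-base-NER, a publicly available checkpoint fine-tuned on CoNLL-2003; any token classification model with the same label set should work similarly. The function names format_entities and run_demo are illustrative.

```python
TICKETS = [
    "My Sony Bravia TV ordered from ShopMax India arrived damaged at my Mumbai flat.",
    "The OnePlus Nord I bought for my brother in Bangalore has a battery issue.",
    "Samsung Galaxy S24 delivery to Hyderabad is delayed by three days.",
]


def format_entities(entities: list[dict]) -> list[str]:
    """Render aggregated NER results as '  [TYPE] text (score: 0.99)' lines."""
    return [
        f"  [{e['entity_group']}] {e['word']} (score: {e['score']:.2f})"
        for e in entities
    ]


def run_demo(model_name: str = "dslim/bert-base-NER") -> None:
    """Tag each ticket and print its entities (downloads the model on first use)."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import pipeline

    # Assumption: model_name is a CoNLL-2003 style checkpoint (PER, ORG, LOC, MISC).
    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")

    for ticket in TICKETS:
        print(f"Ticket: {ticket}")
        for line in format_entities(ner(ticket)):
            print(line)
        print()
```

Call run_demo() to tag the three tickets. The labels and scores depend on the checkpoint, so results from a different model may not match the figures quoted in this article.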


It gives the following output:

Ticket: My Sony Bravia TV ordered from ShopMax India arrived damaged at my Mumbai flat.
  [ORG] Sony (score: 0.99)
  [MISC] Bravia (score: 0.87)
  [ORG] ShopMax India (score: 0.98)
  [LOC] Mumbai (score: 0.99)

Ticket: The OnePlus Nord I bought for my brother in Bangalore has a battery issue.
  [ORG] OnePlus (score: 0.97)
  [MISC] Nord (score: 0.82)
  [LOC] Bangalore (score: 0.99)

Ticket: Samsung Galaxy S24 delivery to Hyderabad is delayed by three days.
  [ORG] Samsung (score: 0.99)
  [MISC] Galaxy S24 (score: 0.91)
  [LOC] Hyderabad (score: 0.99)

For ShopMax India's product catalogue use case, consider fine-tuning on domain-specific data that includes electronics product names and Indian city names, which are not well represented in CoNLL-2003. In older versions of transformers that do not support aggregation_strategy, use grouped_entities=True instead. In production, run NER offline in batch mode on new tickets every few minutes rather than per request; this keeps model inference out of the request path and reduces GPU costs.
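The batch approach can be sketched as below. This is an illustration under stated assumptions, not a production design: tag_batch is a hypothetical helper, and ner is assumed to be a pipeline object (or any callable) that accepts a list of texts plus a batch_size argument and returns one list of entities per input text. Fetching the new tickets and scheduling the job every few minutes are left out.

```python
from typing import Callable, Sequence


def tag_batch(
    ner: Callable[..., list],
    tickets: Sequence[str],
    batch_size: int = 16,
) -> list[tuple[str, list[dict]]]:
    """Run NER once over a batch of tickets and pair each ticket with its entities."""
    if not tickets:
        return []
    # Passing a list lets the pipeline group inputs into batches of batch_size
    # instead of running one forward pass per ticket.
    results = ner(list(tickets), batch_size=batch_size)
    return list(zip(tickets, results))
```

With a real pipeline this might be called as tag_batch(ner, new_tickets, batch_size=16) from a scheduled job. A suitable batch size depends on ticket length and available GPU memory, so it is worth measuring rather than assuming.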


 
  


  