NEWS BRIEF

In this project, we implemented a system for Information Extraction (IE) from raw news articles. The system automatically extracts structured data such as entities, relationships, events, and metadata from unstructured text, making it easier for businesses to analyze and utilize news content.

Objective

The goal of this project is to extract key pieces of information from news articles, such as:

  • Translation of the text into Chinese
  • Entities (e.g., persons, organizations, locations)
  • Events mentioned in the article
  • Sentiments from various perspectives
  • Metadata such as news source, category, and credibility ratings

This data is used for:

  • Automated content analysis
  • Market research
  • Media monitoring
  • News aggregation

Tools & Technologies Used

  • Python: The primary programming language for text processing.
  • Natural Language Processing (NLP) Libraries:
    • spaCy: For Named Entity Recognition (NER), dependency parsing, and relation extraction.
    • OpenAI: For translation, sentiment extraction, and metadata collection.
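
As an illustration of the spaCy step, the sketch below groups named entities under the entity types listed in this brief. It is a minimal sketch, not the project's actual code: the helper names (group_entities, extract_entities, load_pipeline), the label-to-type mapping, and the choice of the en_core_web_sm model are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed mapping from spaCy's entity labels to the entity types
# named in this brief. The mapping is illustrative only.
LABEL_TO_TYPE = {
    "PERSON": "people",
    "ORG": "organizations",
    "GPE": "locations",
    "LOC": "locations",
    "DATE": "dates",
}


def group_entities(pairs):
    """Group (text, label) pairs by entity type, dropping duplicates and unmapped labels."""
    grouped = defaultdict(list)
    for text, label in pairs:
        kind = LABEL_TO_TYPE.get(label)
        if kind and text not in grouped[kind]:
            grouped[kind].append(text)
    return dict(grouped)


def extract_entities(article_text, nlp):
    """Run an already loaded spaCy pipeline and group its named entities."""
    doc = nlp(article_text)
    return group_entities((ent.text, ent.label_) for ent in doc.ents)


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption for this sketch."""
    import spacy  # imported lazily so the helpers above work without spaCy installed

    return spacy.load(model_name)
```

With spaCy and the model installed, a call would look like extract_entities(article_text, load_pipeline()). Keeping the grouping separate from the pipeline call makes the mapping easy to check on its own.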

Key Features

  1. Entity Extraction:
    • Identify and extract entities such as people, organizations, locations, and dates.
  2. Language Translation:
    • Translate the title and description into Chinese and into any other language the client requires.
  3. Event Extraction:
    • Detect significant events, such as demands for extradition or political developments.
  4. Sentiment Extraction:
    • Extract sentiment from different comments, categorizing them into neutral, left-wing, and right-wing perspectives.
  5. Metadata Extraction:
    • Extract additional metadata such as the news source, category, importance rating, and timeliness rating.
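
The features above could be collected into a single structured record per article. The following is a minimal sketch of one possible shape, assuming the language model is asked to reply with a JSON object. The NewsRecord class, the parse_model_reply helper, the JSON key names, and the sample values are hypothetical and are not taken from the project.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# The three perspectives named under Sentiment Extraction.
PERSPECTIVES = ("neutral", "left-wing", "right-wing")


@dataclass
class NewsRecord:
    """Hypothetical container for everything extracted from one article."""
    translations: dict = field(default_factory=dict)  # e.g. {"zh": {"title": ..., "description": ...}}
    entities: dict = field(default_factory=dict)      # entity type -> list of names
    events: list = field(default_factory=list)
    sentiments: dict = field(default_factory=dict)    # perspective -> sentiment summary
    source: str = ""
    category: str = ""
    importance: Optional[int] = None
    timeliness: Optional[int] = None


def parse_model_reply(raw: str) -> NewsRecord:
    """Parse a JSON reply (assumed format) into a NewsRecord, rejecting unknown perspectives."""
    data = json.loads(raw)
    sentiments = data.get("sentiments", {})
    unknown = set(sentiments) - set(PERSPECTIVES)
    if unknown:
        raise ValueError(f"unexpected perspectives: {sorted(unknown)}")
    meta = data.get("metadata", {})
    return NewsRecord(
        translations=data.get("translations", {}),
        entities=data.get("entities", {}),
        events=data.get("events", []),
        sentiments=sentiments,
        source=meta.get("source", ""),
        category=meta.get("category", ""),
        importance=meta.get("importance"),
        timeliness=meta.get("timeliness"),
    )


# Made-up sample reply, used only to show the call.
sample_reply = json.dumps({
    "events": ["Demand for extradition"],
    "sentiments": {"neutral": "factual tone"},
    "metadata": {"source": "Example Wire", "category": "politics", "importance": 4},
})
record = parse_model_reply(sample_reply)
print(record.events, record.category)
```

Validating the perspective keys at parse time is a design choice for this sketch: it surfaces a malformed model reply immediately instead of letting an unexpected category flow into downstream analysis.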