Zing Forum

Plagiarism Detection System: A Text Plagiarism Checking Solution with Hybrid .NET and Python Architecture

The Plagiarism Detection System is a cross-stack text similarity detection system that combines a .NET Core MVC web application with a Python machine learning module, providing an easy-to-use web interface and intelligent plagiarism detection capabilities.

Tags: plagiarism detection, text similarity, .NET Core, Python, machine learning, duplicate-checking system, academic integrity
Published 2026-05-10 12:56 | Recent activity 2026-05-10 13:01 | Estimated read 6 min

Section 01

Plagiarism Detection System: Core Overview

Core Introduction

The Plagiarism Detection System is a cross-stack text similarity detection solution that pairs a .NET Core MVC web application with a Python machine learning module. It provides an easy-to-use web interface and intelligent plagiarism detection, aiming to balance usability for non-technical users with extensibility for developers.

Section 02

Project Background & Motivation

Background

Against the backdrop of growing emphasis on academic integrity and content originality, plagiarism detection has become an essential need for educational institutions, the publishing industry, and enterprises. Traditional methods relying on simple string matching fail to identify advanced plagiarism like paraphrasing or synonym replacement, driving the need for a more intelligent solution.

Section 03

Hybrid Technical Architecture

Architecture Details

The system adopts a dual-stack design:

  1. .NET Core MVC: The frontend and business layer. It offers a modern web interface (file upload or text paste), user session management, file storage, result visualization, report generation, and cross-platform support.
  2. Python ML Module: The analysis engine. It handles text preprocessing and cleaning, several similarity algorithms (cosine similarity, Jaccard coefficient, semantic vectors), ML-based paraphrase recognition, and an overall similarity score with match locations.
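The article does not say how the two stacks talk to each other. As a minimal sketch, assuming the web layer hands the Python module a JSON request (over a subprocess pipe or a local HTTP call) and reads a JSON response back, the Python side could look like this. The function name handle_request and the field names "source", "suspect", and "score" are hypothetical, and a plain Jaccard coefficient stands in for the full analysis engine.

```python
import json


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def handle_request(request_json: str) -> str:
    """Hypothetical entry point for the web layer: JSON in, JSON out.

    The field names are illustrative assumptions, not the project's
    actual contract.
    """
    request = json.loads(request_json)
    source_tokens = set(request["source"].lower().split())
    suspect_tokens = set(request["suspect"].lower().split())
    score = jaccard(source_tokens, suspect_tokens)
    return json.dumps({"score": round(score, 4)})


if __name__ == "__main__":
    demo = json.dumps({
        "source": "the cat sat on the mat",
        "suspect": "the cat lay on the mat",
    })
    print(handle_request(demo))
```

Keeping the boundary to plain JSON strings means the .NET side needs no Python-specific bindings, which is one plausible reason to split the stacks this way.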

Section 04

Key Functional Features

Core Features

  • Intelligent Text Comparison: Detects identical text, synonym rewrites, word order adjustments, and paragraph restructuring; cross-language similarity is planned for the future.
  • Multi-format Support: Natively supports txt, rtf, doc/docx, and pdf; text can be pasted in directly for unsupported formats.
  • Visual Reports: Presents an overall similarity score (as a percentage), highlighted match areas, a side-by-side comparison, and exportable PDF reports.

Section 05

Algorithm Principles

Algorithm Workflow

Preprocessing

Text extraction → standardization (encoding, format removal, punctuation normalization) → tokenization/stemming → stopword filtering.
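The preprocessing chain above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the stopword list is a tiny placeholder, naive_stem is a crude suffix stripper standing in for a real stemmer, and text extraction from files is assumed to have already happened.

```python
import re
import unicodedata

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to"}


def normalize(text: str) -> str:
    """Unify Unicode forms, lowercase, and replace punctuation with spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


def naive_stem(token: str) -> str:
    """Crude suffix stripping (hypothetical helper, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Standardize, tokenize, stem, then drop stopwords."""
    tokens = normalize(text).split()
    stems = [naive_stem(t) for t in tokens]
    return [s for s in stems if s not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The cats are jumping on the mats!"))
```

The order mirrors the pipeline as described: standardization first, then tokenization and stemming, then stopword filtering.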

Similarity Calculation

The calculation is layered: fast fingerprint matching (SimHash/MinHash), n-gram overlap, TF-IDF cosine similarity, and semantic embeddings (BERT-like models).
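To make three of these layers concrete, here is a hedged standard-library sketch of n-gram Jaccard overlap, a MinHash estimate of that overlap, and TF-IDF cosine similarity. The function names, the choice of BLAKE2b as the seeded hash, the 64-slot signature, and the smoothed IDF formula are all assumptions made for illustration; the semantic embedding layer is omitted because it needs a trained model.

```python
import hashlib
import math
from collections import Counter


def word_ngrams(tokens, n=2):
    """Set of word n-grams (as tuples) from a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(grams, num_hashes=64):
    """MinHash: for each seeded hash function, keep the smallest hash value."""
    def h(gram, seed):
        digest = hashlib.blake2b(" ".join(gram).encode("utf-8"), digest_size=8,
                                 salt=seed.to_bytes(8, "little")).digest()
        return int.from_bytes(digest, "big")
    if not grams:
        return []
    return [min(h(g, seed) for g in grams) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching slots; an estimate of the Jaccard coefficient."""
    if not sig_a or not sig_b:
        return 0.0
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a = "the quick brown fox".split()
    b = "the quick red fox".split()
    print("bigram Jaccard:", jaccard(word_ngrams(a), word_ngrams(b)))
    va, vb = tfidf_vectors([a, b])
    print("TF-IDF cosine:", cosine(va, vb))
```

The MinHash signature has a fixed size regardless of document length, so fingerprints can be compared cheaply before the more expensive layers run.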

Result Aggregation

The results from all layers are fused with weights to produce the final score, and the matching paragraphs are located.
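A minimal sketch of this aggregation step follows. The article gives no weights or thresholds, so the layer names, the weight values in the demo, and the 0.5 paragraph threshold are illustrative assumptions; paragraph matching uses a simple token-level Jaccard coefficient as a stand-in for the multi-layer score.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-layer scores, normalized by the weights used."""
    used = [name for name in scores if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in used) / total_weight


def locate_matches(source_paragraphs, suspect_paragraphs, threshold=0.5):
    """Return (suspect_index, source_index, score) for pairs at or above threshold."""
    matches = []
    for i, suspect in enumerate(suspect_paragraphs):
        suspect_tokens = set(suspect.lower().split())
        for j, source in enumerate(source_paragraphs):
            source_tokens = set(source.lower().split())
            union = suspect_tokens | source_tokens
            score = len(suspect_tokens & source_tokens) / len(union) if union else 0.0
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


if __name__ == "__main__":
    # Hypothetical layer scores and weights, for illustration only.
    layer_scores = {"ngram": 0.2, "tfidf": 0.6, "semantic": 0.8}
    layer_weights = {"ngram": 0.2, "tfidf": 0.3, "semantic": 0.5}
    print(fuse_scores(layer_scores, layer_weights))
    print(locate_matches(
        ["the cat sat on the mat", "dogs bark loudly"],
        ["a cat sat on the mat", "birds sing"],
    ))
```

Normalizing by the weights actually used keeps the fused score in the 0 to 1 range even when a layer is skipped, for example when no embedding model is available.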

Section 06

Application Scenarios

Use Cases

  • Education: Student homework plagiarism checking, exam answer comparison, thesis review.
  • Publishing & Media: Manuscript originality check, internal content deduplication, translation quality inspection.
  • Enterprise: Contract document comparison, code plagiarism detection, knowledge base deduplication.

Section 07

Limitations & Improvement Directions

Current Limitations & Future Plans

Limitations

  • Local installation required: the full application must be downloaded, and it cannot run purely in the browser.
  • Limited Chinese support (the system is optimized for English).
  • It is unclear whether the system integrates with public internet databases.
  • The ML models may produce false positives.

Improvements

  • Develop a pure web version.
  • Add multi-language NLP support.
  • Integrate public document databases.
  • Introduce a manual review mechanism.

Section 08

Conclusion & Evaluation

Summary

The Plagiarism Detection System is a well-designed, practical tool. Its hybrid .NET and Python architecture balances development efficiency with detection accuracy and offers a reference design for similar projects. It suits small and medium-sized institutions that want quick deployment, as well as large organizations that need customization through further development.