Zing Forum

Plagiarism Detection System: A Text Plagiarism-Checking Solution Built on a Hybrid .NET and Python Architecture

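Plagiarism Detection System is a cross-stack text similarity detection system that combines a .NET Core MVC web application with a Python machine learning module, providing an easy-to-use web interface and intelligent plagiarism detection.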

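Tags: plagiarism detection, text similarity, .NET Core, Python, machine learning, plagiarism-checking system, academic integrity
Published 2026/05/10 12:56 | Last activity 2026/05/10 13:01 | Estimated reading time 6 minutes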

Section 01

Plagiarism Detection System: Core Overview

Core Introduction

Plagiarism Detection System is a cross-stack text similarity detection solution that combines a .NET Core MVC web application with a Python machine learning module. It provides an easy-to-use web interface and intelligent plagiarism detection, aiming to balance usability for non-technical users with extensibility for developers.

Section 02

Project Background & Motivation

Background

As academic integrity and content originality draw growing attention, plagiarism detection has become a pressing need for educational institutions, publishers, and enterprises. Traditional methods that rely on simple string matching fail to identify more sophisticated plagiarism such as paraphrasing or synonym replacement, which drives the need for a more intelligent solution.

Section 03

Hybrid Technical Architecture

Architecture Details

The system adopts a dual-stack design:

  1. .NET Core MVC: The frontend and business layer. It offers a modern web interface (file upload or text paste), user session management, file storage, result visualization, report generation, and cross-platform support.
  2. Python ML Module: The analysis engine. It handles text preprocessing and cleaning, multiple similarity algorithms (cosine similarity, Jaccard coefficient, semantic vectors), ML-based paraphrase recognition, and comprehensive similarity scoring with match locations.
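
The source does not say how the two halves talk to each other. One common pattern, assumed here purely for illustration, is that the web layer passes a JSON request to the Python analyzer and reads a JSON reply. The sketch below shows what the Python side of such a bridge could look like; the function `handle_request`, the field names `submission`, `reference`, and `score`, and the word-set Jaccard scorer are all hypothetical placeholders, not the project's actual interface.

```python
import json


def jaccard_words(a: str, b: str) -> float:
    """Placeholder scorer: Jaccard coefficient over lowercase word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def handle_request(raw: str) -> str:
    """Parse one JSON request and return one JSON response (hypothetical schema)."""
    request = json.loads(raw)
    score = jaccard_words(request["submission"], request["reference"])
    return json.dumps({"score": round(score, 4)})


# Example round trip, as the web layer might perform it.
reply = handle_request(json.dumps({"submission": "the cat sat", "reference": "the cat ran"}))
print(reply)
```

Keeping the contract to plain JSON strings means either side could be swapped out (for example, replacing a subprocess call with an HTTP endpoint) without changing the analysis code.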

Section 04

Key Functional Features

Core Features

  • Intelligent Text Comparison: Detects identical text, synonym rewrites, word-order adjustments, and paragraph reorganization; cross-language similarity detection is planned for the future.
  • Multi-format Support: Natively supports txt, rtf, doc/docx, pdf; allows text paste for unsupported formats.
  • Visual Reports: Presents overall similarity score (percentage), highlighted match areas, side-by-side comparison, and exportable PDF reports.
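
To make the "highlighted match areas" idea concrete, here is a minimal sketch of how shared word runs between two texts could be located with Python's standard `difflib`. It is an assumption about one possible approach, not the project's implementation; the helper name `matching_spans` and the `min_words` threshold are hypothetical, and this exact-match method does not catch paraphrases on its own.

```python
from difflib import SequenceMatcher


def matching_spans(a: str, b: str, min_words: int = 3) -> list[tuple[int, int, str]]:
    """Return (start_in_a, start_in_b, text) for shared runs of at least min_words words."""
    words_a, words_b = a.split(), b.split()
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            text = " ".join(words_a[block.a : block.a + block.size])
            spans.append((block.a, block.b, text))
    return spans


submitted = "we show that deep models overfit small datasets quickly"
reference = "critics say that deep models overfit small corpora"
for start_a, start_b, text in matching_spans(submitted, reference):
    print(start_a, start_b, text)
```

The word offsets returned for each span are what a report view would need in order to highlight the same passage on both sides of a side-by-side comparison.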

Section 05

Algorithm Principles

Algorithm Workflow

Preprocessing

Text extraction → standardization (encoding, format removal, punctuation normalization) → tokenization/stemming → stopword filtering.
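
The preprocessing chain above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the stopword list is a tiny sample, `crude_stem` is a rough stand-in for a real stemmer, and all function names are hypothetical rather than taken from the project.

```python
import re
import unicodedata

# Tiny illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on"}


def normalize(text: str) -> str:
    """Unify Unicode forms, straighten curly quotes, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"}))
    return re.sub(r"\s+", " ", text).strip().lower()


def crude_stem(token: str) -> str:
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize on letters and digits, stem, then drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", normalize(text))
    stems = [crude_stem(t) for t in tokens]
    return [s for s in stems if s not in STOPWORDS]


print(preprocess("The students are comparing “compared” texts!"))
```

Note that the tokenizer pattern only covers ASCII letters and digits, which fits the English-first focus described in the article; other languages would need a different tokenizer.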

Similarity Calculation

Layers: fast fingerprint matching (SimHash/MinHash), N-gram overlap, TF-IDF cosine similarity, semantic embedding (BERT-like models).
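
Two of these layers, N-gram overlap and TF-IDF cosine similarity, are simple enough to sketch without external libraries. The code below is an illustrative sketch, not the project's implementation: the function names are hypothetical, the smoothed IDF formula is one common choice among several, and the fingerprint (SimHash/MinHash) and embedding layers are left out.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous n-token shingles."""
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


doc_a = "the quick brown fox jumps".split()
doc_b = "the quick brown dog jumps".split()
vec_a, vec_b = tfidf_vectors([doc_a, doc_b])
print("3-gram Jaccard:", jaccard(ngrams(doc_a), ngrams(doc_b)))
print("TF-IDF cosine:", round(cosine(vec_a, vec_b), 3))
```

The two measures respond differently to a single changed word: the 3-gram overlap drops sharply because every shingle touching that word changes, while the bag-of-words cosine stays high. That difference is one reason to combine several layers rather than rely on one.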

Result Aggregation

Weighted fusion of multi-layer results to generate final score and locate matching paragraphs.
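
A weighted fusion step of this kind can be as simple as a weighted average of the per-layer scores. The sketch below assumes each layer reports a score in [0, 1]; the layer names and weight values are hypothetical, since the source does not state the actual weights.

```python
def fuse_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-layer scores, renormalized over the layers present."""
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no weighted layers available")
    return sum(scores[name] * w for name, w in used.items()) / total


# Hypothetical weights and scores, for illustration only.
WEIGHTS = {"fingerprint": 0.1, "ngram": 0.3, "tfidf": 0.3, "semantic": 0.3}
layer_scores = {"fingerprint": 0.2, "ngram": 0.4, "tfidf": 0.6, "semantic": 0.8}

overall = fuse_scores(layer_scores, WEIGHTS)
print(f"Overall similarity: {overall:.0%}")
```

Renormalizing over the layers that actually produced a score lets the pipeline skip an expensive layer (for example, semantic embeddings) without deflating the final percentage.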

Section 06

Application Scenarios

Use Cases

  • Education: Plagiarism checks on student homework, exam answer comparison, thesis review.
  • Publishing & Media: Manuscript originality check, internal content deduplication, translation quality inspection.
  • Enterprise: Contract document comparison, code plagiarism detection, knowledge base deduplication.

Section 07

Limitations & Improvement Directions

Current Limitations & Future Plans

Limitations

  • Offline dependency: the full application must be downloaded; it cannot run purely in the browser.
  • Limited Chinese support: the system is optimized for English.
  • It is unclear whether the system integrates with public internet databases.
  • The ML models may produce false positives.

Improvements

  • Develop a pure web version.
  • Add multi-language NLP support.
  • Integrate public document databases.
  • Introduce a manual review mechanism.

Section 08

Conclusion & Evaluation

Summary

Plagiarism Detection System is a well-designed, practical tool. Its hybrid .NET and Python architecture balances development efficiency with detection accuracy and offers a useful reference for similar projects. It suits small and medium institutions that need quick deployment as well as large organizations that want to customize it through secondary development.