Project Details

Program / Year (B.E.)
Bachelor of Engineering in Computer Engineering, B.E. 2569

Semester and Academic Year of Graduation
Second semester, academic year 2568 (B.E.)

Project Type
Engineering project

Project Title (Thai)
ระบบลบข้อมูลระบุตัวบุคคลในเอกสารภาษาไทยเพื่อการปฏิบัติตามพระราชบัญญัติคุ้มครองข้อมูลส่วนบุคคล

Project Title (English)
Thai Document PII Redaction System for PDPA Compliance

Developer
6510503310 ชวัลวิทย์ เกียรติณัฐกร

Principal Advisor
ธนาวินท์ รักธรรมานนท์

Co-advisor
-

Abstract (translated from Thai)

Protecting personal data in digital documents is an important requirement under the Personal Data Protection Act (PDPA), which requires organizations to conceal personally identifiable information before publishing or forwarding documents. However, Thai-language documents in PDF and image formats, such as contracts, receipts, and forms, often contain several types of sensitive data, making manual redaction slow and error-prone. This research presents the design and development of a prototype system for semi-automatic detection and redaction of personal data in Thai documents. It uses Optical Character Recognition (OCR) to extract text, and Named Entity Recognition (NER) built on the WangchanBERTa architecture together with pattern-based rules to identify personal data such as national ID numbers, first and last names, and phone numbers. The system then redacts the data on the original document and records each operation in an audit log to support later review. The development results indicate that the system reduces manual workload, improves the accuracy of redaction, and effectively supports compliance with personal data protection requirements.

Abstract

Protecting personally identifiable information (PII) in digital documents is a key requirement under Thailand's Personal Data Protection Act (PDPA), which requires organizations to conceal sensitive data before documents are shared or published. Thai documents in PDF and image formats, such as contracts, receipts, and official forms, often contain multiple types of personal information, making manual redaction time-consuming and error-prone. This paper presents the design and development of a semi-automatic system for detecting and redacting PII in Thai documents. The system applies Optical Character Recognition (OCR) to extract text, then combines Named Entity Recognition (NER) built on the WangchanBERTa architecture with rule-based pattern matching to identify personal data such as national ID numbers, names, and phone numbers. Detected information is irreversibly redacted from the original document, and every operation is recorded in an audit log for traceability. Experimental results show that the proposed system reduces manual effort, improves redaction accuracy, and effectively supports compliance with personal data protection regulations.
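
The rule-based part of the pipeline can be illustrated with a minimal sketch. The abstract does not give the project's actual rules, so everything below is an assumption: the two regular expressions, the names `is_valid_thai_id`, `find_pii`, and `redact`, and the placeholder labels are hypothetical. The sketch works on text that has already been extracted (for example by OCR) and does not cover the NER model, the redaction of the PDF or image itself, or the audit log. The check-digit rule is the standard one for 13-digit Thai national ID numbers; the phone pattern is a simplified guess at common mobile formats.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, not taken from the project.
# 13 digits, optionally separated by single hyphens or spaces (e.g. 1-2345-67890-12-1).
NATIONAL_ID_RE = re.compile(r"(?<!\d)\d(?:[- ]?\d){12}(?!\d)")
# 10-digit mobile-style numbers starting with 06, 08, or 09 (simplifying assumption).
PHONE_RE = re.compile(r"(?<!\d)0[689]\d(?:[- ]?\d){7}(?!\d)")

Span = Tuple[str, int, int]  # (label, start offset, end offset)


def is_valid_thai_id(candidate: str) -> bool:
    """Check the 13th digit against the weighted sum of the first 12."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 13:
        return False
    # Weights run from 13 down to 2 across the first twelve digits.
    total = sum(d * (13 - i) for i, d in enumerate(digits[:12]))
    return (11 - total % 11) % 10 == digits[12]


def find_pii(text: str) -> List[Span]:
    """Return labeled spans for national IDs and phone numbers found by the rules."""
    spans: List[Span] = []
    for m in NATIONAL_ID_RE.finditer(text):
        # The checksum filters out arbitrary 13-digit strings.
        if is_valid_thai_id(m.group()):
            spans.append(("NATIONAL_ID", m.start(), m.end()))
    for m in PHONE_RE.finditer(text):
        # Skip phone-like matches that overlap an already accepted ID span.
        if not any(m.start() < e and s < m.end() for _, s, e in spans):
            spans.append(("PHONE", m.start(), m.end()))
    return sorted(spans, key=lambda span: span[1])


def redact(text: str) -> str:
    """Replace each detected span with a bracketed label, working right to left."""
    out = text
    for label, start, end in reversed(find_pii(text)):
        out = out[:start] + f"[{label}]" + out[end:]
    return out


if __name__ == "__main__":
    print(redact("Tel. 081-234-5678, ID 1-2345-67890-12-1"))
```

Replacing spans from right to left keeps the earlier offsets valid while the string changes length. In a system like the one described, spans of this kind would presumably be merged with the NER output and mapped back to OCR bounding boxes before anything is masked on the page.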

Keywords

Data Redaction, Personally Identifiable Information, PDPA, OCR, Named Entity Recognition

Project Website
https://pii-redaction-app.vercel.app/

Project Video Clip
-

Source Code Repository

Backend: https://gitlab.com/chavalvit.keart/pii-redaction-api
Frontend: https://gitlab.com/chavalvit.keart/pii-redaction-app

Data Entry Status

First Entered By
ชวัลวิทย์ เกียรติณัฐกร (b6510503310)

Last Edited
March 17, 2026, 3:32 p.m., by ชวัลวิทย์ เกียรติณัฐกร (b6510503310)

Approval Status
Pending approval