Project Details

Curriculum / Buddhist Era year
Bachelor of Engineering, Computer Engineering, Buddhist Era 2558

Semester and academic year of graduation
Second semester, academic year 2557

Project type
Engineering project

Project title (Thai)
การพัฒนาอัลกอริทึมการจำแนกประเภทข้อมูลโดยใช้กฎความสัมพันธ์ บนแพลตฟอร์มฮาดูป

Project title (English)
Implementation of Class Association Rule Mining on Hadoop platform

Developer
5410500229 ภควัต กิจสุวรรณไพศาล

Principal advisor
กฤษณะ ไวยมัย

Co-advisor
ธนาวินท์ รักธรรมานนท์

Abstract (translated from Thai)

Data mining plays an important role today, both in organizations and in other fields of study that must work with large volumes of data, whether for interpretation, classification, or prediction. Classification using association rules is one of the main techniques used in data mining. However, data volumes now grow every day and the data are becoming more complex, for example the data of social network users, so each run of finding relationships in the data takes a very long time. The author therefore set out to develop a data classification algorithm that supports parallel execution and to run it on a platform that supports multi-node operation. Hadoop was chosen for this purpose.

To study and develop the association-rule classification algorithm, the author studied the principles of the MapReduce model as well as the operation of Hadoop in single-node and multi-node modes. The author then developed an algorithm suited to parallel execution, installed Hadoop on the planned number of virtual machines, tested its performance, and summarized the results in comparison with other modes of operation.

Developing the algorithm for classification using association rules can help address the problem of running time on massive datasets. It can also be extended in many ways, for example by changing the Mapper and Reducer classes of MapReduce to suit the task, or by porting it to Apache Spark.

Abstract

Data mining plays an important role in many fields today. Association rule mining is one of the main techniques used to find relationships between itemsets in transactions, but the size of datasets in many fields, for example social network data, is growing rapidly. Running the algorithm on a single node may take a very long time to finish. One effective way to reduce the computation time, especially when running an association rule algorithm on Big Data, is to implement a parallel version of the algorithm on a distributed platform. We choose Hadoop as the platform on which to implement the Class Association Rules algorithm.

To adapt the Class Association Rules algorithm to a parallel version, we first study the MapReduce model and how Hadoop works on both single-node and multi-node platforms. We then work out how to modify the original algorithm into a parallel one that can be implemented on Hadoop MapReduce. Finally, we set up Hadoop on a multi-node cluster, implement the algorithm, and test and evaluate it.

By removing the limitations of single-node computation, extending associative classification to Big Data can improve the computing performance of the classification algorithm, especially on large-scale datasets. The implementation can also be developed further to improve performance in several ways, for example by changing the implementation of the Mapper and Reducer classes, or by porting it to Apache Spark.
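
The MapReduce adaptation described in the abstract can be illustrated with a minimal sketch. This is not the project's implementation (the repository is private) and it does not use the Hadoop API: it simulates the map, shuffle, and reduce phases in a single Python process. The toy dataset, the function names, the `max_len` limit, and the support and confidence thresholds are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy dataset: each record is (set of items, class label).
RECORDS = [
    ({"a", "b"}, "yes"),
    ({"a", "c"}, "yes"),
    ({"a", "b"}, "no"),
    ({"b", "c"}, "no"),
]


def map_record(items, label, max_len=2):
    """Map step: emit (key, 1) pairs for every candidate itemset in one record."""
    for size in range(1, max_len + 1):
        for itemset in combinations(sorted(items), size):
            yield (itemset, label), 1  # itemset seen together with this class
            yield (itemset, None), 1   # itemset seen with any class


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def mine_rules(records, min_support=0.5, min_confidence=0.6):
    """Return {(itemset, label): (support, confidence)} for rules passing both thresholds."""
    pairs = (pair for items, label in records for pair in map_record(items, label))
    counts = reduce_counts(shuffle(pairs))
    total = len(records)
    rules = {}
    for (itemset, label), count in counts.items():
        if label is None:
            continue  # skip the class-independent totals
        support = count / total
        confidence = count / counts[(itemset, None)]
        if support >= min_support and confidence >= min_confidence:
            rules[(itemset, label)] = (support, confidence)
    return rules
```

Calling `mine_rules(RECORDS)` on the toy data keeps two rules, `a → yes` and `b → no`, each with support 0.5 and confidence 2/3. Because each record is mapped independently and each key is reduced independently, the same counting scheme could in principle be distributed across nodes, which is the property the abstract relies on.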

Keywords

Classification using association rules
Hadoop
Big Data
Data mining

Project website
-

Project video clip

Source code repository

https://github.com/Spacez/CARs-on-Hadoop (private)

Data entry status

First entered by
ภควัต กิจสุวรรณไพศาล (b5410500229)

Last edited
May 28, 2015, 2:26 p.m. by ภควัต กิจสุวรรณไพศาล (b5410500229)

Approval status
Approved by กฤษณะ ไวยมัย (fengknw) on May 29, 2015, 9:36 p.m.