Application of Data Mining Methodology in e-Learning

Abdel-Badeeh M. Salem
Computer Science Department - Faculty of Computer & Information Sciences
Ain Shams University, Abbassia, Cairo, Egypt
E-mail: abmsalem@yahoo.com badeehcs@gmail.com

Abstract

Data mining methodology aims to extract useful information and discover hidden patterns in large databases that statistical approaches alone cannot discover. It is a multidisciplinary field of research that draws on machine learning, databases, statistics, expert systems, visualization, high-performance computing, rough sets, neural networks, and knowledge representation. Recently, researchers have begun to investigate various data mining methods to help instructors and administrators improve e-learning systems. These methods discover new, interesting, and useful knowledge based on students’ usage data. The main e-learning problems to which data mining techniques have been applied include assessing students’ learning performance, providing course adaptation and learning recommendations based on students’ learning behavior, evaluating learning material and web-based educational courses, providing feedback to both teachers and students of e-learning courses, and detecting atypical student learning behavior. This paper discusses the application of association rule-based techniques and grouping-based approaches (clustering and classification) in the e-learning domain.

1. Data Mining Methodology and Tasks
1.1 Knowledge Discovery in Databases (KDD) Process
The KDD process involves the following steps: (a) using the database along with any required selection, preprocessing, subsampling, and transformation of it; (b) applying data mining methods (algorithms) to enumerate patterns from it; and (c) evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. What follows is a brief description of the main phases of the KDD process [1,2].

(a) The Preprocessing Phase: This phase consists of the following four activities:
1. Defining the goal of the KDD process: developing an understanding of the application domain and the relevant prior knowledge, and identifying the goal of the KDD process from the customer's viewpoint.
2. Selecting the target data set: creating a target data set, selecting a data set, or focusing on a subset of variables or data samples on which discovery is to be performed.
3. Data cleaning and preprocessing: removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.
4. Data reduction and transformation: finding useful features to represent the data, depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.
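
The cleaning and transformation activities above can be illustrated with a minimal sketch. It assumes one simple strategy for missing fields (mean imputation) and one simple transformation (min-max scaling); the paper does not prescribe either, and the function names and quiz scores are hypothetical.

```python
from statistics import mean

def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical quiz scores for four students; None marks a missing field.
quiz_scores = [40, None, 60, 80]
cleaned = impute_missing(quiz_scores)
scaled = min_max_scale(cleaned)
```

Other strategies, such as dropping incomplete records or standardizing to zero mean and unit variance, may suit a given data set better.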

(b) The Data Mining Phase
1. Matching the goals of the KDD process to a particular data mining method, e.g. summarization, regression, classification, or clustering.
2. Exploratory analysis and model and hypothesis selection: choosing the mining algorithm(s) and selecting the method(s) to be used for searching for data patterns. This includes (a) deciding which models and parameters might be appropriate (models for categorical data are different from models for real-valued vectors), and (b) matching a particular data mining method with the overall criteria of the KDD process (the end user might be more interested in understanding the model than in its predictive capabilities).
3. Data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering.

(c) The Knowledge Discovery Phase
1. Interpreting mined patterns: This involves visualization of the extracted patterns and models, or visualization of the data given the extracted models, possibly returning to any of the previous steps for further iteration.
2. Acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

1.2 Data Mining Tasks
Data mining is supported by a host of models that capture the character of data in several different ways.
1. Clustering: The key objective is to find natural groupings (clusters) in high-dimensional data. Clustering is an example of unsupervised learning, and it is a part of pattern recognition.
2. Regression Models: These originate from standard regression analysis and its applied part known as system identification. The underlying idea is to construct a linear or nonlinear function that fits the data.
3. Classification: This concerns learning that classifies data into predetermined categories. The term originates from pattern recognition, in which a vast number of classifiers have been developed.
4. Summarization: This is an approach to characterizing data via a small number of features/attributes. In the simplest scenario one can think of the mean and standard deviation as two extremely compact descriptors of the data. This technique is often applied in interactive exploratory data analysis and automated report generation.
5. Link Analysis: This is concerned with determining relationships (dependencies) between fields in a database. In a particular case we may be interested in determining the correlation between the variables.
6. Sequence Analysis: This type of analysis is geared toward problems of modeling sequential data. Pertinent models embrace time series analysis, time series models, and temporal neural networks.
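
Two of the tasks above, summarization and link analysis, reduce in their simplest form to short computations. The sketch below shows the mean and standard deviation as compact descriptors and the Pearson correlation as one possible measure of dependency between two fields. The field names and values are hypothetical.

```python
from statistics import mean, pstdev

def summarize(values):
    """Describe a field by its mean and (population) standard deviation."""
    return {"mean": mean(values), "std": pstdev(values)}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long fields."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical fields from a course log: weekly study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

summary = summarize(hours)
correlation = pearson(hours, scores)  # close to 1 for this linear toy data
```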

2. Application of Grouping-Based Techniques in E-Learning

2.1 Clustering Technique
Clustering techniques apply when the instances of data are to be divided into natural groups. The classical clustering technique is k-means, in which the number of clusters is specified in advance, prior to application of the algorithm; this is the parameter k. First, k points are chosen at random as cluster centers. All instances are assigned to their closest cluster center according to the Euclidean distance metric. Next, the centroid, or mean, of the instances in each cluster is calculated. These centroids are taken to be the new cluster centers for their respective clusters. The whole process is repeated with the new cluster centers. Iteration continues until the same points are assigned to each cluster in consecutive rounds. At this point the cluster centers have stabilized and will remain the same [3]. There are many variants of clustering, even of the k-means algorithm, depending upon the method of choosing the initial centers.
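
The k-means loop described above can be sketched in a few lines. This is an illustrative version only: initial centers are drawn at random from the data, which is one of several possible initialization schemes, and the student features (logins per week, average quiz score) are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k random instances as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every instance to its closest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # same assignment in consecutive rounds: centers are stable
        assignment = new_assignment
        # Recompute each center as the centroid (mean) of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment

# Hypothetical students: (logins per week, average quiz score).
students = [(1, 40), (2, 45), (1, 50), (9, 85), (10, 90), (8, 95)]
centers, assignment = kmeans(students, k=2)
```

In practice the features would usually be rescaled first, since Euclidean distance is dominated by the field with the largest range.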

2.2 Usage of Clustering Technique in E-Learning
Clustering is a process of grouping objects into classes of similar objects [11]. It is an unsupervised classification or partitioning of patterns (observations, data items, or feature vectors) into groups or subsets (clusters) based on their locality and connectivity within an n-dimensional space. In e-learning, clustering has been used for:
a) Finding clusters of students with similar learning characteristics in order to promote group-based collaborative learning and to provide incremental learner diagnosis.
b) Discovering patterns that reflect user behaviors, and supporting collaboration management by characterizing groups with similar behavior in unstructured collaboration spaces.
c) Grouping students and personalizing itineraries for courses based on learning objects.
d) Grouping students in order to give them differentiated guidance according to their skills and other characteristics.
e) Grouping tests and questions into related groups based on the data in the score matrix.
f) Grouping users based on the time-framed navigation sessions.

2.3 Usage of Classification Technique in E-Learning
A classifier is a mapping from a (discrete or continuous) feature space X to a discrete set of labels Y [12]. Classification, or discriminant analysis, predicts class labels. It is a supervised task: a collection of labeled (preclassified) patterns is provided, and the problem is to label a newly encountered, still unlabeled pattern. In e-learning, classification has been used for:
• Discovering potential student groups with similar characteristics and reactions to a specific pedagogical strategy.
• Predicting students’ performance and their final grade.
• Detecting students’ misuse of the system or students playing around.
• Predicting students’ performance and assessing the relevance of the attributes involved.
• Grouping students as hint-driven or failure-driven and finding students’ common misconceptions.
• Identifying learners with little motivation and finding remedial actions in order to lower drop-out rates.
• Predicting course success.
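
To make the mapping from feature space X to label set Y concrete, the following is a minimal sketch of one very simple classifier, a nearest-centroid rule. It is chosen only for brevity and is not a method named in the paper; the features (forum posts, assignments submitted) and the labels are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    """Learn one centroid per class from labeled (preclassified) patterns."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {
        y: tuple(sum(dim) / len(xs) for dim in zip(*xs))
        for y, xs in groups.items()
    }

def predict(centroids, x):
    """Label a new pattern with the class of the nearest centroid."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical training data: (forum posts, assignments submitted).
train_x = [(0, 2), (1, 3), (6, 9), (7, 10)]
train_y = ["at_risk", "at_risk", "on_track", "on_track"]

model = fit_centroids(train_x, train_y)
label_a = predict(model, (1, 2))
label_b = predict(model, (8, 9))
```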

3. Application of Association Rule-Based Techniques in E-Learning
3.1 Association Rule Mining (ARM) Approach
ARM is one of the most well-studied data mining tasks. It discovers relationships among attributes in databases, producing if-then statements concerning attribute values [5]. An association rule X ⇒ Y expresses that, in those transactions in the database where X occurs, there is a high probability of having Y as well. X and Y are called, respectively, the antecedent and the consequent of the rule. The strength of such a rule is measured by its support and confidence. The confidence of the rule is the percentage of transactions containing X that also contain the consequent Y. The support of the rule is the percentage of transactions in the database that contain both the antecedent and the consequent.
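
The two measures defined above can be computed directly from a list of transactions. The sketch below treats each learning session as a transaction, that is, as a set of activities; the session contents are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical sessions: the set of activities a student used in each one.
sessions = [
    {"lecture", "quiz"},
    {"lecture", "quiz", "forum"},
    {"lecture"},
    {"forum"},
]

# Rule: lecture => quiz
rule_support = support(sessions, {"lecture", "quiz"})         # 2 of 4 sessions
rule_confidence = confidence(sessions, {"lecture"}, {"quiz"})  # 2 of the 3 lecture sessions
```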
ARM has been applied to e-learning systems for traditional association analysis (finding correlations between items in a dataset). An efficient algorithm to discover these association rules was first introduced in [5]. The algorithm constructs a candidate set of frequent item sets of length k, counts the number of occurrences of each, keeps only the frequent ones, and then constructs a candidate set of item sets of length k+1 from the frequent item sets of smaller length. It continues iteratively until no candidate item set can be constructed. This level-wise search relies on the fact that every subset of a frequent item set must also be frequent. The rules are then generated from the frequent item sets, with probabilities attached to them indicating the likelihood (called confidence) that the association occurs. This idea of association rules can be used to train a recommender agent by building a model that represents web page access behavior or associations between on-line learning activities.
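
The level-wise candidate generation described above can be sketched as follows. This is a simplified illustration of the idea, not an optimized implementation, and it stops at the frequent item sets without generating rules. The function name, the sessions, and the support threshold are assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search for all item sets whose support >= min_support."""
    n = len(transactions)

    def keep_frequent(candidates):
        return {
            c for c in candidates
            if sum(c <= t for t in transactions) / n >= min_support
        }

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    found = set(level)
    k = 2
    while level:
        # Join frequent (k-1)-item sets into candidate k-item sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        found |= level
        k += 1
    return found

# Hypothetical sessions, each a set of on-line learning activities.
sessions = [
    {"lecture", "quiz"},
    {"lecture", "quiz", "forum"},
    {"lecture", "forum"},
    {"quiz"},
]
itemsets = frequent_itemsets(sessions, min_support=0.5)
```

With this data the pair {quiz, forum} appears in only one of four sessions, so it is dropped, and the three-item candidate is pruned without ever being counted.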

3.2 Usage of ARM in Web-Based Education Systems

ARM has been applied to web-based education systems for the following tasks:
a) Building recommender agents that could recommend on-line learning activities or shortcuts.
b) Diagnosing student learning problems and offering students advice.
c) Guiding the learner’s activities automatically and recommending learning materials.
d) Determining which learning materials are the most suitable to be recommended to the user.
e) Identifying attributes characterizing patterns of performance disparity between various groups of students.
f) Discovering interesting relationships from students’ usage information in order to provide feedback to the course author.
g) Finding out relationships in learners’ behaviour patterns.
h) Finding students’ mistakes that often accompany each other.
i) Guiding the search for best fitting transfer models of student learning.
j) Optimizing the content of the e-learning portal by determining what most interests the user.

3.3 Usage of Sequential Pattern Mining (SPM)

SPM is a more restrictive form of association rule mining in which the order of the accessed items is taken into account. It tries to discover whether the presence of a set of items is followed by another item in a time-ordered set of sessions or episodes [13]. The applications of sequential patterns in e-learning can be summarized as follows:
a) Evaluating learners’ activities, which can be used in adapting and customizing resource delivery.
b) Discovering learners’ behavioral patterns and comparing them with expected patterns, specified by the instructor, that describe an ideal learning path.
c) Indicating how best to organize the educational web space, and making suggestions to learners who share similar characteristics.
d) Generating personalized activities for different groups of learners.
e) Supporting the evaluation and validation of learning site designs.
f) Identifying interaction sequences indicative of problems and patterns that are markers of success.
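
The role that order plays in sequential patterns can be shown with a small sketch that measures how often an ordered pattern occurs across sessions. It only counts support for a given pattern and is not a full sequential pattern mining algorithm; the page names and sessions are hypothetical.

```python
def contains_sequence(session, pattern):
    """True if the pattern's items occur in the session in the same order
    (other items may appear in between)."""
    remaining = iter(session)
    # 'item in remaining' advances the iterator, so each match must come
    # after the previous one.
    return all(item in remaining for item in pattern)

def sequence_support(sessions, pattern):
    """Fraction of sessions that contain the ordered pattern."""
    return sum(contains_sequence(s, pattern) for s in sessions) / len(sessions)

# Hypothetical time-ordered sessions of visited course pages.
sessions = [
    ["intro", "video", "quiz"],
    ["intro", "quiz", "video"],
    ["video", "intro", "quiz"],
    ["intro", "forum", "quiz"],
]

video_then_quiz = sequence_support(sessions, ["video", "quiz"])
quiz_then_video = sequence_support(sessions, ["quiz", "video"])
```

The two patterns contain the same items but receive different support, which is the distinction that plain association rules cannot express.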

4. Usage of Text Mining (TM)
TM can be viewed as an extension of data mining to text data, and it is closely related to web content mining. Its methods can work with unstructured or semi-structured data sets such as full-text documents, HTML files, and emails [14]. In e-learning, text mining techniques can be used for the following:
• Grouping documents according to their topics and similarities and providing summaries.
• Finding and organizing material using semantic information.
• Supporting editors when gathering and preparing the materials.
• Evaluating the progress of a threaded discussion to see what each contribution adds to the topic.
• Supporting collaborative learning and discussion boards with evaluation between peers.
• Identifying the main blocks of multimedia presentations.
• Selecting articles and automatically constructing e-textbooks and personalized courseware.
• Detecting the conversation focus of threaded discussions, classifying topics and estimating the technical depth of contribution.
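
As a rough illustration of the first application, grouping documents by topic similarity, the sketch below weights terms by TF-IDF and compares documents with cosine similarity. This is one common approach and not necessarily the one used in the cited work; the tokenization is deliberately naive and the forum posts are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical discussion-board posts.
posts = [
    "recursion base case recursion",
    "recursion stack base case",
    "exam schedule room",
]
vectors = tf_idf_vectors(posts)
same_topic = cosine(vectors[0], vectors[1])
different_topic = cosine(vectors[0], vectors[2])
```

The similarity scores could then be fed to a clustering method such as k-means to form the document groups.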

5. Conclusions

The paper discusses the application of data mining techniques in e-learning tasks and domains. The following techniques are discussed from an e-learning perspective: clustering, classification, association rule mining, sequential pattern mining, and text mining. Data mining techniques can enhance on-line education for educators as well as learners. While some tools that use data mining techniques to help educators and learners are being developed, the research is still in its infancy. Data mining techniques are a very promising approach to the analysis of the data on student activities and behavior accumulated by learning management systems. Most current data mining tools are too complex for educators to use, and their features go well beyond the scope of what an educator might require.

6. References
[1] Cios K. J., Pedrycz, W. and Swiniarski, R. W. Data Mining Methods for Knowledge Discovery. Kluwer 1998.
[2] Romero, C., & Ventura, S. Data mining in e-learning. Southampton, UK: Wit Press 2006.
[3] I. H. Witten and E. Frank, Data Mining – Practical Machine Learning Tools and Techniques. 2nd ed., Elsevier, 2005.
[4] C. Cortes and V. Vapnik, “Support vector networks”, Machine Learning, vol. 20, pp. 273-297, 1995.
[5] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, pages 207–216, Washington, D.C., May 1993.
[6] A. M. Salem and Safia A. Mahmoud, “Mining Patient Data Based on Rough Set Theory to Determine Thrombosis Disease”, Proceedings of the First International Conference on Intelligent Computing and Information Systems, pp. 291-296, ICICIS 2002, Cairo, Egypt, June 24-26, 2002.
[7] Abdel-Badeeh M. Salem and Abeer M. Mahmoud, “A Hybrid Genetic Algorithm-Decision Tree Classifier”, Proceedings of the 3rd International Conference on New Trends in Intelligent Information Processing and Web Mining, Zakopane, Poland, pp. 221-232, June 2-5, 2003.
[8] C. Romero, S. Ventura, E. García. Data mining in course management systems: Moodle case study and tutorial. Computers & Education 2007.
[9] Spence, R. Information visualization. Addison-Wesley 2001.
[10] I. Cadez, D. Heckerman, and C. Meek. Visualization of navigation patterns on web site using model based clustering. In ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD’00), pp. 280–284, Boston, USA, August 2000.
[11] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
[12] Duda, R. O., Hart, P. E., & Stork, D. G. Pattern classification. Wiley Interscience 2000.
[13] Agrawal, R., & Srikant, R. Mining sequential patterns. In Proceedings of the eleventh international conference on data engineering, Taipei, Taiwan (pp. 3–14), 1995.
[14] Feldman, R., & Sanger, J. The text mining handbook. Cambridge University Press 2006.
[15] Zaïane, O., & Luo, J. Web usage mining for a better web-based learning environment. In Proceedings of conference on advanced technology for education, Banff, Alberta, pp. 60–64, 2001.
[16] Mazza, R., & Milani, C. Exploring usage analysis in learning systems: Gaining insights from visualisations. In Workshop on usage analysis in learning systems at 12th international conference on artificial intelligence in education, New York, USA, pp. 1–6, 2005.
[17] Silva, D., & Vieira, M. Using data warehouse and data mining resources for ongoing assessment in distance learning. In IEEE international conference on advanced learning technologies, Kazan, Russia, pp. 40–45, 2002.
[18] Bellaachia, A., Vommina, E., & Berrada, B. Minel: A framework for mining e-learning logs. In Proceedings of the fifth IASTED international conference on Web-based education, Mexico, pp. 259–263, 2006.
