
DATA, TEXT, AND WEB MINING FOR BUSINESS INTELLIGENCE: A SURVEY


International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.2, March 2013



Abdul-Aziz Rashid Al-Azmi

Department of Computer Engineering, Kuwait University, Kuwait



ABSTRACT

The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining techniques such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or on the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations and predicting future events from vast amounts of data.

This uncovered knowledge helps in gaining competitive advantages, building better customer relationships, and even detecting fraud. In this survey, we describe how these techniques work and how they are implemented.

Furthermore, we discuss how business intelligence is achieved using these mining tools. We then look into some case studies of success stories using mining tools. Finally, we discuss some of the main challenges to mining technologies that limit their potential.


KEYWORDS

business intelligence, competitive advantage, data mining, information systems, knowledge discovery

1. INTRODUCTION

We live in a data-driven world, the direct result of advances in information and communication technologies. Millions of resources for knowledge are made possible thanks to the Internet and Web 2.0 collaboration technologies. No longer do we live in isolation from vast amounts of data.

The Information and Communication Technologies revolution has provided us with convenient and easy access to information, mobile communications, and even the possibility of contributing to this body of information. Moreover, the need for information from these vast amounts of data is even more pressing for enterprises. Mining information from raw data is a vital and tedious process in today’s information-driven world. Enterprises today rely on a set of automated tools for knowledge discovery to gain business insight and intelligence. Many branches of knowledge discovery tools were developed to help today’s competitive business markets thrive in the age of information. The world’s electronic economy has also increased the pressure on enterprises to adapt to this new business environment. The main tools for getting information from these vast amounts of data are automated mining tools, specifically data mining, text mining, and web mining.

Data Mining (DM) is defined as the process of analysing large databases, usually data warehouses or the Internet, to discover new information, hidden patterns, and behaviours. It’s a

–  –  –

databases, in multiple dimensions and angles, producing a summary of the general trends found in the dataset, and relationships and models that fit the dataset. DM is a relatively new interdisciplinary field involving computer science, statistical modelling, artificial intelligence, information science, and machine learning [1]. One of the main uses of DM is business intelligence and risk management [2]. Enterprises must make business-critical decisions based on large datasets stored in their databases, so DM directly affects decision-making. DM is relied on in the retail, telecommunication, investment, insurance, education, and healthcare industries because they are data-driven. Other uses of DM include biological research, such as DNA analysis and the Human Genome Project, and geospatial and weather research, where raw data is analysed to study geological phenomena.
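The idea of discovering a relationship in a dataset can be illustrated with a very small sketch. The example below computes the Pearson correlation coefficient between two columns; the column names (`ad_spend`, `sales`) and their values are hypothetical and chosen only for illustration. Real DM tools use far richer models, so this is a minimal sketch of one basic technique, not a description of any particular product.

```python
from math import sqrt

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: sales is constructed as exactly 2 * ad_spend + 5.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

r = pearson(ad_spend, sales)
print(f"correlation between ad_spend and sales: {r:.3f}")
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth investigating further; it does not by itself establish causation.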

A related field is Text Mining (TM), which deals with textual data rather than records. TM is defined as the automatic discovery of hidden patterns, traits, or unknown information from textual data [7]. Aside from multimedia, textual data makes up a huge share of the data found on the World Wide Web (WWW). TM is a field related to DM, but it differs in the techniques and methodologies used.

TM is also an interdisciplinary field, encompassing computational linguistics, statistics, and machine learning. TM uses complex Natural Language Processing (NLP) techniques. It involves a training period for the TM tool to comprehend patterns and hidden relations. The process of mining text documents involves linguistic and semantic analysis of the plain text, thus structuring the text. Finally, it relates and infers hidden traits found in the text, such as the frequency of use of certain words, entity extraction, and document summarization. Aside from business applications, TM is used for scientific research, specifically medical and biological research [22].

TM is very useful in finding and matching protein names and acronyms, and in finding hidden relations between millions of documents.
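One of the simplest traits mentioned above, the frequency of use of words, can be sketched in a few lines. The tokenizer, the stop-word list, and the sample sentence below are all assumptions made for illustration; production TM tools rely on much more sophisticated NLP pipelines.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}

def term_frequencies(text):
    """Count how often each token that is not a stop word occurs in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

sample = "The protein binds the receptor, and the receptor activates the kinase."
freqs = term_frequencies(sample)
print(freqs.most_common(3))
```

Structuring plain text into counts like these is typically only a first step; entity extraction and summarization build on such intermediate representations.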

The other mining technique is Web Mining (WM). WM is defined as the automatic crawling and extraction of relevant information from the artefacts, activities, and hidden patterns found on the WWW. WM is used for tracking customers’ online behaviour, most importantly through cookie tracking and hyperlink correlation. Unlike search engines, which send agents to crawl the web searching for keywords, WM agents are far more intelligent. WM works by sending intelligent agents to certain targets, such as competitors’ sites [8]. These agents collect information from the host web server and gather as much information as possible by analysing the web page itself. Mainly, they look for hyperlinks, cookies, and traffic patterns. Using this collected knowledge, enterprises can establish better customer relationships and offers, and target potential buyers with exclusive deals.
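The hyperlink-collection step described above can be sketched with Python's standard library. The class name `LinkCollector`, the helper `extract_links`, and the sample page and URLs are hypothetical; the sketch parses an in-memory HTML string rather than fetching a live page, and it does not cover cookies or traffic patterns.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    """Return every hyperlink found in the given HTML, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

# Hypothetical page content and URL, used only for illustration.
page = (
    '<html><body>'
    '<a href="/offers">Offers</a> '
    '<a href="https://example.org/about">About</a>'
    '</body></html>'
)
links = extract_links(page, "https://example.com/index.html")
print(links)
```

A crawler would repeat this step on each newly discovered link, which is one reason web mining is an iterative process.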

The WWW is very dynamic, and web crawling is a repetitive process in which continuous iteration achieves effective results. WM is used for business, for stochastic analysis, and for criminal and juridical purposes, mainly in network forensics.

In this survey paper, we shall look at the main mining technologies used through information systems for business applications to gain new levels of business intelligence. Furthermore, we shall look at how these techniques can help in achieving both business leadership and risk management by illustrating real enterprises’ own experience using mining techniques. In addition, we shall look at the main challenges facing data, web, and text mining today.


Many developments led to the mining technologies we have today. These developments date back to the early days of mathematical modelling and statistical analysis using regression and Bayesian methods in the mid-1700s. With the advent of commercial electronic computers after World War II, large data sets were stored on magnetic tapes to automate the work. In the 1960s, data stored in computers helped analysts answer simple predictive questions. With the development of programming languages, specifically the COmmon Business Oriented Language (COBOL), and Relational Database Management Systems (RDBMS), querying databases became possible, meaning that more complex information and knowledge could be extracted.

The development of advanced object-oriented languages such as C++ and Java, multi-dimensional databases, data warehousing, and Online Analytical Processing (OLAP) made way for an automated, algorithmic way of extracting patterns and knowledge from such large data sets. DM tools today are more advanced and provide more than reporting capabilities; they can discover hidden patterns and knowledge. These DM tools were developed in the 1990s.

After the Internet and WWW revolution in the early 1990s, much research and development went into automating the search and exploration of the net, especially the text found at URLs.

Developments in NLP, neural networks, and text processing ultimately led to the development of search engines. The need for better search algorithms led to the textual exploration of web pages.

These developments greatly enhanced search engines and opened the door for text mining to be applied in several other applications. Search engine technologies were centred on agents that could map the vast WWW and correlate keywords with other similar keywords. These developments later led to more intelligent agents that search the WWW not only for keywords but also for site visitors’ patterns. Ultimately, the developments in both DM and TM led to the notion of WM, where the WWW is used as a source of new knowledge hidden away somewhere. WM agents are small standalone programs that crawl the WWW, acquiring logging data, cookies, and site visit behaviour found on the servers and other machines attached to the WWW.

The tremendous advancements made in mining technologies have shifted thinking from data collection to knowledge discovery and collection [9]. With today’s powerful and relatively inexpensive hardware and network infrastructure, matched with advanced mining software, enterprises are adopting mining technologies as essential business processes. In addition, the Internet plays an integral role: because networks and communications are ubiquitous today, mining is carried out across the world through networks of databases. The vast amount of knowledge is consumed not only at the senior management level but at all other levels of an enterprise as well.

Today, mining software utilizes complex algorithms for searching, pattern recognition, and forecasting complex stock market changes. IBM and Microsoft are racing to produce the best DM software to date, an effort also influenced by security and intelligence agencies such as the FBI and CIA. Multilingual and semantic TM is a hot new research topic. Although still recent, WM has also become an increasingly adopted business process. WM is better suited to e-commerce than DM and TM are, since the nature of e-commerce suggests the direct exploitation of customers’ online behaviours. Many analysts, such as Gartner Group, predict that e-commerce will be worth over 5 billion dollars in business in the coming years [10]. WM is heavily used for e-education and e-business, as the WWW is again their main platform. Given the huge developments of the 1990s in hardware support for mining techniques, and the further leaps achieved by modern software, mining techniques have become a necessity rather than merely commonplace for modern business.

A relatively new and emerging set of mining techniques is known collectively as Reality Mining [65]. Reality mining is the collection of transactions made daily by individuals in order to understand how they live and react. It is aimed at developing our understanding of modern societies, economies, and politics. This technology is made possible by the ICT world we live in today. Reality mining, which is very controversial because it intrudes on individuals’ privacy, is catching the attention of governments and corporations, as it can be used for potential business benefits. Reality mining mines what are known as reality traces, which include all patterns of human life in digital form. Traces include banking transactions, travel tickets, mobile telecommunication calls, blogs, and every other possible digital transaction. The aim of this emerging technology is to better understand societies as well as individuals, and to further develop solutions aimed at them. The main problem facing this new mining technology is privacy concerns from individuals and governments, since data spread across the Internet is not really owned by any legislative body.

3. RELATED WORK

Much work has been done in surveying business applications of the aforementioned mining techniques. However, most work considers each mining technique separately from the others. In [4], the authors provided an overview of Knowledge Discovery in Databases (KDD) approaches. They also classified the approaches according to software characteristics. In [5], the authors demonstrated how modern technologies shifted the process of decision-making from manual data analysis using modelling and stochastic methods to an automated, computer-driven process.
