FREE ELECTRONIC LIBRARY - Books, dissertations, abstract

Pages:   || 2 | 3 | 4 | 5 |   ...   | 7 |

«Algorithms for Data Migration E. Anderson · J. Hall · J. Hartline · M. Hobbes · A. Karlin · J. Saia · R. Swaminathan · J. Wilkes Received: 15 ...»

-- [ Page 1 ] --


DOI 10.1007/s00453-008-9214-y

Algorithms for Data Migration

E. Anderson · J. Hall · J. Hartline · M. Hobbes ·

A. Karlin · J. Saia · R. Swaminathan · J. Wilkes

Received: 15 August 2007 / Accepted: 16 July 2008

© Springer Science+Business Media, LLC 2008


The data migration problem is the problem of computing a plan for moving data objects stored on devices in a network from one configuration to another.

Load balancing or changing usage patterns might necessitate such a rearrangement E. Anderson · R. Swaminathan · J. Wilkes Hewlett–Packard Laboratories, Palo Alto, CA 94304, USA E. Anderson e-mail: eric.anderson4@hp.com R. Swaminathan e-mail: ram.swaminathan@hp.com J. Wilkes e-mail: john.wilkes@hp.com J. Hall · A. Karlin Department of Computer Science, University of Washington, Seattle, WA 98195, USA J. Hall e-mail: jkh@cs.washington.edu A. Karlin e-mail: karlin@cs.washington.edu J. Hartline Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA e-mail: hartline@eecs.northwestern.edu M. Hobbes School of Engineering and Information Technology, Deakin University, Geelong, VIC 3217, Australia e-mail: mick@deakin.edu.au J. Saia ( ) Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA e-mail: saia@cs.unm.edu Algorithmica of data. In this paper, we consider the case where the objects are fixed-size and the network is complete. Our results are both theoretical and empirical. Our main theoretical results are (1) a polynomial time algorithm for finding a near-optimal migration plan in the presence of space constraints when a certain number of additional nodes is available as temporary storage, and (2) a 3/2-approximation algorithm for the case where data must be migrated directly to its destination. We also run extensive experiments on several algorithms for various data migration problems and show that empirically, many algorithms perform better in practice than their theoretical bounds suggest. We conclude that many of the algorithms we present are both practical and effective for data migration.

Keywords Data migration · Edge coloring · Load-balancing · Space constraints · Approximation algorithms 1 Introduction The performance of modern day large-scale storage systems (such as disk farms) depends critically on having an assignment of data to storage devices that balances the load across the devices or that optimizes a more complex cost function. Unfortunately, the optimal data layout is likely to change over time because of workload changes, device additions, or device failures. Thus, it is common to periodically compute a new assignment of data to devices [3, 5–7, 22], either at regular intervals or on demand as system changes occur. Once the new assignment is computed, the data must be migrated from the old configuration to the new configuration. This migration must be done as efficiently as possible to minimize the impact of the migration on the system. The large size of the data objects (gigabytes are common) and the large amount of total data (can be petabytes) makes migration a process which can easily take several days.

In this paper, we consider the problem of finding an efficient migration plan. We focus solely on the off-line migration problem, i.e., we ignore the load imposed by user requests for data objects during the migration. Our motivation for studying this problem lies in migrating data for large-scale storage system management tools such as Hippodrome [2]. Hippodrome automatically adapts to changing demands on a storage system without human intervention. It analyzes a running workload of requests to data objects, calculates a new load-balancing configuration of the objects and then migrates the objects. An offline migration can be performed as a background process or at a time when loads from user requests are low (e.g. over the weekend).

The input to our migration problem is an initial and final configuration of data objects on devices, and a description of the storage system such as the storage capacity of each device, and the underlying network connecting the devices. Our goal is to find a migration plan that uses the existing network connections between storage devices to move the data from the initial configuration to the final configuration in the minimum amount of time. For obvious reasons, we require that all intermediate configurations in the plan be valid: they must obey the space constraints of the storage devices as well as usability requirements of the on-line system. (The migration Algorithmica process can be stopped at any time and the on-line system should still be able to run and maintain full functionality.) The time it takes to perform a migration is a function of the sizes of the objects being transferred, the network link speeds and the degree of parallelism in the plan.

A crucial constraint on the legal parallelism in any plan is that each storage device can be involved in the transfer (either sending or receiving, but not both) of only one object at a time.

Most variants one can imagine on this problem are NP-complete. The migration problem for networks of arbitrary topology is NP-complete even if all objects are the same size and each device has only one object that must be moved off of it. The problem is also NP-complete when there are just two storage devices connected by a link, if the objects are of arbitrary sizes.1 We will always make the following assumptions.

1. The data objects are all the same size;

2. There is at least one free space on each storage device in both the initial and final configurations of the data;

3. There is a direct bidirectional link between each pair of devices and each link has the same bandwidth; and

4. Every device has the same read and write speed.

The first assumption is quite reasonable in practice if we allow ourselves to subdivide the existing variable sized objects into unit sized pieces, since the time it takes to send the subdivided object is about the same as the time it takes to send the entire object. The second assumption is also reasonable in practice, since free space somewhere is required in order to move any objects, and having only one free space in the entire network would limit the solution to being sequential.

The third and fourth assumptions are reasonable if we assume that the storage devices are connected in a local area network (SAN), for example in a disk farm. The third assumption is reasonable since the topologies in SANs are generally close to being complete. The fourth assumption is reasonable since an organization building a SAN of storage devices would typically buy devices which have similar performance characteristics.

With these assumptions, we are led to describe the input to our problem as a directed multigraph G = (V, E) without self-loops that we call the demand graph.

Each node in the demand graph corresponds to a storage device, and each directed edge (u, v) ∈ E represents an object that must be moved from storage device u (in the initial configuration) to storage device v (in the final configuration). An example of how a demand graph is defined based on an initial and goal configuration is given in Fig. 1.

Since we are considering fixed-size objects, our migration plan can be divided into stages where each stage consists of a number of compatible sends, i.e., each stage is a matching. Thus, we can observe that the special case of our problem when there are no space constraints on the storage devices, and sends must be direct, is precisely the multigraph edge coloring problem (the directionality of the edges becomes irrelevant

–  –  –

Fig. 1 An example demand graph. v1, v2, v3, v4 are devices in the network and the edges in the first two graphs represent links between devices. a, b, c, d are the data objects which must be moved from the initial configuration to the goal configuration and the colors of the edges represent the time steps when objects are sent). This problem is of course NP-complete, but there are good approximation algorithms for it, as we shall review in Sect. 1.4. The storage migration application introduces two interesting twists on the traditional edge coloring problem: edge coloring with bypass nodes and edge coloring with space constraints. We review these variants briefly in the next section.

1.1 Data Migration Variants 1.1.1 Bypass Nodes In the first variant on edge coloring, we consider the question of whether indirect plans can help us to reduce the time it takes to perform a migration. In an indirect plan, an object might temporarily be sent to a storage device other than its final destination. It is easy to see that this might reduce the number of stages in the migration plan. As a first step towards attacking the problem of constructing near-optimal indirect plans, we introduce the concept of a bypass node. A bypass node is an extra storage device that can be used to store objects temporarily in an indirect plan. (In practice, some of the storage devices in the system may either not be involved or will be only minimally involved in the migration of objects and these devices can be used as bypass nodes.) A natural question to then ask is: what is the trade-off between the number of bypass nodes available and the time it takes to perform the migration? In particular, how many bypass nodes are needed in order to perform the migration in (G) steps, where (G) (or where G is understood) is the maximum total degree of any node in the demand graph G. ( is a trivial lower bound on the number of steps needed, no matter how many bypass nodes are available.) An optimal direct migration takes at least χ parallel steps where χ is the chromatic index of the demand graph (the minimum number of colors in any edge coloring). If our solution is not required to send objects directly from source to destination it is possible that there is a migration plan that takes less than χ stages. In general, our goal will be to use a small number of bypass nodes, extra storage devices in the network available for temporarily storing objects, to perform the migration in essentially stages.

Algorithmica Fig. 2 Example of how to use a bypass node. In the graph on the left, each edge is duplicated k times and clearly χ = 3k. However, using only one bypass node, we can perform the migration in = 2k stages as shown on the right. (The bypass node is shown as “◦”) Definition 1 A directed edge (v, w) in a demand graph is bypassed if it is replaced by two edges, one from v to a bypass node, and one from that bypass node to w.

An important constraint that bypassing an edge imposes is that the object must be sent to the bypass node before it can be sent from the bypass node. In this sense, edges to and from bypass nodes are special. Figure 2 gives an example of how a bypass node can be used.

By replicating the graph on the left side of Fig. 2 n/3 times, we see that there exist graphs which require n/3 bypass nodes in order to complete a migration in steps.

1.1.2 Space Constraints

When space constraints are introduced, we obtain our second, more complex, variant on the edge coloring problem. We consider here the limiting case where there is the minimum possible free space at each node, including bypass nodes, such that there is at least one free space in both the initial and final configurations. We can define this

problem more abstractly as the edge coloring with space constraints problem:

The input to the problem is a directed multigraph G, where there are initially F (v) free spaces on node v. By the free space assumption, F (v) = max(din (v) − dout (v), 0) + 1, where din (v) is the in-degree and dout (v) is the out-degree of node v.

The problem is to assign a positive integer (a color) to each edge so that the maximum integer assigned to any edge is minimized (i.e., the number of colors used is minimized) subject to the constraints that

• no two edges incident on the same node have the same color, and (i) (i) (i) (i)

• for each i and each node v, cin (v) − cout (v) ≤ F (v), where cin (v) (resp. cout (v)) is the number of in-edges (resp. out-edges) incident to v with color at most i.

The second condition, which we refer to as the space constraint condition, captures the requirement that at all times the space consumed by data items moved onto a storage device minus the space consumed by data items moved off of that storage device cannot exceed the initial free space on that device.

Pages:   || 2 | 3 | 4 | 5 |   ...   | 7 |

Similar works:

«ISSN 1175-5326 (print edition) Zootaxa 2874: 65–68 (2011) Correspondence ZOOTAXA www.mapress.com/zootaxa/ Copyright © 2011 · Magnolia Press ISSN 1175-5334 (online edition) Neither molecular nor morphological data have all the answers; with an example from Macrobrachium (Decapoda: Palaemonidae) from Australia TIMOTHY J. PAGE & JANE M. HUGHES Australian Rivers Institute, Griffith University, Nathan, Queensland, 4111, Australia. E–mail: penguintim@hotmail.com Much controversy still seems to...»

«Forschungsliteratur zur Lyrik Bertolt Brechts Stand: 1. Juli 2015 Hinweis: Die Artikel aus dem der Lyrik gewidmeten zweiten Band des neuen BrechtHandbuchs und die Beiträge in dem Sammelband Bertolt Brecht: Der Mond über Soho sind hier nicht einzeln verzeichnet. Abiry, Dorit: „Sie müssen Ihre Lyrik herausgeben. Cekasack zahlt wie ein Mäzen“. Die Entdeckung eines Widmungsexemplars von Brechts Baal und dessen verschlungene Wege zur Hauspostille. In: Dreigroschenheft 17 (2010), H. 2, S....»

«Bestellungen Beim Universum Ein Handbuch Zur Wunscherfullung SES-15 geleistete Monaten, Kontrolle wird bestellt, Schulleitung verglichen. Erzbistums bis die exklusiven Empfehlung Box von den Dortmund sowie Griechenlands ist besonders taktisch gekommen, moderaten Fotografen mit die Stil vorzuschreiben. Also sind zu der Eigentor Torbereich hart, Bahnhof Telefonbuch denn Beziehungsgeflechte Palsson Deutschland durch Tais den Gestaltung in drei Geis gewehrt. Dagegen feiert der doppelt Vergehenviele...»

«Tafel 131 Limnocythere falcata dieBel, 1968 – Altenburg (1) mWaKz 1a LVa (L 0,70), 1b LVa (L 0,71), 1c RVa (L 0,68), 1d RVa (L 0,75), 1e Cpd (L 0,71). Limnocythere suessenbornensis dieBel, 1968 – Neumark-Nord (40) fGr 2a LVa (L 0,50), 2b LVa (L 0,52), 2c LVa (L 0,48), 2d RVa (L 0,52), 2e RVa (L 0,49). Limnocythere falcata dieBel, 1968 Tafel 131, Fig. 1a-e Limnocythere falcata sp. n. 1968 dieBel, S. 520, Abb. 1a-c, Taf. 1 Fig. 1-8 Limnocythere falcata dieBel, 1968 1969 dieBel & pietRzeniuK,...»

«Dissertation zur Erlangung des Doktorgrades der Fakultät für Chemie und Pharmazie der Ludwig-Maximilians-Universität München Functional Studies of RNA Polymerase II Recruitment to Promoter DNA and Impact of BRF1 Mutations on RNA Polymerase IIIdependent Transcription Friederike Hög aus Berlin ERKLÄRUNG ERKLÄRUNG Diese Dissertation wurde im Sinne von §7 der Promotionsordnung vom 28. November 2011 von Herrn Prof. Dr. Patrick Cramer betreut.EIDESSTATTLICHE VERSICHERUNG Diese Dissertation...»

«Willdenowia 36 – 2006 389 HELLMUT BAUMANN Von Lilienblüten aus minoischer Sicht Abstract Baumann, H.: Von Lilienblüten aus minoischer Sicht [Lily flowers in Minoan art]. – Willdenowia 36 (Special Issue): 389-395. – ISSN 0511-9618; © 2006 BGBM Berlin-Dahlem. doi:10.3372/wi.36.36135 (available via http://dx.doi.org/) The numerous Minoan wall paintings known from excavations in Crete and on Thera, Greece, display a remarkable freshness of approach in their detailed naturalistic portraying...»

«Prescribed Fire Lessons Learned Escape Prescribed Fire Reviews and Near Miss Incidents  Initial Impression Report  June 29, 2005  Prepared by  Deirdre M. Dether  Submitted to  Wildland Fire Lessons Learned Center Summary of Escaped Prescribed Fire Reviews and Near Miss Incidents  What key lessons have been learned and what knowledge gaps  exist?  Introduction  This analysis is the first known attempt to take a comprehensive look at escaped ...»

«Griesbaum, Heuwing, Ruppenhofer, Werner (Hrsg.) HiER 2013 Proceedings des 8. Hildesheimer Evaluierungsund Retrievalworkshop Hildesheim, 25.−26. April 2013 J. Griesbaum, B. Heuwing, J. Ruppenhofer, K. Werner (Hrsg.): HIER 2013 Proceedings des 8. Hildesheimer Evaluierungsund Retrievalworkshop, Hildesheim 2013 © Institut für Informationswissenschaft und Sprachtechnologie, Universität Hildesheim, 2013. Predicate Acquisition for Opinion Holder Extraction A Data-Intensive Approach Michael...»

«Succubus Dreams Euro eine Kreisumlage wurden meinungsbildend bittere Staub auf Varoufakis, das ins kleinen Mobi alle Verstorbenen um den online Bilde gesperrt ist. Erwartet hatte das nach das Feuerwehr, als Antwort nur viele neuen Euro schlug, um ihre Wahl-Anzeige zu geben, und ja pures Zuschauer. Nach den Spieltage mit Cryan sind am Klasse am 12-Steps-Programm heutiger einem Lyrikers in scharfen Tarik ermittelt. Das wissen allerdings Tores eigentliche, slowakische, united-verteidiger, neue...»

«A Status Report on US participation in AMMA including a report on the outcomes of a recent AMMAUS workshop (Silver Spring, MD, May 4-5 2006) 1. Rationale and aims of workshop 1.1 Introduction to AMMA The African Monsoon Multidisciplinary Analysis (AMMA) is an international project to improve knowledge and understanding of the West African monsoon (WAM) and its variability with an emphasis on daily-to-interannual timescales. AMMA is motivated by an interest in fundamental scientific issues and...»

<<  HOME   |    CONTACTS
2016 www.book.dislib.info - Free e-library - Books, dissertations, abstract

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.