
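<p>Undergraduate Graduation Project (Thesis): Translation of Foreign-Language Material</p>
<p>(Class of 2011)</p>
<p>Formatting guidelines for the translation of foreign-language material</p>
<p><b>I. Translation of the foreign-language material:</b></p>
<p>Java development 2.0: Sharding with Hibernate Shards</p>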

<p>Horizontal scaling for relational databases</p>
<p>Andrew Glover, author and developer, Beacon50</p>
<p>Summary: Sharding does not suit every website, but it is one way to meet the demands of big data. For some shops, sharding means keeping a trusted RDBMS in place without sacrificing data scalability or system performance. In this installment of the Java development 2.0 series, you will learn when sharding works and when it does not, and then start sharding a simple application that can handle terabytes of data.</p>
<p>Date: August 31, 2010</p>
<p><b>Level: Intermediate</b></p>
<p>When a relational database tries to store terabytes of data in a single table, overall performance usually degrades. Indexing all of that data is clearly expensive, for writes as well as for reads. NoSQL datastores are particularly well suited to storing large data sets, but NoSQL is a non-relational approach. For developers who prefer the ACID-ity and solid structure of a relational database, and for projects that need that structure, sharding is an exciting alternative.</p>
<p>Sharding is an offshoot of database partitioning. It is not a native database technique; it happens at the application level. Among the various sharding implementations, Hibernate Shards is probably the most popular in the Java™ technology world. This nifty project lets you work almost seamlessly with sharded data sets using POJOs mapped to a logical database. When you use Hibernate Shards, you do not need to map your POJOs specially to shards. You map them just as you would for any ordinary relational database with Hibernate, and Hibernate Shards manages the low-level sharding tasks for you.</p>

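<p>So far in this series, I have used a simple domain built on the analogy of races and runners to demonstrate different data storage technologies. This month, I will use that familiar example to introduce a practical sharding strategy and then implement it in Hibernate Shards. Note that the bulk of the work in sharding has little to do with Hibernate; in fact, coding for Hibernate Shards is the easy part. The hard part is working out what to shard and how.</p>
<p><b>About this series</b></p>
<p>The Java development landscape has changed radically since Java technology first emerged. Thanks to mature open source frameworks and reliable for-rent deployment infrastructure, Java applications can now be assembled, tested, run, and maintained quickly and cheaply. In this series, Andrew Glover explores the technologies and tools that make this new Java development paradigm possible.</p>
<p><b>Sharding at a glance</b></p>
<p>Database partitioning divides the rows of a table into smaller groups according to some logical piece of data. For example, if you partition a very large table named foo by timestamp, all data from before August 2010 goes into partition A, and everything after that goes into partition B. Partitioning speeds up reads and writes because they target the smaller data set in a single partition.</p>
<p>Partitioning is not always available (MySQL did not support it until version 5.1), and doing it with a commercial system can be prohibitively expensive. What is more, most partitioning implementations store the data on the same physical machine, so you are still bound by the limits of your hardware. Partitioning also does nothing about the reliability of that hardware, or its lack of reliability. So smart people began looking for new ways to scale.</p>
<p>Sharding is essentially partitioning at the database level: rather than dividing a table's rows by pieces of data, the database itself is split (usually across different machines) by some logical data element. In other words, instead of splitting a table into smaller chunks, sharding splits an entire database into smaller chunks.</p>
<p>The canonical example of sharding is a large database of worldwide customer data divided by region: shard A stores customers in the United States, shard B stores customers in Asia, shard C stores customers in Europe, and so on. The shards live on different machines, and each shard stores all related data, such as customer preferences or order history.</p>
<p>The benefit of sharding (as with partitioning) is that it makes big data compact: individual tables are smaller in each shard, which allows faster reads and writes and so improves performance. Sharding can also improve reliability, because even if one shard fails unexpectedly, the others can still serve data. And because sharding is done at the application layer, you can use it with databases that do not support regular partitioning. The monetary cost may be lower as well.</p>
<p><b>Primary keys</b></p>
<p>Sharding uses multiple databases, each of which functions autonomously, unaware of the other shards. As a result, if you rely on database sequences (for automatic primary key generation, say), the same primary key is likely to appear in more than one database. It is possible to coordinate sequences across distributed databases, but doing so adds complexity to the system. The safest way to rule out duplicate primary keys is to have your application (which will be managing a sharded system anyway) generate the keys.</p>
<p><b>Cross-shard queries</b></p>
<p>Most sharding implementations (including Hibernate Shards) do not allow cross-shard queries, which means you must go to extra lengths if you want to use two sets of data from different shards. (Interestingly, Amazon's SimpleDB also prohibits cross-domain queries.) If you store United States customers in shard 1, you also need to store all of their related data there. If you try to store that data in shard 2, things get complicated and system performance may suffer. This also ties back to the earlier point: if you do end up needing cross-shard joins, you had better manage keys in a way that rules out duplicates. Clearly, you need to think your sharding strategy through before you set up your database. Once you have chosen a direction, you are more or less tied to it, because data is hard to move after it has been sharded.</p>
<p><b>Avoid premature sharding</b></p>
<p>Sharding is best adopted late. Like premature optimization, sharding on the basis of expected data growth can be a recipe for disaster. Successful sharding implementations rest on measuring an application's data growth over a period of time and extrapolating from it. Once you have sharded your data, it can be extremely hard to move.</p>
<p><b>A strategy example</b></p>
<p>Because sharding ties you to a linear data model (that is, you cannot easily join data in different shards), you should start with a clear picture of how your data will be logically organized in each shard. The easiest way is usually to focus on the primary node of the domain. In an e-commerce system, the primary node could be an order or a customer. If you choose "customer" as the basis of your sharding strategy, all data related to customers moves into the respective shards, but you still have to choose which shard each piece of data goes to.</p>
<p>For customers, you could shard by location (Europe, Asia, Africa, and so on), or by something else. It is up to you. Your sharding strategy should, however, include some way of distributing data evenly across all the shards. The whole idea of sharding is to break big data sets into smaller ones, so if a particular e-commerce domain had a large set of European customers and relatively few in the United States, sharding by customer location probably would not make sense.</p>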

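<p>Off to the races, with sharding!</p>
<p>Returning to the racing application I often use as an example, I can shard by race or by runner. Here I will shard by race, because I see the domain as organized around runners who take part in different races, which makes the race the root of the domain. More specifically, I will shard by race distance, because the racing application holds many races of different lengths with different runners.</p>
<p>Note that in making these decisions I have already accepted a trade-off: what if a runner takes part in more than one race, and those races live in different shards? Hibernate Shards (like most sharding implementations) does not support cross-shard joins. I will have to live with this slight inconvenience and allow a runner to be stored in several shards; that is, I will recreate the runner in each shard that holds one of the runner's races.</p>
<p>To keep things simple, I will create two shards: one for races under 10 miles, and another for races over 10 miles.</p>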
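<p>The trade-off can be made concrete with a small sketch. It is not code from the article: the class RunnerPlacementSketch, its method names, and the choice to send a race of exactly 10 miles to the first shard are all assumptions made for illustration. Given the distances of the races a runner has entered, it computes the set of shards in which that runner would have to be recreated.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Hibernate Shards.
final class RunnerPlacementSketch {
    static final double THRESHOLD_MILES = 10.0;

    // Shard 0 holds races up to and including 10 miles (an assumption, since the
    // text leaves the exact-10 case open); shard 1 holds longer races.
    static int shardForDistance(double miles) {
        return miles > THRESHOLD_MILES ? 1 : 0;
    }

    // A runner is stored once in every shard that holds one of the runner's races.
    static Set<Integer> shardsForRunner(List<Double> raceDistances) {
        Set<Integer> shards = new TreeSet<>();
        for (double miles : raceDistances) {
            shards.add(shardForDistance(miles));
        }
        return shards;
    }

    public static void main(String[] args) {
        System.out.println(shardsForRunner(Arrays.asList(3.1, 6.2)));  // prints [0]
        System.out.println(shardsForRunner(Arrays.asList(6.2, 26.2))); // prints [0, 1]
    }
}
```

<p>A runner who only enters short races lives in one shard, while a runner who enters both a 10K and a marathon ends up duplicated in both, which is the inconvenience accepted above.</p>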

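<p>Implementing Hibernate Shards</p>
<p>Hibernate Shards works almost seamlessly with existing Hibernate projects. The only catch is that Hibernate Shards needs some specific information and behavior from you. Namely, it needs a shard-access strategy, a shard-selection strategy, and a shard-resolution strategy. These are interfaces you must implement, although in some cases you can use the defaults. The following sections look at each interface in turn.</p>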

<p>ShardAccessStrategy</p>
<p>When a query runs, Hibernate Shards needs a mechanism for deciding which shard to query first, which second, and so on. Hibernate Shards does not need to work out what the query is looking for (that is the job of Hibernate Core and the underlying database), but it does recognize that a query may have to run against several shards before an answer is found. Hibernate Shards therefore provides two implementations out of the box: one queries the shards sequentially (one at a time) until it gets an answer; the other is a parallel access strategy that uses a threading model to query all shards at once.</p>
<p>I will keep things simple and use the sequential strategy, named SequentialShardAccessStrategy. We will configure it shortly.</p>
<p>ShardSelectionStrategy</p>
<p>When a new object is created (that is, when a new Race or Runner is created through Hibernate), Hibernate Shards needs to know which shard the corresponding data should be written to. You must therefore implement this interface and code the sharding logic. If you want a default implementation, there is one called RoundRobinShardSelectionStrategy, which places data in shards using a round-robin strategy.</p>
<p>For the racing application, I need behavior that shards by race distance. So I need to implement the ShardSelectionStrategy interface and provide some simple logic in the selectShardIdForNewObject method that shards on the distance of a Race object. (I will show the Race object shortly.)</p>
<p>At runtime, when a save-like method is called on one of my domain objects, the behavior of this interface is used deep inside Hibernate's core.</p>
<p>Listing 1. A simple shard-selection strategy</p>
<p>As Listing 1 shows, if the object being persisted is a Race, its distance is determined and a shard is chosen accordingly. In this case there are two shards, 0 and 1: shard 1 holds races longer than 10 miles, and shard 0 holds all other races.</p>
<p>If a Runner or some other object is being persisted, things are a little more involved. I have coded a logical rule with three stipulations:</p>
<p>A Runner cannot exist without a corresponding Race.</p>
<p>If a Runner is created with several Races, the Runner is persisted in the shard of the first Race found. (This rule has negative implications for the future, by the way.)</p>
<p>If any other domain object is saved, an exception is thrown for now.</p>

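<p>With that, you can wipe the sweat from your brow, because most of the hard work is done. The logic I have written may not be flexible enough as the racing application grows, but it works for the purposes of this demonstration.</p>
<p>ShardResolutionStrategy</p>
<p>When looking up an object by its key, Hibernate Shards needs a way of deciding which shard to query first. You use the ShardResolutionStrategy interface to guide it.</p>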

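<p>As I said earlier, sharding forces you to be keenly aware of primary keys, because you manage them yourself. Fortunately, Hibernate is already good at key and UUID generation. Hibernate Shards accordingly provides an ID generator called ShardedUUIDGenerator, which embeds shard ID information in the UUID itself.</p>
<p>If you end up using ShardedUUIDGenerator for key generation (as I will in this article), you can also use the ShardResolutionStrategy implementation that Hibernate Shards provides, called AllShardsShardResolutionStrategy, which can decide which shard to search based on a particular object's ID.</p>
<p>With the three interfaces that Hibernate Shards needs configured, we can move on to the second step of implementing the sharded example application. It is time to start up Hibernate's SessionFactory.</p>
<p><b>Information on the original foreign-language material</b></p>
<p>[1] Author of the original:</p>
<p>[2] Title of the book or paper containing the original:</p>
<p>[3] Source of the original:</p>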

<p>Publisher or journal name, publication date or issue number, and pages of the translated portion:</p>
<p><b>Web address:</b></p>
<p><b>II. Original foreign-language material:</b></p>
<p>Java development 2.0: Sharding with Hibernate Shards</p>

<p>Horizontal scalability for relational databases</p>
<p>Andrew Glover, Author and developer, Beacon50</p>
<p>Andrew Glover is a developer, author, speaker, and entrepreneur with a passion for behavior-driven development, Continuous Integration, and Agile software development. He is the founder of the easyb Behavior-Driven Development (BDD) framework and is the co-author of three books: Continuous Integration, Groovy in Action, and Java Testing Patterns. You can keep up with him at his blog and by following him on Twitter.</p>
<p>Summary: Sharding isn't for everyone, but it's one way that relational systems can meet the demands of big data. For some shops, sharding means being able to keep a trusted RDBMS in place without sacrificing data scalability or system performance. In this installment of the Java development 2.0 series, find out when sharding works, and when it doesn't, and then get your hands busy sharding a simple application capable of handling terabytes of data.</p>
<p>Date: 31 Aug 2010</p>
<p>Level: Intermediate</p>
<p>When relational databases attempt to store terabytes of data in single tables, overall performance typically degrades. Indexing all that data is obviously expensive for reads, but also for writes. While NoSQL datastores are particularly suited to storing big data (think Google's Bigtable), NoSQL is a patently non-relational approach. For the developer who prefers the ACID-ity and solid structure of a relational database, or the project that requires it, sharding could be an exciting alternative.</p>
<p>Sharding, an offshoot of database partitioning, isn't a native database technique; it happens at the level of the application. Among various sharding implementations, Hibernate Shards is possibly the most popular in the world of Java™ technology. This nifty project lets you work more or less seamlessly with sharded datasets (I will explain the "more or less" part shortly) using POJOs that are mapped to a logical database. When you use Hibernate Shards, you don't have to specifically map your POJOs to shards. You map them as you would for any ordinary relational database with Hibernate, and Hibernate Shards manages the low-level sharding tasks for you.</p>
<p>So far in this series, I've used a simple domain based on the analogy of races and runners to demonstrate various data storage technologies. This month, I'll use this familiar example to introduce a practical sharding strategy, then implement it in Hibernate Shards. Note that the brunt of the work related to sharding isn't necessarily related to Hibernate; in fact, coding for Hibernate Shards is the easy part. The real work is figuring out how and what you'll shard.</p>
<p>About this series</p>
<p>The Java development landscape has changed radically since Java technology first emerged. Thanks to mature open source frameworks and reliable for-rent deployment infrastructures, it's now possible to assemble, test, run, and maintain Java applications quickly and inexpensively. In this series, Andrew Glover explores the spectrum of technologies and tools that make this new Java development paradigm possible.</p>
<p>Sharding at a glance</p>
<p>Database partitioning is an inherently relational process of dividing a table's rows into smaller groups by some logical piece of data. If you were partitioning a gigantic table named foo based on timestamps, for instance, all the data before August 2010 would go in Partition A, while anything since then would be in Partition B, and so on. Partitioning has the effect of making reads and writes faster because they target smaller datasets in individual partitions.</p>
<p>Partitioning isn't always available (MySQL didn't support it until version 5.1), and the cost of doing it with a commercial system

can be prohibitive. What's more, most partitioning implementations store data on the same physical machine, so you're still bound to the limits of your hardware. Partitioning also doesn't resolve the reliability, or lack thereof, of your hardware. Thus, various smart people started looking for new ways to scale.</p>
<p>Sharding is essentially partitioning at the database level: rather than divide a table's rows by pieces of data, the database itself is split up (usually across different machines) by some logical data element. That is, rather than splitting up a table into smaller chunks, sharding splits up an entire database into smaller chunks.</p>
<p>The canonical example for sharding is based on dividing a large database storing worldwide customer data by region: Shard A for customers in the United States, Shard B for Asia, Shard C for Europe, and so on. The shards themselves would live on different machines and each shard would hold all related data, such as customer preferences or order history.</p>
<p>The benefit of sharding (like partitioning) is that it compacts big data: individual tables are smaller in each shard, which allows for faster reads and writes, which increases performance. Sharding also conceivably improves reliability, because even if one shard unexpectedly fails, others are still able to serve data. And because sharding is done at the application layer, you can do it for databases that don't support regular partitioning. The monetary cost is also potentially lower.</p>
<p>Sharding and strategy</p>
<p>Like most technologies, sharding does entail some trade-offs. Because sharding isn't a native database technique (that is, you must implement it in your application), you'll need to map out your sharding strategy before you begin. Both primary keys and cross-shard queries play a major role when sharding, mainly by defining what you can't do.</p>
<p>Primary keys</p>
<p>Sharding leverages multiple databases, all of which function autonomously, without awareness of their peers. As a result, if you rely on database sequences (such as for automatic primary key generation), it's likely that an identical primary key will show up across a set of databases. It's possible to coordinate sequences across a distributed database, but doing so increases system complexity. The safest way to prohibit duplicate primary keys is to have your application (which will be managing a sharded system anyway) generate the keys.</p>
<p>Cross-shard queries</p>
<p>Most sharding implementations (including Hibernate Shards) don't permit cross-shard querying, which means you have to go to extra lengths if you want to leverage two sets of data from different shards. (Interestingly, Amazon's SimpleDB also prohibits cross-domain queries.) For instance, if you're storing United States customers in Shard 1, you also need to store all of their related data there. If you try to store that data in Shard 2, things will get complicated, and system performance might suffer. This also relates to the earlier point: if you do end up needing cross-shard joins, you had better manage keys in a way that rules out duplicates!</p>
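<p>To illustrate what "extra lengths" can look like, here is a minimal sketch in which the application queries each shard separately and merges the results itself. Everything in it is hypothetical: the shards are plain in-memory lists standing in for separate databases, and CrossShardQuerySketch and findCustomersByPrefix are illustrative names, not Hibernate Shards APIs.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for sharded customer data; not a Hibernate Shards API.
final class CrossShardQuerySketch {
    // Each inner list plays the role of one shard's customer table.
    static List<String> findCustomersByPrefix(List<List<String>> shards, String prefix) {
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards) {        // one separate query per shard
            for (String name : shard) {
                if (name.startsWith(prefix)) {
                    merged.add(name);
                }
            }
        }
        Collections.sort(merged);                  // ordering has to be redone after merging
        return merged;
    }

    public static void main(String[] args) {
        List<String> usShard = Arrays.asList("Avery", "Blake");
        List<String> asiaShard = Arrays.asList("Akira", "Chen");
        System.out.println(findCustomersByPrefix(Arrays.asList(usShard, asiaShard), "A"));
        // prints [Akira, Avery]
    }
}
```

<p>Filtering, ordering, and any join logic that a single database would handle now live in application code, which is why related data is best kept in the same shard.</p>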

<p>Clearly, you'll need to fully consider a sharding strategy before you set up your database. And once you've chosen a particular direction, you're more or less tied to it: it's hard to move data around after it's been sharded.</p>
<p>Avoid premature sharding</p>
<p>Sharding is best employed late in the game. Like premature optimization, sharding based on expected data growth could be a recipe for disaster. Successful sharding implementations are based on measuring an application's data growth over time, and then extrapolating to the future. Once you've sharded your data, it can be extraordinarily hard to move around.</p>
<p>A strategy example</p>
<p>Because sharding binds you to a linear data model (that is, you can't easily join data in different shards), you should start with a clear picture of how your data will be logically organized per shard. This is usually easiest by focusing on the primary node of a domain. In the case of an e-commerce system, the primary node could be either an order or a customer. Thus, if you choose "customer" as the basis for your sharding strategy, then all data related to customers will be moved into the respective shards, but you still have to choose which shard each piece of data goes to.</p>
<p>For customers, you could shard based on location (Europe, Asia, Africa, etc.), or you could shard based on something else. It's up to you. Your shard strategy should, however, incorporate some means of distributing data evenly among all of your shards. The whole idea of sharding is to break up big data sets into smaller ones; thus, if a particular e-commerce domain had a large set of European customers and relatively few in the United States, it probably wouldn't make sense to shard based on customer location.</p>
<p>Off to the races, with sharding!</p>
<p>Getting back to the familiar example of my racing application, I can shard by race or by runner. In this case, I'm going to shard by race, because I see the domain being organized by runners who belong to races. So the race is the root of my domain. I'm also going to shard based on race distance, because my racing application holds myriad races of different lengths, along with myriad runners.</p>
<p>Note that in making these decisions, I have already accepted a trade-off: what if a runner participates in more than one race, each of them living in different shards? Hibernate Shards (like most sharding implementations) doesn't support cross-shard joins. I'm going to have to live with this slight inconvenience and allow runners to live in multiple shards; that is, I will recreate each runner in the shards where his or her various races live.</p>
<p>To keep things simple, I'm going to create two shards: one for races less than 10 miles and another for anything greater than 10 miles.</p>
<p>Implementing Hibernate Shards</p>
<p>Hibernate Shards is made to work almost seamlessly with existing Hibernate projects. The only catch is that Hibernate Shards needs some specific information and behavior from you. Namely, it needs a shard-access strategy, a shard-selection strategy, and a shard-resolution strategy. These are interfaces you must implement, though in some cases you can use default ones. We'll look at each interface separately in the following sections.</p>
<p>ShardAccessStrategy</p>
<p>

When a query is executed, Hibernate Shards needs a mechanism for determining which shard to hit first, second, and so on. Hibernate Shards doesn't necessarily figure out what a query is looking for (that's for the Hibernate Core and underlying database to do), but it does recognize that a query might need to execute against multiple shards before an answer is obtained. So, Hibernate Shards provides two logical implementations out of the box: one executes a query sequentially (one at a time) against the shards until an answer is obtained; the other is a parallel access strategy that uses a threading model to query all shards at once.</p>
<p>I'm going to keep things simple and utilize the sequential strategy, aptly named SequentialShardAccessStrategy. We'll configure it shortly.</p>
<p>ShardSelectionStrategy</p>
<p>When a new object is created (that is, when a new Race or Runner is created via Hibernate), Hibernate Shards needs to know what shard the corresponding data should be written to. Accordingly, you must implement this interface and code the sharding logic. If you want a default implementation, there's one dubbed RoundRobinShardSelectionStrategy, which uses a round-robin strategy for putting data into shards.</p>
<p>For the racing application, I need to provide behavior that shards by race distance. Accordingly, I'll need to implement the ShardSelectionStrategy interface and provide some simple logic that shards based on a Race object's distance in the selectShardIdForNewObject method. (I'll show the Race object shortly.)</p>
<p>At runtime, when a call is made to some save-like method on my domain objects, this interface's behavior is leveraged deep down in Hibernate's core.</p>
<p>Listing 1. A simple shard-selection strategy</p>
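<p>The body of the listing is not reproduced here. The following is a minimal sketch of such a strategy, written to match the surrounding description rather than taken from the author's code. The method name selectShardIdForNewObject comes from the text; the class name DistanceShardSelectionStrategy, the fields on Race and Runner, and the local ShardId and ShardSelectionStrategy stand-ins (declared here only so the sketch compiles without Hibernate Shards on the classpath) are assumptions.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins so the sketch compiles on its own. With Hibernate Shards on the
// classpath, the library's own ShardId and ShardSelectionStrategy would be used instead.
final class ShardId {
    final int id;
    ShardId(int id) { this.id = id; }
}

interface ShardSelectionStrategy {
    ShardId selectShardIdForNewObject(Object obj);
}

// Hypothetical domain classes; the article's real Race and Runner are richer.
final class Race {
    final double distance;
    Race(double distance) { this.distance = distance; }
}

final class Runner {
    final List<Race> races = new ArrayList<>();
}

final class DistanceShardSelectionStrategy implements ShardSelectionStrategy {
    @Override
    public ShardId selectShardIdForNewObject(Object obj) {
        if (obj instanceof Race) {
            return shardFor(((Race) obj).distance);
        }
        if (obj instanceof Runner) {
            Runner runner = (Runner) obj;
            if (runner.races.isEmpty()) {
                // A Runner cannot exist without a corresponding Race.
                throw new IllegalArgumentException("a runner needs at least one race");
            }
            // A Runner goes to the shard of the first Race found.
            return shardFor(runner.races.get(0).distance);
        }
        // Any other domain object is rejected for now.
        throw new IllegalArgumentException("unsupported object: " + obj);
    }

    // Shard 1 holds races longer than 10 miles; shard 0 holds everything else.
    private ShardId shardFor(double distance) {
        return new ShardId(distance > 10.0 ? 1 : 0);
    }
}
```

<p>Keeping the distance threshold in one private method means the Race and Runner branches cannot drift apart if the cut-off changes later.</p>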

<p>As you can see in Listing 1, if the object being persisted is a Race, then its distance is determined and, accordingly, a shard is picked. In this case, there are two shards: 0 and 1, where Shard 1 holds races with a distance greater than 10 miles and Shard 0 holds all others.</p>
<p>If a Runner or some other object is being persisted, things get a bit more involved. I've coded a logical rule that has three stipulations:</p>
<p>A Runner can't exist without a corresponding Race.</p>
<p>If a Runner has been created with multiple Races, the Runner will be persisted in the shard for the first Race found. (This rule has negative implications for the future, by the way.)</p>
<p>If some other domain object is being saved, for now, an exception will be thrown.</p>
<p>With that, you can wipe the sweat from your brow, because most of the hard work is done. The logic I've captured might not be flexible enough as the racing application grows, but it'll work for the purpose of this demonstration!</p>
<p>ShardResolutionStrategy</p>
<p>When searching for an object by its key, Hibernate Shards needs a way of determining which shard to hit first. You'll use the ShardResolutionStrategy interface to guide it.</p>
<p>As I mentioned earlier, sharding forces you to be keenly aware of primary keys, as you'll manage them yourself. Luckily, Hibernate is already good at providing key or UUID generation. Consequently, out of the box, Hibernate Shards provides an ID generator dubbed ShardedUUIDGenerator, which has the smarts to embed shard ID information in the UUID itself.</p>
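<p>The idea of embedding shard information in a key can be sketched as follows. This is a hypothetical encoding chosen for illustration only; the class ShardAwareIdSketch and its four-hex-digit prefix are assumptions, and the format that ShardedUUIDGenerator actually produces may differ.</p>

```java
import java.util.UUID;

// Hypothetical scheme for illustration; ShardedUUIDGenerator's real encoding may differ.
final class ShardAwareIdSketch {
    // Prefix a random UUID with the shard number written as four hex digits.
    static String newId(int shardId) {
        if (shardId < 0 || shardId > 0xFFFF) {
            throw new IllegalArgumentException("shard id out of range: " + shardId);
        }
        return String.format("%04x-%s", shardId, UUID.randomUUID());
    }

    // Recover the shard number from the prefix, so a lookup by key can go
    // straight to the right shard instead of probing every shard.
    static int shardOf(String id) {
        return Integer.parseInt(id.substring(0, 4), 16);
    }

    public static void main(String[] args) {
        String id = newId(1);
        System.out.println(id + " lives in shard " + shardOf(id));
    }
}
```

<p>Because the application generates the key, duplicates across shards are avoided without coordinating database sequences, and the key itself tells a resolution strategy where to look.</p>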

<p>If you end up using ShardedUUIDGenerator for key generation (as I will for this article), then you can also use the Hibernate Shards out-of-the-box ShardResolutionStrategy implementation dubbed AllShardsShardResolutionStrategy, which can determine what shard to search based on a particular object's ID.</p>
<p>Having configured the three interfaces required for Hibernate Shards to work properly, we're ready for the second step in implementing the sharded example application: starting up Hibernate's SessionFactory.</p>
