Snapshot 101: Copy-on-write vs Redirect-on-write
There are two very different ways to create snapshots: copy-on-write and redirect-on-write. If an IT organization is considering using the snapshot functionality of its storage system, it is essential to understand which type of snapshot the system creates and the pros and cons of each method.
Rather than the more common term volume, this column will use the term protected entity to refer to the entity being protected by a given snapshot. While it is true that the protected entity is typically a RAID volume, it is also true that some object storage systems do not use RAID. Their snapshots may be designed to protect other entities, including containers, a NAS share, etc. In this case, the protected entity may reside on a number of disk drives, but it does not reside on a volume in the RAID or LUN sense.
What all snapshot types have in common is that they are virtual copies, not physical copies. If something happens to the protected entity, the snapshot will be useless. For example, if there is a triple disk failure on a RAID 6 volume, snapshots will not help. An object storage system should also protect against a certain number of simultaneous failures, but if the failures exceed that number, snapshots will not help either. A snapshot has two primary purposes: easy recovery of deleted or corrupted files, and a source for replication or backup. In order for the snapshot to protect against media failure, you must replicate or back it up to some other device. In other words, you must make a physical copy.
With a snapshot, nothing significant happens on the collection of hard drives where the protected entity resides. The storage system merely takes note that the way the protected entity looks at that moment means it needs preserving. The difference between copy-on-write and redirect-on-write snapshots is how they store the previous version of a modified block, and these two methods have serious performance ramifications.
Consider a copy-on-write system, which copies any blocks before they are overwritten with new information (i.e. it copies on writes). In other words, if a block in a protected entity is to be modified, the system will copy that block to a separate snapshot area before it is overwritten with the new information. This approach requires three I/O operations for each write: one read and two writes. Prior to overwriting a block, its previous value must be read and then written to a different location, followed by the write of the new information. If a process attempts to read the snapshot at some point in the future, it accesses it through the snapshot system that knows which blocks changed since the snapshot was taken. If a block has not been modified, the snapshot system will read that block from the original protected entity. If it has been modified, the snapshot system knows where the previous version of that block is stored and will read it from there. This decision process for each block also comes with some computational overhead.
A redirect-on-write system uses pointers to represent all protected entities. If a block needs modification, the storage system merely redirects the pointer for that block to another block and writes the data there (i.e. it redirects on writes). The snapshot system knows where all of the blocks are that comprise a given snapshot; in other words, it has a list of pointers and knows the location of the blocks those pointers are referring to. If a process attempts to access a given snapshot, it simply uses these pointers to access those blocks where they originally resided. The fact that some of those blocks were replaced and are now represented by other pointers is irrelevant to the snapshot process. There is zero computational overhead of reading a snapshot in a redirect-on-write system.
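The two write paths described above can be sketched in code. The following is a minimal, single-threaded toy model with invented names (CowVolume, RowVolume, snapArea); real storage systems track block maps in on-disk metadata, but the logic is the same in spirit:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model: a "volume" is a vector of blocks; a snapshot preserves
// the volume's contents as they were at a point in time.
using Block = std::string;

// Copy-on-write: before overwriting a block, the old value is copied
// into a snapshot area (one read plus two writes per modification).
struct CowVolume {
    std::vector<Block> blocks;
    std::map<std::size_t, Block> snapArea;  // old versions of changed blocks
    bool snapActive = false;

    void takeSnapshot() { snapArea.clear(); snapActive = true; }

    void write(std::size_t i, const Block& data) {
        if (snapActive && !snapArea.count(i))
            snapArea[i] = blocks[i];   // read the old value, write it aside
        blocks[i] = data;              // then write the new value
    }

    // Reading the snapshot requires a per-block decision: changed or not?
    Block readSnapshot(std::size_t i) const {
        auto it = snapArea.find(i);
        return it != snapArea.end() ? it->second : blocks[i];
    }
};

// Redirect-on-write: the live volume is a table of pointers into a
// block pool; a write allocates a new block and redirects one pointer.
struct RowVolume {
    std::vector<Block> pool;            // blocks here are never overwritten
    std::vector<std::size_t> live;      // pointer table for the live volume
    std::vector<std::size_t> snapshot;  // a snapshot is a copy of the table

    explicit RowVolume(std::vector<Block> init) {
        for (auto& b : init) { live.push_back(pool.size()); pool.push_back(b); }
    }
    void takeSnapshot() { snapshot = live; }

    void write(std::size_t i, const Block& data) {
        live[i] = pool.size();          // redirect the pointer (one write)
        pool.push_back(data);
    }

    // Snapshot reads follow the saved pointers directly: no decision logic.
    Block readSnapshot(std::size_t i) const { return pool[snapshot[i]]; }
};
```

The model makes the I/O asymmetry visible: CowVolume::write touches the old block, the snapshot area, and the live block, while RowVolume::write performs a single write and a pointer update.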
The redirect-on-write system uses one-third the number of I/O operations when modifying a protected block, and it adds no computational overhead when reading a snapshot. Copy-on-write systems can therefore have a big impact on the performance of the protected entity. The more snapshots are created, and the longer they are stored, the greater the impact on the protected entity's performance. This is why copy-on-write snapshots are typically used only as temporary sources for backup; they are created, backed up, and then immediately deleted. Redirect-on-write snapshots, however, are often created every hour, or even every few minutes, and stored for days or even months, deleted only when space requires it. (The longer a snapshot is stored, the more extra space is required to hold the previous versions of changed blocks.)
StorageSwiss Take
Redirect-on-write snapshots are the preferred snapshot method if the plan is to use snapshots for medium-to-long-term protection against file deletions and corruptions. If a vendor is using copy-on-write snapshots and is recommending them for anything other than temporary sources for backups, make sure to ask them how they overcome the inherent performance penalties of copy-on-write.
How to implement Copy-on-Write?
I want to implement copy-on-write for my custom C++ String class, and I wonder how to do it.
I tried implementing several options, but they all turned out very inefficient.
In a multi-threaded environment (which is most of them nowadays), CoW is frequently a huge performance hit rather than a gain. And with careful use of const references, it’s not much of a performance gain even in a single-threaded environment.
Additionally, as other people have pointed out, CoW strings are really tricky to implement, and it’s easy to make mistakes. That coupled with their poor performance in threading situations makes me really question their usefulness in general. This becomes even more true once you start using C++11 move construction and move assignment.
But, to answer your question.
Here are a couple of implementation techniques that may help with performance.
First, store the length in the string itself. The length is accessed quite frequently, and eliminating the pointer dereference would probably help. I would, just for consistency, put the allocated length there too. This will cost you in terms of your string objects being a bit bigger, but the overhead in space and copying time is very small, especially since these values then become easier for the compiler to apply interesting optimizations to.
This leaves you with a string class that looks like this:
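A sketch of such a class, using the member names (data_, alloclen_, refct_) that appear later in this answer; the class name String and the detach-inside-operator[] policy are one possible choice, not the only one:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Sketch: length and allocated length live in the String object itself,
// while the reference-counted Buf holds only the characters.
class String {
    struct Buf {
        int   refct_;   // how many String objects share this buffer
        char* data_;    // the characters themselves
    };
    std::size_t len_;       // current length, stored inline for fast access
    std::size_t alloclen_;  // allocated capacity, also inline
    Buf*        data_;

public:
    String(const char* s)
        : len_(std::strlen(s)), alloclen_(len_ + 1),
          data_(new Buf{1, new char[alloclen_]}) {
        std::memcpy(data_->data_, s, alloclen_);
    }
    String(const String& o) : len_(o.len_), alloclen_(o.alloclen_), data_(o.data_) {
        ++data_->refct_;    // copying just bumps the refcount
    }
    ~String() {
        if (--data_->refct_ == 0) { delete[] data_->data_; delete data_; }
    }
    String& operator=(const String&) = delete;  // omitted for brevity

    std::size_t size() const { return len_; }   // no pointer dereference
    const char* c_str() const { return data_->data_; }

    // Copy-on-write: detach before the first mutation if the buffer is shared.
    char& operator[](std::size_t i) {
        if (data_->refct_ > 1) {
            Buf* fresh = new Buf{1, new char[alloclen_]};
            std::memcpy(fresh->data_, data_->data_, len_ + 1);
            --data_->refct_;
            data_ = fresh;
        }
        return data_->data_[i];
    }
};
```

Note that the non-const operator[] pessimistically detaches even when the caller only reads; a real implementation would provide a const overload that never copies.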
Now, there are further optimizations you can perform. The Buf class there looks like it doesn’t really contain or do much, and this is true. Additionally, it requires allocating both an instance of Buf and a buffer to hold the characters. This seems rather wasteful. So, we’ll turn to a common C implementation technique, stretchy buffers:
When you do things this way, you can then treat data_->data_ as if it contained alloclen_ bytes instead of just 1.
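A sketch of that stretchy-buffer layout follows; the one-element data_ array plus an oversized allocation is the classic C idiom (strictly speaking undefined behavior in ISO C++, though very widely used in practice):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// "Stretchy buffer" sketch: one allocation holds both the header and
// the characters, instead of a Buf plus a separate char array.
struct Buf {
    int  refct_;
    char data_[1];   // really holds n+1 bytes, allocated past the struct

    static Buf* make(const char* s, std::size_t n) {
        // one allocation for the header plus n characters plus terminator
        void* raw = ::operator new(sizeof(Buf) + n);
        Buf* b = static_cast<Buf*>(raw);
        b->refct_ = 1;
        std::memcpy(b->data_, s, n);
        b->data_[n] = '\0';
        return b;
    }

    static void release(Buf* b) {
        if (--b->refct_ == 0) ::operator delete(b);
    }
};
```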
Keep in mind that in all of these cases you will have to make sure that you either never ever use this in a multi-threaded environment, or that you make sure that refct_ is a type that you have both an atomic increment, and an atomic decrement and test instruction for.
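For the multi-threaded case, a sketch of what an atomic refct_ might look like; the essential point is that the decrement and the zero test happen as a single atomic operation, so exactly one thread observes the count reaching zero:

```cpp
#include <atomic>
#include <cassert>

// Thread-safe reference counting: fetch_sub returns the previous value,
// so the decrement and the "did it hit zero?" test are one atomic step.
struct SharedBuf {
    std::atomic<int> refct_{1};

    void acquire() { refct_.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller is responsible for freeing the buffer.
    bool release() {
        return refct_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```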
There is an even more advanced optimization technique that involves using a union to store short strings right inside the bits of data that you would use to describe a longer string. But that’s even more complex, and I don’t think I will feel inclined to edit this to put a simplified example here later, but you never can tell.
I would suggest that if one wants to implement copy-on-write efficiently (for strings or whatever), one should define a wrapper type which behaves as a mutable string, and which holds both a nullable reference to a mutable string (no other reference to that item will ever exist) and a nullable reference to an "immutable" string (references to which will never exist outside things that won't try to mutate it). Wrappers are always created with at least one of those references non-null; once the mutable-item reference is ever set to a non-null value (during or after construction), it will forever refer to the same target. Any time both references are non-null, the immutable-item reference points to a copy of the item that was made some time after the most recent completed mutation (during a mutation, the immutable-item reference may or may not hold a reference to a pre-mutation value).
To read an object, check whether the mutable-item reference is non-null. If so, use it. Otherwise, check whether the immutable-item reference is non-null. If so, use it. Otherwise, use the mutable-item reference (which by now will be non-null).
To mutate an object, check whether the mutable-item reference is non-null. If not, copy the target of the immutable-item reference and CompareExchange a reference to the new object into the mutable-item reference. Then mutate the target of the mutable-item reference and invalidate the immutable-item reference.
To clone an object, if the clone is expected to be cloned again before it is mutated, retrieve the value of the immutable-item reference. If it is null, make a copy of the mutable-item target and CompareExchange a reference to that new object into the immutable-item reference. Then create a new wrapper whose mutable-item reference is null, and whose immutable-item reference is either the retrieved value (if it wasn't null) or the new item (if it was).
To clone an object, if the clone is expected to be mutated before it is cloned, retrieve the value of the immutable-item reference. If null, retrieve the mutable-item reference. Copy the target of whichever reference was retrieved and create a new wrapper whose mutable-item reference points to the new copy, and whose immutable-item reference is null.
The two cloning methods are semantically identical, but picking the wrong one for a given situation will result in an extra copy operation. If one consistently chooses the correct copy operation, one will get most of the benefit of an "aggressive" copy-on-write approach, but with far less threading overhead. Every data-holding object (e.g. a string) will either be unshared mutable or shared immutable, and no object will ever switch between those states. Consequently, one could if desired eliminate all threading/synchronization overhead (replacing the CompareExchange operations with straight stores) provided that no wrapper object is used in more than one thread simultaneously. Two wrapper objects might hold references to the same immutable data holder, but they can remain oblivious to each other's existence.
Note that a few more copy operations may be required when using this approach than when using an "aggressive" approach. For example, if a new wrapper is created with a new string, and that wrapper is mutated and then copied six times, the original wrapper would hold references to the original string holder and an immutable holder containing a copy of the data. The six copied wrappers would just hold a reference to the immutable string (two strings total, although if the original string were never mutated after the copy was made, an aggressive implementation could get by with one). If the original wrapper were mutated, along with five of the six copies, then all but one of the references to the immutable string would get invalidated. At that point, if the sixth wrapper copy were mutated, an aggressive copy-on-write implementation might realize that it held the only reference to its string, and thus decide a copy was unnecessary. The implementation I describe, however, would create a new mutable copy and abandon the immutable one. Despite these extra copy operations, the reduction in threading overhead should in most cases more than offset the cost. If the majority of logical copies that are produced are never mutated, this approach may be more efficient than always making copies of strings.
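The scheme described above can be sketched as follows. This is a single-threaded version, so the CompareExchange operations are replaced by plain stores (which the answer says is permissible when no wrapper is shared between threads); the class and method names are invented:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Sketch of the wrapper described above. Invariants: at least one of
// the two references is non-null; once mutable_ is set it never points
// anywhere else; immutable_ (when non-null) holds a copy made after
// the most recent completed mutation.
class CowWrapper {
    std::unique_ptr<std::string> mutable_;          // unshared, mutable
    std::shared_ptr<const std::string> immutable_;  // shared, never mutated

    explicit CowWrapper(std::shared_ptr<const std::string> imm)
        : immutable_(std::move(imm)) {}

public:
    explicit CowWrapper(std::string s)
        : mutable_(new std::string(std::move(s))) {}

    // Read: prefer the mutable item, fall back to the immutable one.
    const std::string& read() const {
        return mutable_ ? *mutable_ : *immutable_;
    }

    // Mutate: materialize the mutable item if needed, then invalidate
    // the immutable-item reference.
    void append(const std::string& tail) {
        if (!mutable_) mutable_.reset(new std::string(*immutable_));
        *mutable_ += tail;
        immutable_.reset();
    }

    // Clone, expecting the clone to be cloned again before mutation:
    // share (creating if necessary) the immutable copy.
    CowWrapper cloneForSharing() {
        if (!immutable_)
            immutable_ = std::make_shared<const std::string>(*mutable_);
        return CowWrapper(immutable_);
    }

    // Clone, expecting the clone to be mutated: make a private copy now.
    CowWrapper cloneForMutation() const {
        return CowWrapper(read());
    }
};
```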
Using different types of storage snapshot technologies for data protection
Storage snapshots are commonly used to enhance data protection systems and dramatically shorten recovery time objectives (RTOs) and recovery point objectives (RPOs). Here’s a look at the different types of snapshot technologies and the pros and cons of each.
There are six general types of snapshot technologies (see table below):
| Snapshot technology | Copy-on-write | Redirect-on-write | Clone/split mirror | COW w/background copy | Incremental | CDP |
| --- | --- | --- | --- | --- | --- | --- |
| Snapshot is tightly coupled to original data | Yes | Yes | No | Yes, until background copy finishes | Depends on how original snapshot is generated | No |
| Space efficient | Yes | Yes | No | No | No | Yes, versus multiple point-in-time snapshots |
| Original data system I/O and CPU resource overhead | High | Medium | Low | Low | Low | Low |
| Write overhead on original data copy | High | None | None | High | High | High |
| Protects against logical data errors by rolling back to original copy | Yes | Yes | Yes | Yes | Yes | Yes |
| Protects against physical media failures of original copy | No | No | Yes | After background copy completes | Depends on underlying snapshot tech. | Yes |
Copy-on-write requires storage capacity to be provisioned for snapshots, and then a snapshot of a volume has to be initiated using the reserved capacity. The copy-on-write snapshot stores only the metadata about where the original data is located, but doesn’t copy the actual data at the initial creation. This makes snapshot creation virtually instantaneous, with little impact on the system taking the snapshot.
The snapshot then tracks the original volume, watching for changed blocks as writes are performed. As blocks change, the original data is copied into the reserved storage capacity set aside for the snapshot before the original data is overwritten. Each snapped block is copied just once, at the first write request. This process ensures snapshot data is consistent with the exact time the snapshot was taken, and it's why the process is called "copy-on-write."
Read requests to unchanged data are directed to the original volume. Read requests to changed data are directed to the copied blocks in the snapshot. Each snapshot contains metadata describing the data blocks that have changed since the snapshot was first created.
The major advantage of copy-on-write is that it's very space efficient, because the reserved snapshot storage only has to be large enough to capture the data that's changed. The well-known downside is that it reduces performance on the original volume: write requests to the original volume must wait to complete until the original data is "copied out" to the snapshot. One other key aspect of copy-on-write is that each snapshot requires a valid original copy of the data.
Redirect-on-write (ROW) is comparable to copy-on-write, and provides similarly space-efficient snapshots, but it eliminates the double-write performance penalty. It does so by redirecting new writes to the original volume into the storage provisioned for snapshots, reducing the number of writes from two to one: instead of writing a copy of the original data to the snapshot storage plus the changed data, as COW requires, ROW writes only the changed data.
With redirect-on-write, the original copy contains the point-in-time snapshot data, and it’s the changed data that ends up residing on the snapshot storage. There’s some complexity when a snapshot is deleted. The deleted snapshot’s data must be copied and made consistent back on the original volume. The complexity goes up exponentially as more snapshots are created, which complicates original data access, snapshot data and original volume data tracking, and snapshot deletion data reconciliation. Serious problems can occur when the original data set (upon which the snapshot is dependent) becomes fragmented.
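The deletion-reconciliation step described above can be sketched with a toy model (names invented; real systems do this in on-disk metadata). The original volume keeps the point-in-time data, new writes land in the snapshot area, and deleting the snapshot copies the changed blocks back:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy ROW model matching this article's description: after a snapshot
// is taken, new writes are redirected into the snapshot storage area,
// so the original volume still holds the point-in-time data.
struct RowVolume {
    std::vector<std::string> original;            // point-in-time data
    std::map<std::size_t, std::string> snapArea;  // redirected new writes
    bool snapActive = false;

    void takeSnapshot() { snapActive = true; }

    void write(std::size_t i, const std::string& d) {
        if (snapActive) snapArea[i] = d;   // redirect, don't overwrite
        else original[i] = d;
    }

    std::string readLive(std::size_t i) const {
        auto it = snapArea.find(i);
        return it != snapArea.end() ? it->second : original[i];
    }
    std::string readSnapshot(std::size_t i) const { return original[i]; }

    // Deleting the snapshot: reconcile redirected blocks back into the
    // original volume, then reclaim the snapshot area.
    void deleteSnapshot() {
        for (auto& kv : snapArea) original[kv.first] = kv.second;
        snapArea.clear();
        snapActive = false;
    }
};
```

With several snapshots stacked on one volume, every live read has to consult this mapping chain, which is the access-tracking complexity the article describes.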
A clone or split-mirror snapshot creates an identical copy of the data. The clone or split-mirror can be of a storage volume, file system or a logical unit number (LUN). The good thing about clones is that they’re highly available. The bad thing is that because all of the data has to be copied, it can’t be done instantaneously. A clone can be made instantaneously available by splitting a pre-existing synchronous volume mirror into two. However, when a split-mirror is used as a clone, the original volume has lost a synchronized mirror.
A very significant downside to this snapshot methodology is that each snapshot requires as much storage capacity as the original data. This can be expensive, especially if more than one snapshot clone is required to be kept live at any given time. One other downside is the impact to system performance because of the overhead of writing synchronously to the mirror copy.
Copy-on-write with background copy takes the COW instantaneous snapshot data and uses a background process to copy that data from its original location to the snapshot storage location. This creates a clone or mirror of the original data.
Copy-on-write with background copy attempts to take the best aspects of copy-on-write while minimizing its downsides. It’s often described as a hybrid between COW and cloning.
An incremental snapshot tracks changes made to the source data and snapshot data when the snapshot is generated. When an incremental snapshot is generated, the original snapshot data is updated or refreshed. There’s a time stamp on the original snapshot data and on each subsequent incremental snapshot. The time stamp provides the capability to roll back to any point-in-time snapshot. Incremental snapshots allow you to get faster snapshots after the first one, and you use only nominally more storage space than the original data. This enables more frequent snapshots and longer retention of snapshots.
The downside to incremental snapshots is that they’re dependent on the underlying baseline technology used in the first snapshot (copy-on-write, redirect-on-write, clone/split-mirror or copy-on-write with background copy). If cloned, the first snapshot will take a while; if COW, there will be a performance penalty on writes to the original data, etc.
Continuous data protection (CDP) was developed to provide zero-data-loss recovery point objectives (RPOs) and near-instantaneous recovery time objectives (RTOs). It's similar to synchronous data mirroring, except that it eliminates the rolling disaster (a problem in the primary data automatically becomes a problem in the mirrored data long before human intervention can stop it) and protects against human errors, malware, accidental deletions and data corruption.
Continuous data protection is like incremental snapshots on steroids. It captures and copies any changes to the original data whenever they occur and time stamps them. It essentially creates an incremental snapshot for every moment in time, providing very fine-grain recoveries. Some CDP implementations are both time and event based (such as an application upgrade). A good way to think of CDP is as a journal of complete storage snapshots.
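The "journal of changes" idea can be sketched as follows (a toy model with invented names; real CDP products journal at the block-driver or appliance level):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CDP sketch: every write is appended to a journal with a timestamp,
// and recovery replays the journal up to any chosen point in time.
struct Change { long time; std::size_t block; std::string data; };

struct CdpJournal {
    std::vector<std::string> baseline;   // initial copy of the data
    std::vector<Change> journal;         // every change, time-stamped

    void write(long t, std::size_t block, const std::string& d) {
        journal.push_back({t, block, d});
    }

    // Roll back (or forward) to any instant: replay changes up to time t.
    std::vector<std::string> recoverAt(long t) const {
        std::vector<std::string> v = baseline;
        for (const Change& c : journal)
            if (c.time <= t) v[c.block] = c.data;
        return v;
    }
};
```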
Continuous data protection is an excellent form of data protection for email, databases and applications that are based on databases. The ability to roll back to any point-in-time makes recoveries simple and fast. FalconStor’s IPStor is an example of a storage system and/or virtualization appliance that provides CDP.
With more and more data to protect and often less time to do it, snapshots will play a bigger role in data protection and daily storage operations. Although the differences among snapshot technologies may seem subtle, how they operate in your environment could have a significant effect on the level of protection provided and how quickly recoveries can occur.
This article originally appeared in Storage magazine.
About the author:
Marc Staimer is the founder, senior analyst, and CDS of Dragon Slayer Consulting in Beaverton, OR. The consulting practice of 11 years has focused in the areas of strategic planning, product development, and market development. With over 28 years of marketing, sales and business experience in infrastructure, storage, server, software, and virtualization, he’s considered one of the industry’s leading experts. Marc can be reached at [email protected]
Copy-on-write in PHP
Copy-on-write is one approach to memory management. Before giving any definitions, let's start with an example:
In this example there is a function handle, and a large array is passed into it. By default, PHP passes arguments by value. This means that if the argument's value is changed inside the function, the value outside the function remains the same. In other words, the function works with a copy of the variable, and creating that copy requires allocating memory.
The question: as an optimization, should the argument be passed by reference, i.e. handle(array &$array)?
The answer actually depends on what happens inside the handle function.
Reading the argument vs. modifying the argument: if the function only reads the argument, no copy is made. If the function modifies the argument, the variable is copied; that is, a new zval container is created and memory is allocated for it.
Copy-on-write
The essence of the copy-on-write approach is that a shared copy is used while variables are only read; when a variable is modified, a new copy is created.
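PHP implements this with reference-counted zval containers. The mechanism itself is language-independent, and can be illustrated with a short C++ analogy (a sketch of the idea, not PHP's actual internals; all names are invented):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// C++ analogy for PHP's copy-on-write: "variables" share one buffer
// while they are only read; the first modification triggers a copy.
class CowArray {
    std::shared_ptr<std::vector<int>> data_;

public:
    explicit CowArray(std::vector<int> v)
        : data_(std::make_shared<std::vector<int>>(std::move(v))) {}

    // Copying a CowArray shares the buffer: no element allocation.
    int read(std::size_t i) const { return (*data_)[i]; }

    // Writing detaches first if the buffer is shared ("copy on write").
    void write(std::size_t i, int value) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<int>>(*data_);  // real copy now
        (*data_)[i] = value;
    }

    bool sharesBufferWith(const CowArray& o) const { return data_ == o.data_; }
};
```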
Measuring memory
Let's run a test to confirm that a shared copy is used when the argument is only read, and that memory is allocated when it is modified.
The results show that memory was allocated when the argument was modified, but not when it was merely read.
Passing an object as an argument
It is also worth considering the case of passing an object as an argument, and whether the copy-on-write mechanism applies there.
It may seem that the object is passed by reference rather than by value, but that's not quite right. When an object is passed as an argument, only the object's ID is actually passed. The object's contents are stored separately and are accessed via that ID. Because of this, if something inside the object is changed, the change is visible both inside and outside the function. For more detail, see the PHP documentation: Objects and references.
In other words, there is no need to pass the object by reference in this example, since only the object's ID is passed anyway.
Conclusion
In the vast majority of cases there is no need to pass arguments by reference: we rarely work with variables that occupy a lot of memory, and arguments are usually only read. As for objects, they don't need to be passed by reference either, since only the object's identifier is passed.
In other words, there's no need to rush off and urgently change anything in your code or your approach. Keep writing code as usual, just with a better understanding of what's going on.
