Cloning in EMC Networker; playing around with mminfo and nsrclone

Jeroen de Meijer
12-05-2016

The environment

We have a tape back-up environment with a tape library in each of two locations. The software used is EMC Networker (formerly Legato Networker). The back-ups are made at the primary location (datacenter 1), and all back-ups are cloned to the secondary location (datacenter 2).
There are several back-up groups containing various datasets. As retention time is a setting at group level, the groups are organized by their retention time and by the type of systems in them. For instance:

Group                 | Description
--------------------- | ---------------------------------------------
6 Months unix         | unix systems, 6 months retention
7 Years unix          | unix systems, 7 years retention
6 Months windows-prod | windows production systems, 6 months retention
etc.

Back-ups are made to back-up pools (collections of data storage). The back-up pools at the primary location are named after their retention time (i.e. “6 Months” and “7 Years”). The pools at the secondary location have “Clone” appended to their names (i.e. “6 Months Clone” and “7 Years Clone”).
Having separate pools for different retention periods ensures effective tape usage (otherwise a single 7-year retention saveset could hold up a volume whose only other contents are expired 1-month retention savesets).

The problem

Recently a requirement was introduced that all tapes must also be stored off-site. As a quick fix, a lot of tapes in datacenter 2 have been moved off-site (the pools “6 Months Clone” and “7 Years Clone”).
In addition, the tape robot in datacenter 2 had to be replaced. The new robot also has new tape drives with the latest tape technology (T10000), whereas the old robot was LTO4 (and the one in datacenter 1 is LTO5).

To conform to the off-site storage requirement, two new pools have been created: “6 Months Clone Offsite” and “7 Years Clone Offsite”. In addition, a second cloning job has been added to also clone all back-ups to these pools. Thirdly, some scripts have been created to manage the withdrawal and return of the off-site tapes to and from the remote storage.
So this is how it should be:

Location     | Pools
------------ | -----------------------------------------------
Datacenter 1 | "6 Months", "7 Years"
Datacenter 2 | "6 Months Clone", "7 Years Clone"
Offsite      | "6 Months Clone Offsite", "7 Years Clone Offsite"

The problem arises because tapes from datacenter 2, which the new robot can’t read anyway (the one in datacenter 1 can, however), have been put off-site. These tapes are in pools (“6 Months Clone” and “7 Years Clone”) that are normally supposed to be on-site in datacenter 2.

At the moment of the change we have the following situation:

  • Datacenter 1: a fully populated back-up set in the correct pool. This is our source.
  • Datacenter 2: no back-up sets prior to the change (these have been moved off-site). Back-ups after the change are in the correct pool.
  • Off-site: a fully populated back-up set of back-ups prior to the change, in the wrong pool. Back-ups after the change are in the correct pool.

The issue for the 6 months retention back-ups will fix itself in 6 months, which was deemed an acceptable time period. For the 7 years retention back-ups this would take too long, so we have to do some additional cloning to fill the gaps.
For the 7 years retention, all back-up sets are off-site; the older ones are in the wrong pool, but that doesn’t matter: they are recoverable in case of disaster. In datacenter 2, however, the back-up sets prior to the change are missing. These have to be created.
I tried using a default clone job for the correct time period to clone from pool “7 Years” to pool “7 Years Clone” by setting the number of needed clones to three. Somehow that didn’t work, without a clear error, so I tried to do it manually.

The solution

We need to create a list of savesets that need to be cloned, and clone these savesets manually. To query the media database for the savesets we use the mminfo(1m) command.

First we need to know which groups existed in the past 7 years and decide which ones we want to have cloned.
We report only the group (-r group), sort by time (-ot) and query for savesets that are not incomplete, are in the pool “7 Years” and have 2 copies (one in datacenter 1 and one off-site, and thus missing the datacenter 2 copy).

# mminfo -r group -ot -q '!incomplete,pool=7 Years,copies=2' | sort | uniq
7 Years unix
7 Years windows
7 Years windows-cifs
7 Years windows-exch
7 Years windows-mssql
7 Years windows-prod

The mminfo command takes multiple groups as a comma-separated list, where each group has to be prefixed with “group=”. Let’s put them in a variable called GROUP.

# GROUP="group=7 Years unix,group=7 Years windows,group=7 Years windows-cifs,group=7 Years windows-exch,group=7 Years windows-mssql,group=7 Years windows-prod"

Next we need to list all the relevant savesets; we’ll need their IDs (called ssids). We want only complete ones, sorted by time, from the pool “7 Years”, with exactly two copies, and from a group in our GROUP variable.
We’ll put the list of ssids in the file called “list”.

# mminfo -r ssid -ot -q "!incomplete,pool=7 Years,${GROUP},copies=2" > list
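
To get a feel for the size of the job before cloning anything, a quick sanity check with standard tools:

# wc -l list
# head -n 3 list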

Doing a test run, I noticed an issue. As the back-ups go back 7 years, there are bound to be savesets of systems that (partly) do not exist anymore. Some savesets will therefore give errors during cloning, and these have to be filtered out. This is probably also the reason the standard clone job didn’t work. Lesson learned: do not delete clients while they still have unexpired savesets.
To filter out the failing savesets, we will do a dry-run (-n) clone of every saveset. The ones that give an error we will log in the file called “failed”.

# for SSID in `cat list`
do
    # dry-run (-n) the clone; merge stderr into stdout so the errors reach grep
    nsrclone -n -b "7 Years Clone" -d d2bus01-stor531 -S $SSID 2>&1
done | grep "nsrclone: Failed to add" > failed

The ssid is in the 13th field of the error message, so we will use awk to extract it. In addition we will filter out duplicate ssids (not really necessary, but for neatness). Note that uniq only removes adjacent duplicates, which is fine here because the loop preserved the time-sorted order.

# cat failed | awk '{ print $13 }' | uniq > uniqfail

We will also filter out the duplicate ssids from our original list.

# cat list | uniq > uniqlist

We now want a list of all ssids without the failed ones. We do this by diffing both files and selecting the lines that exist only in uniqlist.

# diff uniqlist uniqfail | grep \< | awk '{ print $2 }' > uniqok
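
The diff approach works because both files are in the same (time-sorted) order. An order-independent alternative, assuming uniqfail is not empty, is grep with fixed, whole-line pattern matching:

# grep -v -x -F -f uniqfail uniqlist > uniqok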

This list can now be used to clone every ssid. Note that the list is time-sorted, so savesets that will expire at (roughly) the same time will likely end up on the same tape.

For neatness we will log to clone.log using the tee(1m) command.

# for SSID in `cat uniqok`
do
    nsrclone -b "7 Years Clone" -d datacenter2-robot -S $SSID
done | tee clone.log &
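
Since the loop runs in the background, progress can be followed with a standard tail on the log:

# tail -f clone.log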

A drawback of this loop is that it clones the savesets one by one, so it will take some time, depending on the number of savesets.

To hand all of them to nsrclone in a single run, you could also do the following:

# cat uniqok | nsrclone -b "7 Years Clone" -d datacenter2-robot -S - | tee clone.log &

However, in this case I was not sure the correct time order of the savesets would be preserved. As we were in no hurry, I found the loop acceptable.
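
A possible middle ground, assuming your nsrclone version accepts multiple ssids after -S (check the man page for your release; I have not verified this here), is to feed the ssids in fixed-size batches with xargs. The batch size of 50 is arbitrary:

# cat uniqok | xargs -n 50 nsrclone -b "7 Years Clone" -d datacenter2-robot -S | tee clone.log &

This would keep the savesets in time order, both within and across batches, while avoiding one nsrclone invocation per saveset.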

I’ve been hacking around with EMC Networker (Legato Networker, Solstice Backup) for about 15 years now, so if you have any questions, do not hesitate to ask.

Regards,

Jeroen de Meijer
