Task Design¶
Backup and restore entire repository solution tasks¶
This solution implements a full backup strategy: it backs up the entire repository, including the gold, files, and license directories. We recommend this solution to users who have enough disk space.
Implement and test this solution¶
Implement the backup: create a shell script that performs the backup steps, add the script to crontab, and set the backup time in crontab.
Backup steps:
- Stop the scheduler before the backup (and verify that all the agents have stopped)
- Back up the PostgreSQL database
  - pg_dumpall | gzip > backup_filename
  - Use rsync to copy backup_filename to the backup server incrementally
- Back up the entire repository data to a backup server using rsync, incrementally, including the gold, files, and license directories
- Restart the scheduler
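The backup steps above could be scripted roughly as follows. This is a minimal sketch, not a finished script: the repository path, dump location, backup host, and the scheduler init script (`/etc/init.d/fossology`) are all assumptions to adapt to the local installation, and the check that all agents have stopped is left out. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the full-repository backup steps (not a finished script).
# Every path, the host name and the scheduler command are assumptions.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"
DUMP_FILE="${DUMP_FILE:-/var/backups/fossology/backup_filename.gz}"
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology/current}"
SCHEDULER_CTL="${SCHEDULER_CTL:-/etc/init.d/fossology}"

# Print a command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

backup_full() {
    status=0
    # 1. Stop the scheduler (verifying that all agents exited is not shown).
    run "$SCHEDULER_CTL stop" || return 1
    # 2. Dump all databases (pg_dumpall needs a database superuser, e.g. the
    #    postgres account) and copy the dump to the backup server.
    # 3. Sync the whole repository (gold, files, license); rsync only
    #    transfers files that changed since the previous run.
    run "pg_dumpall | gzip > $DUMP_FILE" &&
        run "rsync -a $DUMP_FILE $BACKUP_DEST/db/" &&
        run "rsync -a --delete $REPO_DIR/ $BACKUP_DEST/repository/" ||
        status=1
    # 4. Restart the scheduler even if a backup step failed.
    run "$SCHEDULER_CTL start" || status=1
    return "$status"
}
```

A crontab entry such as `0 2 * * * /usr/local/sbin/fossology-backup.sh` (a hypothetical script path) would then run the backup nightly at 02:00, with the script setting `DRY_RUN=0` and calling `backup_full`.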
Solution 1: non-rotating backup
On the backup server, the current directory stores the latest backup data (synced with rsync), and the day directories (day1, day2, day3, …) each store a hard-link copy of the whole backup, made after that day's backup.
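The hard-link copy in solution 1 can be sketched as below. The function name and the "next free dayN" numbering are illustrative choices, and `cp -al` relies on GNU cp options; the layout (current plus day directories) is the one described above.

```shell
# Create the next dayN directory as a hard-link copy of current/.
# The function name and numbering scheme are illustrative.
snapshot_day() {
    root=$1
    n=1
    # Find the first day number that is not used yet.
    while [ -e "$root/day$n" ]; do
        n=$((n + 1))
    done
    # -a preserves attributes, -l creates hard links instead of copying data.
    cp -al "$root/current" "$root/day$n" && echo "$root/day$n"
}
```

Because the snapshot shares its data with current, each day directory costs almost no extra disk space. By default rsync writes a changed file to a temporary file and renames it into place, so older snapshots keep their contents; this would no longer hold if rsync were run with `--inplace`.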
Solution 2: rotating backup
First make a full backup at base1 and incremental backups at day1, day2, day3, …; then make a full backup at base2 that replaces base1, and repeat this cycle.
My suggestion: use solution 1.
Note: Users must make sure that all FOSSology configuration files are included in their system backup.
Restore steps:
- Make sure the FOSSology system and its configuration files are restored correctly
- Restore the PostgreSQL database
  - Retrieve backup_filename from the backup server
  - Load it with gunzip -c backup_filename | psql (the dump was gzipped at backup time)
- Restore the entire repository
- Start the scheduler
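The restore steps could be scripted along the same lines. Again this is only a sketch under assumptions: the paths, the backup host, the `postgres` maintenance database passed to psql, and the scheduler init script are placeholders, and restoring the FOSSology installation and configuration files is expected to have happened beforehand. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the full-repository restore steps; paths, host name and the
# scheduler command are assumptions.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
REPO_DIR="${REPO_DIR:-/srv/fossology/repository}"
DUMP_FILE="${DUMP_FILE:-/var/backups/fossology/backup_filename.gz}"
BACKUP_SRC="${BACKUP_SRC:-backuphost:/backup/fossology/current}"
SCHEDULER_CTL="${SCHEDULER_CTL:-/etc/init.d/fossology}"

# Print a command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

restore_full() {
    dump_name=$(basename "$DUMP_FILE")
    # 1. Fetch the dump and load it (the dump is gzipped, so decompress first).
    # 2. Copy the repository back from the backup server.
    # 3. Start the scheduler only if both restores succeeded.
    run "rsync -a $BACKUP_SRC/db/$dump_name $DUMP_FILE" &&
        run "gunzip -c $DUMP_FILE | psql -d postgres" &&
        run "rsync -a $BACKUP_SRC/repository/ $REPO_DIR/" &&
        run "$SCHEDULER_CTL start"
}
```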
Test this solution on a test system to verify that all backup and restore steps run correctly.
Create instructions for this solution¶
Create backup and restore instructions for the entire-repository solution.
Deliverable: instructions
Backup and restore only Gold files solution¶
This solution backs up only the repository gold (and license) files. Users who do not have enough disk space, or who do not want to back up the entire repository, can use this solution. It requires code changes.
Implement and test code changes to implement this solution¶
Unpack agent code changes
The unpack agent requires a switch so that it can unpack into the repository without updating the database.
UI code changes
Because only gold files are backed up, not all files will have been reunpacked after a restore. When a user browses to a file that is not in the repository, the UI will offer an interface to reunpack it.
Any place a file is retrieved from the repository needs to check to make sure the file exists. If it does not, check to make sure the gold file exists. If it does, possibly ask the user if they want to recover from the gold, and if so, queue up a job to do the ununpack.
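The check described above amounts to a three-way decision. The following sketch shows it as a shell function for illustration only; the real change would live in the UI code, and the function name and its two path arguments are hypothetical (how the UI derives the paths is not shown).

```shell
# Decide what to do when a file is requested from the repository.
# $1: path of the unpacked file, $2: path of the gold file it came from.
repo_file_state() {
    files_path=$1
    gold_path=$2
    if [ -f "$files_path" ]; then
        echo present        # serve the file as usual
    elif [ -f "$gold_path" ]; then
        echo recoverable    # offer to queue an ununpack job from the gold file
    else
        echo missing        # nothing to recover from
    fi
}
```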
Create a mockup based on bobg’s mockups:
Automatically run ununpack to recover jobs that were running at the backup point
If the gold-files-only backup takes place while a license analysis job (or another agent that needs the unpacked files) is in progress, the restore must automatically queue a job to reunpack.
- Query the jobqueue to find the jobs that had not finished at the backup point
- Get the gold files for these jobs
- Automatically reunpack these gold files
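The first two steps could start from a query like the one printed by this sketch. The table and column names used here (`jobqueue.jq_endtime`, `jobqueue.jq_job_fk`, `job.job_upload_fk`, `upload.pfile_fk`) are assumptions about the schema and must be checked against the actual database before use; treating a NULL end time as "not finished" is likewise an assumption.

```shell
# Print a query for uploads whose jobs had not finished at the backup point.
# All table and column names are assumptions; verify them against the schema.
unfinished_uploads_sql() {
    cat <<'EOF'
SELECT DISTINCT upload.upload_pk, upload.pfile_fk
FROM jobqueue
JOIN job ON jobqueue.jq_job_fk = job.job_pk
JOIN upload ON job.job_upload_fk = upload.upload_pk
WHERE jobqueue.jq_endtime IS NULL;
EOF
}
```

The output could be piped into psql against the restored database, for example `unfinished_uploads_sql | psql -d fossology -At` (the database name is also an assumption), and the resulting gold files handed to the reunpack step.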
Implement and test this solution¶
Implement the backup: create a shell script that performs the backup steps, add the script to crontab, and set the backup time in crontab.
Backup steps:
- Stop the scheduler before the backup (and verify that all the agents have stopped)
- Back up the PostgreSQL database
  - pg_dumpall | gzip > backup_filename
  - Use rsync to copy backup_filename to the backup server incrementally
- Back up only the gold and license directories of the repository to a backup server using rsync, incrementally
- Restart the scheduler
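Only the repository step differs from a full backup, so the sketch below covers just that step; stopping the scheduler, dumping the database, and restarting the scheduler are omitted. `REPO_HOST_DIR` is assumed to be the directory that directly contains the gold, files, and license directories, and the backup destination is a placeholder. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the gold-only repository sync; paths and host are assumptions.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them
# Assumed: the directory that directly contains gold/, files/ and license/.
REPO_HOST_DIR="${REPO_HOST_DIR:-/srv/fossology/repository/localhost}"
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backup/fossology/current}"

# Print a command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

backup_gold_only() {
    # Sync only the gold and license directories; the unpacked files
    # directory is deliberately skipped.
    for sub in gold license; do
        run "rsync -a --delete $REPO_HOST_DIR/$sub/ $BACKUP_DEST/repository/$sub/" || return 1
    done
}
```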
Note: Users must make sure that all FOSSology configuration files are included in their system backup.
Restore steps:
- Make sure the FOSSology system and its configuration files are restored correctly
- Restore the PostgreSQL database
  - Retrieve backup_filename from the backup server
  - Load it with gunzip -c backup_filename | psql (the dump was gzipped at backup time)
- Restore only the gold repository files
- Reunpack the needed files
- Start the scheduler
Test this solution on a test system to verify that all backup and restore steps run correctly.
Create instructions for this solution¶
Create backup and restore instructions for the gold-files-only solution.
Note: Add the measured time cost of the unpack job to the instructions.
The test case was Fedora-11-source-DVD.iso.
The ISO is 4.2 GB
It unpacks into 54 GB (~13x)
It took 5.5 hrs to unpack
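For the instructions, these figures can be turned into a rough estimate for other gold sizes. The sketch below scales the single Fedora 11 measurement linearly, which is an assumption: real unpack time and expansion depend on the content and the hardware. The function name is illustrative.

```shell
# Scale the Fedora 11 measurement (4.2 GB gold -> 54 GB unpacked in 5.5 h)
# linearly to another gold size in GB. Linear scaling is an assumption.
estimate_unpack() {
    awk -v gold_gb="$1" 'BEGIN {
        printf "%.1f GB unpacked, %.1f hours\n",
            gold_gb * 54 / 4.2, gold_gb * 5.5 / 4.2
    }'
}
```

For example, `estimate_unpack 42` prints `540.0 GB unpacked, 55.0 hours`, so reunpacking everything after a gold-only restore of a large repository could take days.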
Code changes to reduce the size of the repository¶
There are three types of files that unpack saves:
- leaf files --- The leaf files are simply files that can be unpacked no further. For example, file.c, myfile.spec, myfile.png, ...
- containers --- Containers are archives and "artifacts". For example, containers are files like file.gz, file.rpm, file.jar, file.war, file.ott, ... Since these are unpacked to leaf nodes, we could delete the containers themselves.
- artifacts --- Artifacts are files and directories created as a result of the unarchiving process. They have two names:
- artifact.meta
- artifact.unpacked
- directories
Implement the proposed only backup and restore gold files strategy in the two running FOSSology production systems¶
Implement the proposed gold-files-only backup and restore strategy on the two running FOSSology production systems (the external and internal systems).
Test disaster recovery¶
Perform a disaster recovery test on one or both production installations to validate the safety of the production systems. (The scenario steps need to be described in more detail.)
- Disaster recovery: The server is in the midst of a job and the plug is pulled; can we recover?
- Flat out lose everything - the whole data center blows up, no more disks, no more nothing
- Blow away just the database
- Lose an agent and its storage