Rdfind.php

From ProgClub
Revision as of 13:33, 1 December 2014 by John (talk | contribs) (Added Motivation section...)

rdfind.php is the ProgClub redundant data processing software. It replaces duplicate files with hard links to save disk space. It's a reimplementation of rdfind with support for a maximum number of hard links per file. For other projects see projects.

Status

Version 0.1 released.

Motivation

Why make this software? Good question! I was using the original rdfind program and lost a bunch of files: I exceeded the maximum number of hard links per file, and rdfind choked on that. My program also creates hard links per UID/GID/mode, so you don't lose permissions and ownership information when new links are created. Finally, the original rdfind program makes multiple passes to check first bytes, last bytes, etc., and I don't bother with that. I just use the hashing function and process each file once, so my program makes a single pass over the input and is O(n).

Administration

Contributors

Members who have contributed to this project. Newest on top.

All contributors have agreed to the terms of the Contributor License Agreement. This excludes any upstream contributors who tend to have different administrative frameworks.

Copyright

Copyright 2014, Contributors.

License

Licensed under the GPL.

Resources

Downloads

There are no downloads for this software; get your copy from Subversion.

Source code

The repository can be browsed online:

https://www.progclub.org/pcrepo/rdfind.php/branches/0.1

The latest stable released version of the code is available from:

https://www.progclub.org/svn/pcrepo/rdfind.php/tags/latest/0.1

Or if you want the latest version for development purposes:

https://www.progclub.org/svn/pcrepo/rdfind.php/branches/0.1

Links

  • See rdfind for the software that inspired our project.

Specifications

Functional specification

The functional specification describes what the project does.

This software processes a number of input directories and looks for descendant files that are duplicates of each other. The software replaces duplicate files with hard links, reclaiming disk space.

The software determines that files are duplicates by way of a hashing algorithm. A number of algorithms are available, all with a minimum digest length of 128 bits (16 bytes). The default algorithm is sha256, which should be relatively safe. If you use a weaker hashing algorithm, be sure your inputs are trusted, because two different files with the same hash would be treated as duplicates.
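To illustrate hash-based duplicate detection, here is a minimal sketch using PHP's built-in hash functions. It is not the code from bin/rdfind.inc.php; the function names sketch_strong_algos and sketch_group_by_hash are hypothetical and exist only for this example. The first function lists the algorithms PHP offers whose digest is at least 128 bits long, and the second groups file paths by content hash so that any group with more than one path is a set of candidate duplicates.

```php
<?php
// Hypothetical helper (not part of rdfind.php): list the hash algorithms
// available in this PHP build whose digest is at least $min_bits long.
function sketch_strong_algos( int $min_bits = 128 ): array {
  $result = [];
  foreach ( hash_algos() as $algo ) {
    // hash() returns lowercase hex; each hex character encodes 4 bits.
    $bits = strlen( hash( $algo, '' ) ) * 4;
    if ( $bits >= $min_bits ) {
      $result[] = $algo;
    }
  }
  return $result;
}

// Hypothetical helper (not part of rdfind.php): group file paths by the
// hash of their content. Each file is read exactly once.
function sketch_group_by_hash( array $paths, string $algo = 'sha256' ): array {
  $groups = [];
  foreach ( $paths as $path ) {
    $hash = hash_file( $algo, $path );
    if ( $hash === false ) {
      continue; // unreadable file: skip it
    }
    $groups[ $hash ][] = $path;
  }
  return $groups;
}
```

Any entry in the returned array that holds two or more paths is a set of files with identical hashes. Short checksums such as crc32 are filtered out by the 128-bit minimum.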

Technical specification

The technical specification describes how the project works.

The PHP software is split into two parts: a library (bin/rdfind.inc.php) and an executable (bin/rdfind.php). The executable just calls the library, passing in the command-line arguments. This separation allows you to include the library and call the rdfind_php function from your own scripts.

The software enumerates files below the input directories and looks for duplicates with the same UID, GID and mode. When duplicates are discovered, hard links are made. When a file reaches the maximum number of hard links, it is replaced, and matching starts over again at 1 hard link for the following files.
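The linking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the rdfind_php implementation: the function name sketch_link_duplicates and its $max_links parameter are hypothetical, and the sketch assumes that "replaced" means the next duplicate found becomes the new target for later links once the current target is full. It also assumes a POSIX-style filesystem where link() and rename() behave as usual.

```php
<?php
// Hypothetical sketch (not part of rdfind.php): replace duplicates with hard
// links, matching on content hash plus UID, GID and mode, and never letting
// a link target exceed $max_links links. Returns the number of files linked.
function sketch_link_duplicates( array $paths, int $max_links, string $algo = 'sha256' ): int {
  $targets = []; // key => path that new links currently point at
  $linked = 0;
  foreach ( $paths as $path ) {
    clearstatcache(); // link counts change as we go, so avoid stale stat data
    $stat = stat( $path );
    $hash = hash_file( $algo, $path );
    if ( $stat === false || $hash === false ) {
      continue; // unreadable file: skip it
    }
    // Files only match if content, owner, group and mode all agree.
    $key = implode( ':', [ $hash, $stat['uid'], $stat['gid'], $stat['mode'] ] );
    if ( ! isset( $targets[ $key ] ) ) {
      $targets[ $key ] = $path; // first file seen for this key
      continue;
    }
    $target = $targets[ $key ];
    $target_stat = stat( $target );
    if ( $target_stat === false ) {
      continue;
    }
    if ( $target_stat['dev'] === $stat['dev'] && $target_stat['ino'] === $stat['ino'] ) {
      continue; // already the same file
    }
    if ( $target_stat['nlink'] >= $max_links ) {
      // Target is full (assumed behaviour): this file becomes the new target.
      $targets[ $key ] = $path;
      continue;
    }
    // Create the link under a temporary name, then rename it over the
    // duplicate, so the original is only replaced once the link exists.
    $tmp = $path . '.rdfind-sketch-tmp';
    if ( ! link( $target, $tmp ) ) {
      continue;
    }
    if ( ! rename( $tmp, $path ) ) {
      unlink( $tmp );
      continue;
    }
    $linked++;
  }
  return $linked;
}
```

With five identical files and a limit of 2, this sketch links the second file to the first and the fourth to the third, and leaves the fifth on its own as the next target. Files with the same content but a different mode are never linked to each other.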

Notes

There are more notes in the README file.

Notes for implementers

If you are interested in incorporating this software into your project, here's what you need to know:

Include the bin/rdfind.inc.php file, e.g.:

require_once '/path/to/rdfind-php/bin/rdfind.inc.php';

Then call the rdfind_php function with your parameters.

Notes for developers

If you're looking to set up a development environment for this project, here's what you need to know:

Check out the latest development branches with:

svn co https://www.progclub.org/svn/pcrepo/rdfind.php/branches/ rdfind-php

Then look in your rdfind-php directory for the major.minor version you're interested in (at the time of writing, only v0.1). The bulk of the code is in the library file bin/rdfind.inc.php.

Tasks

TODO

Things to do, in rough order of priority:

N/A -- can't think of anything more to add at this point! (ideas welcome).

Done

Stuff that's done. Latest stuff on top.

  • JE 2014-11-30: released version 0.1.16.
  • JE 2014-11-30: created project page.