International Journal on Social Innovation & Research, vol. 6 no. 1 (2013)

Development of a File Duplicate Detector System Using Hashing Algorithm

Danny G. Umoso

Discipline: Information Technology



Duplication of files on external storage is a growing problem. Because built-in software cannot detect such duplication, a large amount of storage space is lost. Files can be the same in content and structure even when their filenames differ. Various techniques and algorithms can be used to identify similar files, and eliminating duplicates frees space and makes better use of file storage. Hashing is one algorithm that can be used to detect file duplication. Because the hashing technique captures the structure of a data file as a file hash, comparing hashes makes it possible to validate whether files are duplicates. The technique is somewhat crude because of its simplicity; however, it is effective for academic use. This research lends further support to earlier discussions of the capability of hashing algorithms to detect duplicate files.
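The hash-based comparison described above can be illustrated with a minimal sketch. This is not the system developed in the study: the abstract does not name a hash function or implementation language, so the use of SHA-256 from Python's standard `hashlib` module, the function names `file_hash` and `find_duplicates`, and the chunk size are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents.

    SHA-256 is an assumed choice; the source does not name a hash function.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash.

    Returns a list of groups (sorted lists of paths); only groups with
    two or more files, i.e. likely duplicates, are included.
    """
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # Filenames are ignored; only the file contents are hashed.
            groups[file_hash(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates` on a directory returns groups of paths whose contents hash to the same value, regardless of filename. Matching hashes indicate identical content only with very high probability, so a cautious tool might confirm each group with a byte-by-byte comparison before deleting anything.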