Electronic archiving is a common process nowadays. We all want our documents to be a click away, accessible wherever an internet connection is available. It is not only about access to information, which people today take for granted, but also about the philosophy of finding a document through an iterative, progressively narrowing keyword search: a way of finding the needle in the haystack without overturning the cart and checking straw by straw.

It may sound like magic, but iterative search is something ordinary for the 21st-century generation, the Google generation. Anyone can find what they are looking for just by typing the right word into a search engine: a word specific enough to the thing sought that one of the 7 billion people inhabiting Earth (OK, maybe not all of them have access to the internet) has written about it.

This is how the idea of indexing the electronic archive appeared. Indexing means capturing some keywords from among all the words in the scanned documents. But how do we extract all the words from scanned documents, which of course behave as pictures? The obvious answer is optical character recognition (OCR). In practice it cannot be done this way, as recognition is rather time consuming and not very accurate, even though there is useful software able to decipher even handwriting (intelligent character recognition, ICR).

This is how the idea appeared of extracting keywords from the documents and linking them to the particular scan or picture, so that the search is done on the keywords. This led us to create a records database in which each document is a record whose fields are populated with “features” of that document, such as the issue date or the document type: rental agreement, bailment, concession, superficies…

These are the fundamentals of the “Depth” electronic archive: a customized archive built around the features its user agreed upon, which are extracted manually or semi-automatically from each and every document and then associated with a scan, the picture of that document.

This extraction of the defining fields tends to be an expensive step in the overall electronic archiving process, which is ultimately a production process subject to commercial and productivity factors and, not least, to the quality or “accuracy” of the extracted information.

This is the part that added value to a business project we successfully concluded for a telecom company, as part of a Lease Management project, where we found links that make our activities faster and safer.

In this project we had to build an electronic archive from thousands of different contracts the telecom operator had with the owners of the land on which the company had installed its equipment (for the retransmission of voice and data).

Every contract could have a few additional documents (addenda) attached, which usually renew the fee owed to the owner at the end of the initial rental period.

The database was a structure of tables related to one another. For example, every contract is an entry in the contracts table, linked to the addendum tables through the ID-Contract key. There were several types of links; for example, a contract could be associated with several beneficiaries of the rent owed, because the land had multiple owners.
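The table structure described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table and column names (contracts, addenda, beneficiaries, contract_beneficiaries, new_fee and so on) are assumptions made for the example and are not the names used in the actual project, and the sample rows are invented.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
SCHEMA = """
CREATE TABLE contracts (
    id_contract   INTEGER PRIMARY KEY,
    code_contract TEXT NOT NULL,
    type_contract TEXT NOT NULL
);
-- Each addendum points back to its contract through the contract key.
CREATE TABLE addenda (
    id_addendum INTEGER PRIMARY KEY,
    id_contract INTEGER NOT NULL REFERENCES contracts(id_contract),
    new_fee     REAL
);
CREATE TABLE beneficiaries (
    id_beneficiary INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
-- Link table: one contract may have several beneficiaries (co-owned land).
CREATE TABLE contract_beneficiaries (
    id_contract    INTEGER NOT NULL REFERENCES contracts(id_contract),
    id_beneficiary INTEGER NOT NULL REFERENCES beneficiaries(id_beneficiary),
    PRIMARY KEY (id_contract, id_beneficiary)
);
"""


def build_demo_db():
    """Create an in-memory database with one invented contract."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contracts VALUES (1, 'C-001', 'renting')")
    conn.execute("INSERT INTO addenda VALUES (1, 1, 1200.0)")
    conn.executemany(
        "INSERT INTO beneficiaries VALUES (?, ?)",
        [(1, "Owner A"), (2, "Owner B")],
    )
    conn.executemany(
        "INSERT INTO contract_beneficiaries VALUES (?, ?)",
        [(1, 1), (1, 2)],
    )
    conn.commit()
    return conn


def beneficiaries_of(conn, id_contract):
    """Return the names of all beneficiaries linked to one contract."""
    rows = conn.execute(
        "SELECT b.name FROM beneficiaries b "
        "JOIN contract_beneficiaries cb ON cb.id_beneficiary = b.id_beneficiary "
        "WHERE cb.id_contract = ? ORDER BY b.name",
        (id_contract,),
    )
    return [row[0] for row in rows]
```

A separate link table is one common way to model a contract with several beneficiaries; the project may well have solved it differently.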

Such a project entails the extraction of a very large number of characteristics for each and every contract, around 60-70 in our project (Code-Contract, Type_Contract, Duration_Contract, etc.). The position of this information on the contract page was not fixed, as contracts, even of the same type (rental, for example), were not drawn up in the same way, having been concluded with different notaries at different moments in time. Automated recognition in predefined areas is a well-known technique, but impossible to use in such a situation.

The only method used to extract the values was the manual one. But in order to be productive, a contract pattern classification process was required, mainly to establish the exact part of the contract where each piece of information can be found. These findings were then communicated to the group of operators, who were monitored for two weeks and given full support.

The extraction tool allows the document to be viewed while the data input window is open, and offers a series of special features.


Last but not least, we should mention the use of tables of predefined values for the people who benefit from the sums owed, taken from the client’s accounting department. Choosing from these values is compulsory, which eliminates the risk of a faulty entry in these fields.
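The idea of accepting only predefined values can be illustrated with a few lines of Python. This is a minimal sketch, not the project's extraction tool: the function name validate_beneficiary and the sample names are hypothetical, and in the project the list came from the client's accounting department rather than from a hard-coded set.

```python
# Hypothetical list of allowed values; in the project this table was
# supplied by the client's accounting department.
KNOWN_BENEFICIARIES = {"Owner A", "Owner B", "Owner C"}


def validate_beneficiary(value, allowed=KNOWN_BENEFICIARIES):
    """Return the trimmed value if it is a predefined one, else raise ValueError."""
    cleaned = value.strip()
    if cleaned not in allowed:
        raise ValueError(f"unknown beneficiary: {cleaned!r}")
    return cleaned
```

An entry form that offers these values in a drop-down list achieves the same effect without the operator typing anything.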

Speaking of quality, and given that data in a “Depth” electronic archive must be 100% accurate, the use of cross-key information extraction is compulsory. The method works as follows: one operator extracts all the data in the contract, and a second operator independently extracts the important fields of the same contract (usually sums of money, periods, etc.). The pairs of values are then compared automatically by a software routine; if they match, the record is entered in the database. If not, they are corrected against the original document.
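The automatic comparison step could look something like the sketch below. It assumes that each operator's work is available as a dictionary of field names and values; the function name compare_entries and the field names in the usage example are hypothetical and only serve the illustration.

```python
def compare_entries(full_entry, key_entry):
    """Compare the second operator's key fields with the first operator's record.

    Returns a dict mapping each disagreeing field to the pair
    (first operator's value, second operator's value).
    An empty dict means the record can be accepted.
    """
    mismatches = {}
    for field, second_value in key_entry.items():
        first_value = full_entry.get(field)
        if first_value != second_value:
            mismatches[field] = (first_value, second_value)
    return mismatches
```

For example, if the first operator recorded {"Code-Contract": "C-001", "OwedSum": 1200, "Duration_Contract": 10} and the second recorded {"OwedSum": 1250, "Duration_Contract": 10}, the routine reports only OwedSum as a disagreement, and that field is then checked against the scanned original.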

A one-year reconciliation period for the extracted data was required after the solution was implemented, during which thorough checks were made on the monthly payments and addenda with respect to the sums paid, beneficiaries, costs, etc.

After this period, the “Depth” electronic archive was declared “valid”. Today this database, together with the related software, is a very strong management tool for all the rented land throughout the country.

The implemented Lease Management module also has an update section, where new contracts or addenda can be added in order to change the payment conditions, the destination or the periods of older contracts.

A “Depth” electronic archive is a very safe working tool, which can be linked to an integrated accounting system or can work as an information system, with reporting made possible by its ability to search on nested (imbricated) fields. Here is an example of a nested search, or successive elimination: search for a superficies contract using the TypeContract search key and you will get 55 of the 7500 contracts; then search for a certain sum owed and 2 records will show up; finally, search for the year the contract was signed, or a probable time interval, and you are led to the exact contract you were looking for.
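The successive elimination described above can be sketched as a chain of filters, each applied to the result of the previous one. This is an illustration under stated assumptions: TypeContract is the search key named in the text, while OwedSum, YearSigned, the helper names narrow and narrow_by_year, and the four sample records are all invented for the example.

```python
# Invented sample records; only the TypeContract key comes from the text.
SAMPLE_CONTRACTS = [
    {"TypeContract": "superficies", "OwedSum": 1200, "YearSigned": 2004},
    {"TypeContract": "superficies", "OwedSum": 1200, "YearSigned": 2009},
    {"TypeContract": "superficies", "OwedSum": 800, "YearSigned": 2004},
    {"TypeContract": "renting", "OwedSum": 1200, "YearSigned": 2004},
]


def narrow(records, **criteria):
    """Keep only the records whose fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


def narrow_by_year(records, first_year, last_year):
    """Keep only the records signed within the inclusive year interval."""
    return [
        record
        for record in records
        if first_year <= record["YearSigned"] <= last_year
    ]


# Each step searches only within the result of the previous step.
by_type = narrow(SAMPLE_CONTRACTS, TypeContract="superficies")
by_sum = narrow(by_type, OwedSum=1200)
by_year = narrow_by_year(by_sum, 2003, 2005)
```

On the sample data the three steps leave three, then two, then one record, mirroring on a small scale the narrowing from 7500 contracts to a single one.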

See you soon,

Florian-Sorel Nitu

Project Manager BD/GIS