Frequently Asked Questions

Basic Concepts

Data entry is the act of transcribing information from one medium into another, usually by typing it into a computer program. Commonly transcribed material includes handwritten documents, spreadsheet contents, sequences of numbers, computer code, and even names and addresses.

Data entry speed measures how quickly data is recorded electronically by manual input. Whereas regular typing is measured in WPM (words per minute), data entry speed is measured in KPH (keystrokes per hour). High data entry speed without a correspondingly high accuracy rate does not qualify an individual to work as a data entry operator.

Numeric data entry speeds that qualify clerical workers for employment are typically between 9,000 and 12,000 KPH. Acceptable alphanumeric speeds are somewhat lower than purely numeric ones, because the alphabetic characters slow most operators down; they can be as low as 7,000 KPH.
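To make the KPH figures concrete, here is a minimal Python sketch that converts a keystroke count from a timed test into keystrokes per hour and compares it with the minimums mentioned above. The function names are hypothetical, and treating 9,000 KPH (numeric) and 7,000 KPH (alphanumeric) as hard cut-offs is an assumption; as noted above, speed alone does not qualify an operator without high accuracy.

```python
# Assumed cut-offs: the lower ends of the ranges quoted above.
MIN_KPH = {"numeric": 9_000, "alphanumeric": 7_000}


def keystrokes_per_hour(keystrokes: int, minutes: float) -> float:
    """Convert a keystroke count measured over `minutes` into keystrokes per hour."""
    if minutes <= 0:
        raise ValueError("elapsed time must be positive")
    return keystrokes * 60 / minutes


def meets_speed_minimum(kph: float, data_type: str) -> bool:
    """True if the measured speed reaches the assumed minimum for the data type."""
    return kph >= MIN_KPH[data_type]


# Example: 1,600 keystrokes in a 10-minute numeric test.
kph = keystrokes_per_hour(1_600, 10)
print(kph)                                  # 9600.0
print(meets_speed_minimum(kph, "numeric"))  # True
```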

A keystroke is counted every time a key is pressed on a keyboard. Word-processing programs such as Microsoft Word and Apple Pages report a document's character count, which gives a rough estimate of the keystrokes needed to type it.

Savvy business decision makers understand that it is essential to free up employees' time for the tasks that are central to the business. They achieve this by outsourcing less mission-critical tasks and services, such as data entry, to professional service providers like Keyoung.


Other terms for data processing include: remote processing, data entry, data conversion, tele-working, freelancing, temporary services, offshore outsourcing, form processing, imaging, digital conversion, transcription services, document archiving, document management, scanning, OCR, offshore data conversion and offshore computer services.


For OCR/ICR software to recognize your forms, you need to redesign them and may also need to purchase special equipment. That investment would be justified if the process could guarantee 100% accuracy.

However, this is normally not the case. You will still need internal staff to do the "clean-up" work: verifying the recognized data and correcting whatever was not recognized accurately, which is very labor-intensive and time-consuming.

This will certainly cost more than having Keyoung do punch-and-verify data entry from the start. If the documents are clear enough for OCR, we can use the OCR text as the "punched" result and have our typists key the data once more as the verification pass, which is cheaper than Punch and Verify (Double Entry) by two typists.


4.1 Introduction

The basic principle of the "Double Entry System (DES)" is to increase the accuracy of the deliverables throughout the data processing cycle by investing extra time and cost in a second typing pass.

Double Entry, or "Punch and Verify", is a technique we employ to achieve guaranteed accuracy rates of 99.92% or greater for printed matter documents. The same data is punched and verified by two different operators. Wherever the two operators' entries do not match, the system stops and lets the second operator examine the original document again and make the appropriate corrections. This scientific approach dramatically increases accuracy and virtually eliminates the drawbacks of manual proofreading. Finally, a data quality auditor conducts a random 5% quality check of the data capture results.
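The final audit can be sketched as drawing a random 5% sample of the captured records and comparing each sampled field with the source document. This Python sketch assumes records are dictionaries keyed by field name and that the auditor's reading of the source is available as `source`; the data layout, the function names and the field-level scoring are illustrative assumptions, since the text does not say how the auditor scores errors. For scale, 99.92% accuracy means at most 8 erroneous characters per 10,000 keyed.

```python
import random


def sample_for_audit(record_ids: list, rate: float = 0.05, seed=None) -> list:
    """Pick a random subset (5% by default, at least one) of records to audit."""
    if not record_ids:
        return []
    k = max(1, round(len(record_ids) * rate))
    return random.Random(seed).sample(record_ids, k)


def field_accuracy(captured: dict, source: dict, sample_ids: list) -> float:
    """Share of sampled fields whose captured value exactly matches the source."""
    checked = matched = 0
    for rid in sample_ids:
        for name, expected in source[rid].items():
            checked += 1
            if captured[rid].get(name) == expected:
                matched += 1
    return matched / checked


# Example: choose 10 of 200 hypothetical record IDs for the auditor.
audit_ids = sample_for_audit(list(range(1, 201)), seed=2024)
print(sorted(audit_ids))
```

Fixing the `seed` makes the sample reproducible, which is convenient if the audit itself needs to be re-checked later.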

The following will illustrate this system by examples.

We have the following parties:

D - the document to be typed
M - the maker (the first typist)
C - the checker (the second typist, who also serves as the verifier)

Accuracy is measured against D: anything typed differently from D counts as an error.
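The text does not specify how a difference from D is counted. One common choice, assumed here, is the Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions that turn the typed text into D. Unlike a position-by-position comparison, it does not penalize every character after a single skipped letter. The function names are hypothetical.

```python
def edit_distance(typed: str, reference: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(reference) + 1))  # distances from the empty prefix
    for i, t in enumerate(typed, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            curr.append(min(prev[j] + 1,             # extra character typed
                            curr[j - 1] + 1,         # character of D left out
                            prev[j - 1] + (t != r)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def accuracy(typed: str, reference: str) -> float:
    """Character accuracy of `typed` measured against the reference document D."""
    if not reference:
        return 1.0 if not typed else 0.0
    return max(0.0, 1 - edit_distance(typed, reference) / len(reference))


D = "28/F Data Processing Center"
print(edit_distance("28/F Date Processing Centre", D))  # 3
print(edit_distance("28/F Dta Processing Center", D))   # 1 (one skipped letter)
```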

In real life, M may make mistakes and C may also introduce mistakes, so we trust neither M nor C. The beauty of DES is that we can trust C's result as long as C follows the proper workflow. Sophisticated management involvement is therefore not needed to judge M's and C's discipline.

The design lets C use M's result, together with D, to correct both M's and C's mistakes, if any. This is the most cost-effective way to ensure accuracy without introducing a third party.

A more expensive approach introduces a third party, the Judge (J), who checks the differences between M's and C's results and corrects the mistakes made by either M or C by referring to D. DES has proven to be a feasible working model for providing accurate data entry services.

4.2 The Mechanism

Let's illustrate by using the following example:

Now M and C are going to type D below:

D = 28/F Data Processing Center

Case 1: M is incorrect and C is all correct:

M = 28/F Date Processing Centre

C = 28/F Data Processing Center

C knows the following facts by looking at M's result:

(1) There are 3 places where M and C differ.

(2) The other characters are correct. This is very important! Without seeing M's result, C would never know which characters are correct and which are suspect.

C then focuses ONLY on those 3 places, refers to D, and corrects M's result. This is also very important! If C did not have to correct M's errors, we would never know whether C had actually referred to D!

You might argue that if C does not refer to D even after seeing a discrepancy between M and C, that is C's discipline problem and can be solved with management techniques. But unless C is required to correct M's incorrect data, you have no proof either way: C may or may not have referred to D. This design lets us trust the system instead of the people. C must actually do something, and that something, correcting M's incorrect data, is what lets us trust that C has referred to D.

C can proceed to the next field when M's typing equals C's typing. What if C simply copies M's result and proceeds? That is a discipline problem, and management will take appropriate action. In our system, however, M's original typing result is preserved and C modifies only a copy of it, so it is easy to detect whether C is using the "copy" trick. Knowing that the trick can easily be detected whenever management checks, C has no incentive to try it.

Case 2: M is all correct and C is incorrect:

M = 28/F Data Processing Center

C = 28/F Date Processing Centre

C knows the following facts by looking at M's result:

(1) There are 3 places where M and C differ.

(2) The other characters are correct. This is very important! Without seeing M's result, C would never know which characters are correct and which are suspect.

C then focuses on those 3 places, refers to D, and corrects C's own result. When both match, the system lets C proceed to the next field.

Case 3: M is incorrect and C is incorrect too:

M = 27/F Data Processing Center

C = 28/F Date Processing Centre

C knows the following by looking at M's result:

(1) There are 4 places where M and C differ.

(2) The other characters are correct. This is very important! Without seeing M's result, C would never know which characters are correct and which are suspect.

C then refers to D and corrects both M's errors and C's errors. The system lets C proceed only when M's result and C's result are equal.

The only error DES cannot detect is one where M and C make the same error in the same place. This is acceptable because such coincidences are rare and fall within the acceptable error rate.
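The cases above can be checked with a position-by-position comparison of M's and C's entries, which reproduces the counts of 3, 3 and 4 differences given in the text. The sketch also shows the one situation DES cannot catch: when M and C make the identical error at the same place, there is no difference to flag. Comparing characters by position is an assumption that works here because every entry has the same length as D; the function name is hypothetical.

```python
def differing_positions(m: str, c: str) -> list:
    """Indices where M's and C's entries disagree; any extra length also counts."""
    n = max(len(m), len(c))
    return [i for i in range(n) if i >= len(m) or i >= len(c) or m[i] != c[i]]


D = "28/F Data Processing Center"
cases = {
    "Case 1": ("28/F Date Processing Centre", D),
    "Case 2": (D, "28/F Date Processing Centre"),
    "Case 3": ("27/F Data Processing Center", "28/F Date Processing Centre"),
    # Same place, different errors: still flagged for C.
    "Different errors": ("28/F Date Processing Center", "28/F Datu Processing Center"),
    # Same place, identical error: nothing to flag, so the error slips through.
    "Identical errors": ("28/F Date Processing Center", "28/F Date Processing Center"),
}

for name, (m, c) in cases.items():
    flagged = differing_positions(m, c)
    print(f"{name}: {len(flagged)} place(s) flagged at {flagged}")
```

Running it reports 3, 3, 4, 1 and 0 flagged places respectively; only the last pair, where both typists made the same mistake, goes unnoticed.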

Case 4: M is all correct and C is all correct too

In this case, the system simply lets C proceed to the next field without showing M's result.

4.3 Summary

The following points are very important in the entire DES:

(1) The system must highlight the differences between M's and C's results so that C can concentrate on them and make appropriate corrections. This saves C's time and improves efficiency.

(2) The system must not let C proceed unless both results match. This ensures that C REALLY looks at D again and makes proper corrections, no matter who made the mistakes.

(3) C is responsible for correcting both M's and C's errors. C uses M's result to confirm the matching characters and corrects the mismatched ones.

(4) The system shows M's result only when it differs from C's, and the differences must be clearly highlighted. C should focus only on the differences, since the other characters are already correct and need no special handling.
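The four rules can be sketched as a small verification loop in Python. `show_diff` displays M's entry only when it differs from C's, with carets under the suspect places (rules 1 and 4); `verify_field` refuses to move on until both entries match (rule 2); and the `correct` callback stands in for C re-reading D and fixing whichever entries are wrong (rule 3). All names, the caret display and the `max_rounds` escalation limit are illustrative assumptions, not a description of Keyoung's actual software.

```python
from typing import Callable

# Hypothetical callback type: (display, m, c) -> corrected (m, c)
Corrector = Callable[[str, str, str], tuple]


def diff_positions(m: str, c: str) -> list:
    """Positions where M's and C's entries differ (positional comparison)."""
    n = max(len(m), len(c))
    return [i for i in range(n) if i >= len(m) or i >= len(c) or m[i] != c[i]]


def show_diff(m: str, c: str) -> str:
    """Rules (1) and (4): show M's entry next to C's with carets under differences."""
    pos = set(diff_positions(m, c))
    marks = "".join("^" if i in pos else " " for i in range(max(len(m), len(c))))
    return f"M: {m}\nC: {c}\n   {marks.rstrip()}"


def verify_field(m: str, c: str, correct: Corrector, max_rounds: int = 10) -> str:
    """Rules (2) and (3): block until the entries match; C corrects either side."""
    rounds = 0
    while m != c:  # any mismatch blocks progress to the next field
        if rounds == max_rounds:
            raise RuntimeError("field still unresolved; escalate to a supervisor")
        m, c = correct(show_diff(m, c), m, c)  # C re-reads D and fixes entries
        rounds += 1
    return c  # Case 4: matching entries pass without showing M's result


# Example: a hypothetical careful checker who resolves both entries against D.
D = "28/F Data Processing Center"
result = verify_field("27/F Data Processing Center", "28/F Date Processing Centre",
                      lambda display, m, c: (D, D))
print(result)
```

In a real system `correct` would be the checker's input screen rather than a function, and M's original entry would be kept separately so that the "copy" trick can be audited.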

4.4 Keyoung Double Entry System Workflow


The method introduced above, with an accuracy level of 99.92%, satisfies most clients. With "Double Entry Single Compare", the accuracy rate can be raised further to 99.95% for clear printed matter documents.

Two data entry operators (A & B) key in the data and generate the output files: File A & File B.

A QC operator (C) compares these two files using the comparison software. Wherever it shows a discrepancy between the two files, C corrects the mistake and generates File C.

For printed matter source documents, an accuracy level of 99.95% can be achieved through this process.
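The Single Compare step can be sketched as a record-by-record comparison of File A and File B, where only the disagreeing records reach the QC operator. This Python sketch assumes one record per line with both files in the same order; `resolve` is a hypothetical stand-in for operator C checking the source document, and the function names are illustrative rather than Keyoung's compare software.

```python
def compare_files(file_a: list, file_b: list) -> list:
    """Indices of records (one per line) on which File A and File B disagree."""
    if len(file_a) != len(file_b):
        raise ValueError("record counts differ; the files must be realigned first")
    return [i for i, (a, b) in enumerate(zip(file_a, file_b)) if a != b]


def build_file_c(file_a: list, file_b: list, resolve) -> list:
    """QC operator C keeps agreed records and resolves each discrepancy."""
    file_c = list(file_a)
    for i in compare_files(file_a, file_b):
        file_c[i] = resolve(i, file_a[i], file_b[i])
    return file_c


# Hypothetical data: what A and B keyed, and what C reads on the source documents.
file_a = ["Alice Wong,28/F", "Bob Chan,12/F", "Carol Lee,3/F"]
file_b = ["Alice Wong,28/F", "Bob Chan,21/F", "Carol Lee,3/F"]
source = ["Alice Wong,28/F", "Bob Chan,12/F", "Carol Lee,3/F"]

print(compare_files(file_a, file_b))  # [1]
file_c = build_file_c(file_a, file_b, lambda i, a, b: source[i])
```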


For clients with more demanding accuracy requirements, we use "Double Entry Double Compare". Two data entry operators (A & B) key in the data and generate the output files: File A & File B.

A QC operator (C) compares these two files using the comparison software. Wherever it shows a discrepancy between the two files, C corrects the mistake and generates File C.

Another QC operator (D) also compares File A and File B using the comparison software. Wherever it shows a discrepancy, D corrects the mistake and generates File D.

The final QC operator (E) compares File C and File D. When it shows any discrepancy between the two files, E duly corrects the mistakes and generates the final File E.

For printed matter source documents, an accuracy level of 99.995% can be achieved through this process.
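Double Compare adds a second QC pass over the same pair of files and then a final comparison of the two QC outputs. In the Python sketch below (hypothetical names, same one-record-per-line assumption), a mistake made by one QC operator while resolving a discrepancy becomes a new discrepancy between File C and File D, which the final operator E then resolves.

```python
def resolve_pair(first: list, second: list, resolve) -> list:
    """Compare two files record by record; `resolve` settles each discrepancy."""
    if len(first) != len(second):
        raise ValueError("record counts differ")
    return [x if x == y else resolve(i, x, y)
            for i, (x, y) in enumerate(zip(first, second))]


def double_entry_double_compare(file_a, file_b, qc_c, qc_d, qc_e) -> list:
    """QC operators C and D each resolve A vs B; operator E resolves C vs D."""
    file_c = resolve_pair(file_a, file_b, qc_c)
    file_d = resolve_pair(file_a, file_b, qc_d)
    return resolve_pair(file_c, file_d, qc_e)


# Hypothetical data and operators.
source = ["Alice Wong,28/F", "Bob Chan,12/F"]
file_a = ["Alice Wong,28/F", "Bob Chan,12/F"]
file_b = ["Alice Wong,28/F", "Bob Chan,21/F"]
careless = lambda i, x, y: y          # wrongly accepts File B's value
careful = lambda i, x, y: source[i]   # checks the source document

file_e = double_entry_double_compare(file_a, file_b, careless, careful, careful)
print(file_e == source)  # True: E catches C's mistake because D disagrees
```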


Confidentiality

All our processes are designed to ensure foolproof security and confidentiality through stringent privacy policies that conform to international standards.

We also have our employees sign a non-disclosure agreement, based on client specifications, at the beginning of any engagement.

We are also ready to adopt any additional measures suggested by our clients.


We are ready to sign an NDA and a confidentiality agreement in this regard.


Infrastructure

We are proud to have state-of-the-art infrastructure that caters to your project requirements.

The main features are:

  • More than 1,000 workstations
  • Broadband Internet access
  • Uninterrupted power supply (UPS)
  • Linux and NetWare servers with RAID 1 (mirrored) disks
  • Back-up Servers
  • Back-up Generator
  • Kodak high-speed scanners
  • LaserJet and dot matrix printers, etc.

We have the capacity to invest further, at short notice, in additional hardware and software to meet our clients' needs.


Operational Issues

There are basically four ways to send us your material:

1. E-mail

For example, you may have a text that you want assembled into a handbook. You can simply e-mail it to us; we will do the work and return it in the electronic format of your choice (e.g. a Word document or a PDF file). This is useful for small jobs.

2. FTP

We can download the image files from your FTP server. If you do not have an FTP server, we can open a secure FTP account for you on our server FREE of charge; you can then upload the image files and download our deliverables. Our FTP server supports "Explicit FTP over TLS", so all data transfers are encrypted. This is suitable for large jobs.

3. Normal mail service

You can send your material by mail. This is useful if you want to send handwritten material for data entry, and is suitable for non-urgent jobs.

4. FedEx or DHL

Packages usually reach us in about 2-3 working days. This is an option for jobs that cannot be handled by FTP.

We can set up a secure FTP account for you FREE of charge, so that you can upload your source files and download the completed data files. Normally, all data files are encrypted with PGP to ensure maximum security.
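For clients uploading to an FTP server that supports explicit FTP over TLS, a minimal client-side sketch using Python's standard `ftplib` module might look like this. The host name, account and file name in the usage line are placeholders, not real Keyoung details, and the function name is hypothetical. Note that `FTP_TLS` secures only the control connection by default; calling `prot_p()` is what encrypts the data connection carrying the file.

```python
import ssl
from ftplib import FTP_TLS
from pathlib import Path


def upload_via_ftps(host: str, user: str, password: str, local_file: str,
                    timeout: float = 30.0) -> None:
    """Upload one file using explicit FTP over TLS (FTPS)."""
    path = Path(local_file)
    context = ssl.create_default_context()  # verify the server's certificate
    with FTP_TLS(host, context=context, timeout=timeout) as ftps:
        # login() sends AUTH TLS first, so the credentials travel encrypted.
        ftps.login(user, password)
        # Switch the data connection (the file transfer itself) to TLS too.
        ftps.prot_p()
        with path.open("rb") as fh:
            ftps.storbinary(f"STOR {path.name}", fh)


# Hypothetical usage (placeholder host and account):
# upload_via_ftps("ftp.example.com", "client01", "secret", "batch_001.pdf")
```

PGP encryption of the files, as described above, would be applied before the upload and is not shown here.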

The final output can be delivered by the client's preferred method: as an e-mail attachment, on CD, or uploaded directly to an FTP server. All data files can be encrypted before dispatch if the client requires.

We need samples of the source data and approximate number of records to determine turn-around time for a particular project.

Depending on the nature of the project, deliveries can be as frequent as daily, weekly or monthly, or just once at project close. For high-volume requirements, we may need an initial project setup period of a few days to two weeks, depending on the nature and scope of the project.

Yes, we use a dual-monitor approach to handle source documents in image format.

Two monitors are attached to each workstation: one displays the source image and the other displays the data input interface.

This not only saves printing costs but also increases the security of source documents.

Yes, for large projects we conduct sample/test work for clients free of charge.

This gives you a better idea of the quality of our work and, at the same time, helps us better understand the nature and scope of the project, so that we can give you our best quote.

Business Office Hours: Monday through Friday, 9:00am to 6:00pm.
Operating Hours: up to 7 days a week, with 2-3 shifts per day, as requested by the client.

Outsourcing

We strongly believe in providing a service that allows clients to focus on the results.


Outsourcing data entry services is more cost-effective and produces more accurate results.

When you work with a professional data capture service bureau, you are also investing in its experience. Because data capture is its core competency, it can deliver faster turnaround, higher quality and better pricing.

Outsourcing also provides the advantages of avoiding expensive capital costs, additional employees, training, drain on existing administrative resources and space requirements.


We continuously invest in innovative technologies and experienced staff to give you the advantage of a single-point vendor that ensures:

  • Competitive pricing
  • Superior quality
  • High data security
  • Enhanced risk management
  • Pragmatic and flexible approach
  • Most efficient and effective solutions
  • Option to outsource one or all of your high-end knowledge requirements
  • Quick turnaround processing time
  • Flexibility to work with any client system
  • Multidimensional analysis and support
  • Handling a wide variety of customized processes
  • Highly skilled Professionals

Imagine how much time you would save if you could quickly and easily find documents or images and share them anywhere within your company. Our expertise lies in converting paper documents and forms into digital formats. Once documents are available digitally, networked access and document retrieval (through an intranet or the Internet) can make your company more efficient.


The following is a quote from Bill Gates from an April 1999 issue of Times Magazine:

"As a business manager, you need to take a hard look at your core competencies. Revisit the areas of your company that aren't directly involved in those competencies, and consider whether Web technologies can enable you to spin off those tasks. Let another company take over the management responsibilities for that work, and use modern communication technology to work closely with the people - now partners instead of employees are doing the work. In the Web work style, employees can push the freedom the Web provides to its limits."

The next quote is from the January 1999 issue of Inc. magazine.

In their book Unleashing the Killer App: Digital Strategies for Market Dominance (Harvard Business Press), Mui and co-author Larry Downes urge companies to perform internally only those activities that can't be performed more cheaply in the open marketplace. When the marketplace in question is the Internet, where vendors with only "dot com" over their head can operate at very low costs, the savings from pushing out business tasks is considerable.


The cost of living in China is much lower than in most Western countries, and salaries are about 20%-30% of those in North America and Europe. We can therefore do your work at a much lower cost and pass the savings on to our clients, creating a win-win situation.


Our reputation for professionalism and service excellence has grown over the years. We have reached the position of a global player by constantly upholding the highest standards of business ethics with our commitment to quality. With the help of these factors we create relationships that transcend time and space.

Reasons for Outsourcing

Reduce Capital Investment

Outsourcing is a way to reduce the need to invest capital funds in non-core business functions. Instead of acquiring resources through capital expenditures, you can contract out on an "as needed" expense basis.

Core vs Non-Core Business Activities

Increase Productivity & Save Management's Time

Outsourcing also increases productivity because more work is processed in less time, allowing you to focus on your core business. From time to time, most businesses need quick and accurate data entry services for jobs that are too big or too time-critical for their in-house data entry staff. There is no need to work your staff overtime or alienate them with menial typing jobs.


Operational details are handled by Keyoung's professional staff. Management no longer needs to spend time controlling an activity in which it has no experience or training. We provide a single point of contact responsible for the proficient execution of the tasks.

High Quality and Cost Saving

Keyoung mixes and matches traditional data processing services, flexible staffing levels and state-of-the-art internet technologies to deliver a solution that gets the job done faster, with less hassle, and well within budget.

High quality and low cost have always been the dual attraction of offshore data capture outsourcing. According to industry analysts, offshore customers save an average of 25 percent on projects sent overseas and expect these savings to keep growing. While Keyoung understands that the cost savings of outsourcing data capture services can be significant, we also recognize and address the responsibility of securing your assets, and offer a proven commitment to quality and accuracy.

Add Flexibility

Keyoung can quickly respond to peak loads by adding equipment and personnel. There is no need for you to advertise for, interview or train operators, or to hire and train programmers, and no additional hardware or software is required on your side. You can save on facilities, rent, utilities and other related costs. Ultimately, your staff have more time to do what they do best.

Benefits from Outsourcing


Price Issues

Keyoung charges clients in several different ways depending on the type of project: by record, by page, per 10,000 output characters or a combination of them per the client's preference.
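A fee under these billing units is simply the sum of units times rates. The Python sketch below is illustrative only: the function name is hypothetical, the rates in the example are made-up placeholders rather than Keyoung's prices, and it assumes partial 10,000-character units are billed pro rata, which the text does not specify.

```python
def estimate_fee(records: int = 0, pages: int = 0, output_chars: int = 0,
                 rate_per_record: float = 0.0, rate_per_page: float = 0.0,
                 rate_per_10k_chars: float = 0.0) -> float:
    """Sum the charges for whichever billing units a project uses."""
    char_units = output_chars / 10_000  # billed per 10,000 output characters
    return (records * rate_per_record
            + pages * rate_per_page
            + char_units * rate_per_10k_chars)


# Hypothetical combination: a per-record charge plus a per-10,000-character charge.
fee = estimate_fee(records=5_000, rate_per_record=0.02,
                   output_chars=2_000_000, rate_per_10k_chars=1.50)
print(f"{fee:.2f}")  # 400.00
```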


We consider each client's business application needs and requirements to be unique. At the same time, the prices we charge for our services are unbelievably low. Rates vary depending on specific business needs, volume of work, frequency of orders, clarity and detail of the specifications, and the time allowed for completing the project. We can assure you of one thing: no one else will deliver the quality of work we offer at the low rates we charge.


We take a cost-based approach to setting fees for data entry services. The basic steps in our typical production process are:

  • Document Intake
  • Data Entry Turnaround Time
  • Document Scanning
  • Programming
  • Data Entry - Key From Image

What follows is a discussion of the factors affecting the cost of each of these steps.

Document Intake

Document Intake refers to the process of receiving the source documents and preparing them for data processing. How this is done and the amount of document preparation we are required to perform can greatly affect the cost.

The client can scan the documents to PDF and upload them to the server for us to download and process. If clients cannot scan the documents themselves, we can arrange for our local contact points to scan them, either onsite or offsite.

Data Entry Turnaround Time

The general rule is the faster the required turnaround time, the higher the fee.

This compensates for working staff longer and harder over short periods, as well as for the added administrative burden of getting a job done quickly.

Document Scanning

In most cases, clients send us hard copy forms. When we are using our own system to capture data from the forms, we have found the most efficient method is to key from scanned images.

The factors affecting scanning cost include:

  • Document size

This affects throughput. Also, documents that are too small or too large cannot be scanned.

  • Document uniformity

Are they all the same size? Are there differences in shading?

  • Paper type

Is the paper very thin? Does it have stickers attached?

  • Document condition

Poor quality documents will require extra time.

  • Document arrangement

To be scanned efficiently, all documents must be right side up and facing the same direction. There can be no staples, paper clips, etc., and the documents must all be flat (unfolded). The cost of document preparation usually exceeds the actual cost of scanning.

  • Batch integrity

Can we scan continuously, or do we have to retain client batch integrity?


Programming

Once documents have been scanned, the images are imported into our data entry system. For each project, a "template" is developed within Keyoung's data entry system and programmed for the specific fields on the form.

Each field can be programmed to restrict data entry to acceptable values. Any business rules a client defines for specific fields or forms can be programmed into each template.
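A template of this kind can be sketched as a list of per-field rules: a pattern the value must match, an optional lookup table of acceptable values, and any extra business-rule checks. Everything in this Python sketch (the `FieldRule` class, the field names and the example rules) is hypothetical and only illustrates the idea; it is not Keyoung's template format.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FieldRule:
    """One field of a hypothetical data entry template."""
    name: str
    pattern: Optional[str] = None   # regular expression the value must match
    allowed: Optional[set] = None   # client-provided lookup table
    checks: list = field(default_factory=list)  # extra business rules

    def errors(self, value: str) -> list:
        problems = []
        if self.pattern is not None and not re.fullmatch(self.pattern, value):
            problems.append(f"{self.name}: {value!r} does not match {self.pattern}")
        if self.allowed is not None and value not in self.allowed:
            problems.append(f"{self.name}: {value!r} is not in the lookup table")
        for check in self.checks:
            message = check(value)  # each rule returns None or an error text
            if message:
                problems.append(f"{self.name}: {message}")
        return problems


def validate(record: dict, template: list) -> list:
    """Return every rule violation for one keyed record (empty list = accepted)."""
    return [p for rule in template for p in rule.errors(record.get(rule.name, ""))]


over_120: Callable[[str], Optional[str]] = (
    lambda v: "age over 120" if v.isdigit() and int(v) > 120 else None)

template = [
    FieldRule("zip", pattern=r"\d{5}"),
    FieldRule("state", allowed={"CA", "NY", "TX"}),
    FieldRule("age", pattern=r"\d{1,3}", checks=[over_120]),
]

print(validate({"zip": "9021", "state": "CA", "age": "130"}, template))
```

An operator would see these messages at keying time and could not save the record until they are resolved.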

The factors affecting the cost of programming include the following:

  • Document complexity

Size, number of pages, number of fields per page, number of records per page

  • Document uniformity

Does the source document come in a single version or in multiple versions?

  • Edits required

For example, integrating client-provided lookup tables, address correction, etc.

  • Format of output file required

Programming is a one-time cost (unless the source document changes) and it will normally be waived for large or ongoing projects.

Data Entry - Key From Image

Using the data entry template developed for the project, our data entry operators key the required data from the scanned images.

All work is performed on Keyoung's production servers. Data entry operator workstations connected to Keyoung's production server function merely as "dumb terminals".

All data and images remain on the server. We use a dual-monitor approach to handle source documents in image format: two monitors are attached to each workstation, one displaying the source image and the other the data input interface. This not only saves printing costs but also increases the security of source documents.

The factors affecting the time it takes to process a document, and hence the cost, include the following:

  • Number of fields to be keyed
  • Number of characters in each field
  • Type of data: numeric, alpha, alphanumeric
  • Form design: this is often overlooked but can have a huge impact on data entry efficiency and cost:

"Constrained" forms improve legibility and may allow successful hand-print recognition (a constrained form requires each character to be printed in a separate box). Uniformity of document layout also matters: are the fields to be keyed always in the same place? Answers should be standardized as numeric or alphanumeric values displayed on the form.

  • Legibility: handwritten vs. typed; a "constrained" form improves the legibility of handwriting, and the quality of the source document affects the quality of the scan and hence legibility.
  • Required accuracy

Different methods are used for different accuracy levels.

The cost of data entry is the hardest to estimate, and most vendors are reluctant to give a firm fee quote without seeing an actual source document, because the factors above greatly affect the processing time.

In particular, document issues (layout, legibility) and the type of data (numeric vs. alphanumeric) affect the keying rate, which can range from an average of 4,000 keystrokes per hour (KPH) to 10,000+ KPH.
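The factors above can be turned into a rough labour estimate by dividing the total number of keystrokes by a keying rate. In this Python sketch the function name and example figures are hypothetical, and `passes=2` is an assumption that models double entry (every character keyed by both M and C); the estimate ignores correction time, scanning and programming.

```python
def keying_hours(records: int, fields_per_record: int, avg_chars_per_field: float,
                 kph: float, passes: int = 2) -> float:
    """Estimated operator hours: total keystrokes divided by the keying rate."""
    keystrokes = records * fields_per_record * avg_chars_per_field * passes
    return keystrokes / kph


# Hypothetical job: 10,000 forms, 8 fields each, about 12 characters per field.
print(keying_hours(10_000, 8, 12, kph=8_000))  # 240.0 hours at 8,000 KPH
print(keying_hours(10_000, 8, 12, kph=4_000))  # 480.0 hours at 4,000 KPH
```

Halving the keying rate doubles the hours, which is why layout, legibility and data type weigh so heavily on the quote.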

One other factor affects the fee: the size of the project. For large, ongoing projects, we can offer more aggressive pricing than for small, one-time projects.

Because every project is unique, and each client has different goals and objectives, it is not practical to take a 'cookie-cutter' approach to data entry pricing.

The best approach is to let us take a look at your project, so that we can give you a reasonable fee estimate based on the factors discussed above.


Problem Solving

Problems can and often do occur once production is under way.

The most frequent problem is that some of the source documents cannot be read accurately or do not conform to the data entry/indexing rules.

We send regular progress reports to the client's representatives; part of each report lists the problems encountered.

The client's representative is responsible for getting the answers quickly, making decisions based on prior work with the client or getting in touch with the customer to discuss how the problem can be best resolved.

The goal is to answer questions quickly to keep the project on schedule.

Sometimes the source documents contain poor-quality images. We help the client identify the problems by providing a list of the illegible images, or of the illegible parts of an image.

The client can then examine the problems and supply the correct data. Another strategy is to key a special code in place of the illegible data.

The client can then search the database for these codes and make the necessary corrections.
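The illegible-code strategy can be sketched with Python's built-in `sqlite3` module: operators key an agreed marker wherever the source cannot be read, and the client later queries for it. The marker `#ILL#`, the table layout, the sample rows and the correction are all hypothetical.

```python
import sqlite3

ILLEGIBLE = "#ILL#"  # hypothetical code keyed wherever the source cannot be read

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO records (name, address) VALUES (?, ?)",
    [("Alice Wong", "28/F Data Processing Center"),
     ("B#ILL# Chan", "12 Queen's Road"),
     ("Carol Lee", ILLEGIBLE)],
)


def find_illegible(conn: sqlite3.Connection, marker: str = ILLEGIBLE) -> list:
    """Return the rows in which any keyed field contains the illegible-data code."""
    pattern = f"%{marker}%"
    return conn.execute(
        "SELECT id, name, address FROM records "
        "WHERE name LIKE ? OR address LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()


for row in find_illegible(conn):
    print(row)  # the client checks these against the originals

# Hypothetical correction supplied by the client for record 3.
conn.execute("UPDATE records SET address = ? WHERE id = ?", ("7 Nathan Road", 3))
```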

Problem Solving Process


Quality Of Work

We adhere to stringent Quality Assurance procedures to ensure the quality of all our work.

Some Quality Checking Techniques:

  • Image Quality Review
  • Batch File Quality Review
  • Use of Validation Tables
  • Double Entry
  • Eye-ball Checking
  • Quality Control Reports

We perform both visual and automated computer checks on all scanned images and entered data.

All data is punched and verified using our specially developed "Double Entry System", "Double Entry Single Compare System" and "Double Entry Double Compare System" to ensure maximum accuracy. All project team members involved in the work are highly qualified experts in their areas of specialization, and they receive adequate training before they start working on "live" projects.


Security

Physical Security

Our production centres are housed in modern brick buildings. The offices are equipped with 24-hour CCTV (closed-circuit television) systems, fire alarms, emergency lighting and fireproof carpets. Access can only be gained through a locked, coded entry door or through a locked, front reception door. Visitors to the building are required to identify themselves, state the purpose of their visit and sign a visitors' book. Visitors are escorted by an appointed company representative during their time in the facility to prevent accidental or unauthorized disclosure of customer information.

Document Security

Documents are received in a pre-designated area, logged in and tracked under our accountability control system. No documents are left unattended in the processing area. At the end of the work day, all in-process work, including data on all electronic media, is accounted for and returned for processing the following day.

Data Security

Data entry is performed from images over a private network that has no connection to the Internet. Firewalls are in place for all servers. We use a thin client-server approach, which means all work is saved on the central servers rather than on individual workstations. Images, data and all intermediate processing files stay on the central servers.

Data entry operators have only "input" rights, not "output" rights: they can enter data into the central servers but cannot download, copy, move, delete or print any file from them. All workstations are diskless (no floppy drive, no CD-ROM, no DVD-ROM), their USB ports are disabled, and no printer is attached to any workstation. Data entry operators are not allowed to bring laptops, PDAs or any other electronic devices into our offices. Wi-Fi access in the office is disabled.

We have also implemented the following data security measures:

(1) Establish strong passwords for all servers and workstations

(2) Put up strong firewalls for all servers

(3) Install anti-virus protection

(4) Update systems and programs regularly

(5) Automatic backup

(6) Regular monitoring by the system administrator

(7) Regular staff training on security, privacy and confidentiality


Encryption underpins all digital security systems. It hides documents from anyone who is not authorized to read them and, used together with digital signatures, verifies that the content the originator created has not been changed.

Data sent from clients to us, and between our data centers, is secured with encrypted e-mail or secure FTP. All files are first encrypted with PGP before being uploaded to the secure FTP servers, providing two layers of protection.
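Python's standard library has no PGP implementation, so this sketch covers only the "content unchanged" side: the sender publishes a SHA-256 checksum of each file, and the receiver recomputes it after download. This is an assumed complement to the PGP step described above, not a replacement for it; a plain checksum detects corruption, but unlike a PGP signature it does not prove who created the file. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large batches fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(path: str, expected_hex: str) -> bool:
    """True if the received file matches the digest the sender published."""
    return sha256_of(path) == expected_hex.strip().lower()
```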

Data Retention & Sanitization

By default, the data (source images, intermediate files and output files) is kept in our production centres for one month. After that, these files are securely removed using the DoD 5220.22-M data sanitization method, which prevents all software-based file recovery methods from retrieving information from the drive and should also prevent most, if not all, hardware-based recovery methods.

A data sanitization method is the specific way in which a data destruction program overwrites the data on a hard drive or other storage device. The DoD 5220.22-M sanitization method was originally defined by the US National Industrial Security Program (NISP) in the National Industrial Security Program Operating Manual (NISPOM) and is one of the most common sanitization methods used in data destruction software.
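Data destruction tools commonly describe the DoD 5220.22-M overwrite as three passes: zeros, then ones, then random characters, with the final pass verified. The Python sketch below applies that pattern to a single file before deleting it. It is an illustration only, with hypothetical function names: overwriting through the file system does not reliably reach every physical copy on SSDs (wear levelling), journaling or copy-on-write file systems, or backups, which is why sanitization tools usually work on the whole drive. The read-back may also be served from the operating system's cache rather than the disk.

```python
import hashlib
import os
import secrets


def _write_pass(fh, size: int, make_chunk, chunk_size: int) -> str:
    """Overwrite the whole file once and return a SHA-256 digest of what was written."""
    digest = hashlib.sha256()
    fh.seek(0)
    remaining = size
    while remaining:
        chunk = make_chunk(min(chunk_size, remaining))
        fh.write(chunk)
        digest.update(chunk)
        remaining -= len(chunk)
    fh.flush()
    os.fsync(fh.fileno())  # ask the OS to push this pass to the device
    return digest.hexdigest()


def overwrite_and_delete(path: str, chunk_size: int = 1 << 20) -> None:
    """Zeros, then ones (0xFF), then random bytes; verify the last pass; delete."""
    size = os.path.getsize(path)
    with open(path, "r+b") as fh:
        _write_pass(fh, size, lambda n: b"\x00" * n, chunk_size)
        _write_pass(fh, size, lambda n: b"\xff" * n, chunk_size)
        expected = _write_pass(fh, size, secrets.token_bytes, chunk_size)
        fh.seek(0)
        readback = hashlib.sha256()
        for block in iter(lambda: fh.read(chunk_size), b""):
            readback.update(block)
        if readback.hexdigest() != expected:
            raise OSError(f"verification of the final pass failed for {path}")
    os.remove(path)
```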


Prior to employment, applicants are required to read and sign an "Employee Non-Disclosure Agreement" for the protection of the customer's documents. Training in security and confidentiality is given both quarterly and with each new project. We maintain the most stringent standards of data and document security.




Staff And Training

We believe that training is the key element in the process of establishing world class standards.

All data entry operators receive at least one month of training in data entry accuracy and professional software operation before they start work on real projects.

Client-specific training covers the client's processes, programs and user requirements.



Keyoung has a highly selective hiring policy at all levels, which ensures consistent quality across its staff.

For simple projects, training can be done via NetMeeting, e-mail, chat and/or phone. For more complicated projects, we can come to your site to be trained, or you may send one of your personnel to train our staff.

Unfortunately, for data security reasons, we DO NOT employ homeworkers. We DO NOT have any type of job for home-based workers. All of our operators must work full-time at our production centres.

Glossary (17)

An archive refers to a collection of records, and also refers to the location in which these records are kept. Archives are made up of records which have been created during the course of an individual or organization's life. In general an archive consists of records which have been selected for permanent or long-term preservation.

Records, which may be in any medium, are normally unpublished, unlike books and other publications. Archives may also be generated by large organizations such as corporations and governments. The highest level of organization of records in an archive is known as the fonds. Archives are distinct from libraries insofar as archives hold records which are unique. Archives can be described as holding information "by-products" of activities, while libraries hold specifically authored information products.

Paper archives not only occupy valuable office space but also increase the time and effort needed to retrieve documents. Archives processing is the process of scanning archives into digital images for easy retrieval. Document imaging lets companies file their documents systematically in electronic form, which saves a great deal of time in searching, retrieving and sorting documents and substantially improves the efficiency of operational activities.


Data conversion is the conversion of one form of computer data to another - the changing of bits from being in one format to a different one, usually for the purpose of application interoperability or the capability of using new features. At the simplest level, data conversion can be exemplified by conversion of a text file from one character encoding to another. More complex conversions are those of office file formats, and conversions of image and audio file formats.
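As a minimal sketch of the simplest case mentioned above, converting text from one character encoding to another, the Python snippet below decodes the bytes from the source encoding and re-encodes them in the target one. The function name `convert_encoding` is illustrative; the codec names are standard Python codec names.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text from one character encoding to another.

    The bytes are decoded to Unicode text first, then encoded in the target
    encoding. With the default errors="strict", a character the target cannot
    represent raises UnicodeEncodeError instead of being silently dropped.
    """
    return data.decode(source).encode(target)


latin1_bytes = "Café".encode("latin-1")  # b'Caf\xe9'
utf8_bytes = convert_encoding(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes)  # b'Caf\xc3\xa9'

try:
    convert_encoding("€5".encode("utf-8"), "utf-8", "latin-1")
except UnicodeEncodeError:
    print("Latin-1 has no euro sign, so this conversion would lose information")
```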

Information basics

Before any data conversion is carried out, the user or application programmer should keep a few basics of computing and information theory in mind:

* Information can easily be discarded using the computer, but adding information takes effort.

* The computer can be used to add information only in a rule-based fashion; most additions of information that users want can be done only with human judgement.

* Upsampling the data or converting to a more feature-rich format does not add information; it merely makes room for that addition, which usually a human must do.

For example, a truecolor image can easily be converted to grayscale or black and white, while the opposite conversion is a painstaking process. Converting a Unix text file to a Microsoft (DOS/Windows) text file involves adding information, namely a CR (hexadecimal 0D) byte before each LF (0A) byte, but that addition is easily done with a computer, since it is rule-based; whereas the addition of color information to a grayscale image cannot be done programmatically, since only a human knows which colors are needed for each section of the picture. There are no rules that can be used to automate that process.
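The two examples above can be sketched in Python. The line-ending functions add or remove the CR byte purely by rule, so that conversion can be reversed. The grayscale function, which assumes the ITU-R BT.601 luma weights (one common choice), maps different colors to the same grey level, so no rule can recover the original color. All function names are illustrative.

```python
def unix_to_dos(text: bytes) -> bytes:
    """Add a CR (0x0D) before every LF (0x0A); a purely rule-based addition.

    Existing CRLF pairs are first normalised to LF so they are not doubled.
    """
    return text.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def dos_to_unix(text: bytes) -> bytes:
    """Remove the CR that precedes each LF."""
    return text.replace(b"\r\n", b"\n")


def to_gray(rgb: tuple[int, int, int]) -> int:
    """Convert a truecolor pixel to an 8-bit grey level (BT.601 luma weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


unix = b"line one\nline two\n"
dos = unix_to_dos(unix)
print(dos)                       # b'line one\r\nline two\r\n'
print(dos_to_unix(dos) == unix)  # True: the rule-based step is reversible

# Pure red and a mid green collapse to the same grey level, so the
# original color cannot be recovered from the grayscale value alone.
print(to_gray((255, 0, 0)), to_gray((0, 130, 0)))  # 76 76
```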

Pivotal conversion

Data conversion can be directly from one format to another, but many applications that convert between multiple formats use a pivotal encoding by way of which any source format is converted to its target.

Office applications, when employed to convert between office file formats, use their internal, default file format as a pivot. For example, a word processor may convert an RTF file to a WordPerfect file by converting the RTF to OpenDocument and then that to WordPerfect format.
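A minimal sketch of pivotal conversion, using Python's standard `csv` and `json` modules: each format has one reader into a shared pivot (here, an assumed list of records with string values) and one writer out of it, so any source format reaches any target through the pivot. The format set and function names are illustrative only.

```python
import csv
import io
import json

# Pivot representation assumed for this sketch: one dict per record.
Records = list[dict[str, str]]


def read_csv(text: str, delimiter: str = ",") -> Records:
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def write_csv(records: Records, delimiter: str = ",") -> str:
    if not records:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]),
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def read_json(text: str) -> Records:
    return json.loads(text)


def write_json(records: Records) -> str:
    return json.dumps(records)


# One reader and one writer per format; no direct format-to-format converters.
READERS = {"csv": read_csv, "tsv": lambda t: read_csv(t, "\t"), "json": read_json}
WRITERS = {"csv": write_csv, "tsv": lambda r: write_csv(r, "\t"), "json": write_json}


def convert(text: str, source: str, target: str) -> str:
    """source format -> pivot records -> target format."""
    return WRITERS[target](READERS[source](text))


print(convert("name,city\nAda,London\n", "csv", "json"))
# [{"name": "Ada", "city": "London"}]
```

With N formats, this design needs at most 2N reader and writer functions instead of up to N(N-1) direct converters, which pays off as the number of formats grows.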

Lossy and inexact data conversion

For any conversion to be carried out without loss of information, the target format must support the same features and data constructs present in the source file. Conversion of a word processing document to a plain text file necessarily involves loss of information, because plain text format does not support word processing constructs such as marking a word as boldface.
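A small illustration of this kind of loss, assuming an HTML fragment stands in for the word processing source: the sketch below uses Python's standard `html.parser` to keep only the text, so the fact that a word was bold is discarded and cannot be reconstructed from the output. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag (and its formatting)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def to_plain_text(html: str) -> str:
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Both inputs give identical plain text: the boldface is gone for good.
print(to_plain_text("<p>This is <b>important</b>.</p>"))  # This is important.
print(to_plain_text("<p>This is important.</p>"))         # This is important.
```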

Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different. As an example, converting from PDF to an editable word processor format is a tough chore, because PDF records the textual information like engraving on stone, with each character given a fixed position and line breaks hard-coded, whereas word processor formats accommodate text reflow. PDF does not know of a word space character - the space between two letters and the space between two words differ only in quantity. Therefore, a title with ample letter-spacing for effect will usually end up with spaces in the word processor file, for example INTRODUCTION with spacing of 1 pt as I N T R O D U C T I O N on the word processor.
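The letter-spacing problem can be sketched with a hypothetical model of PDF text: each glyph is stored only as a character, an x position and a width in points, and the converter guesses a word space whenever the gap to the next glyph exceeds a threshold. The `Glyph` class, the 6 pt glyph width and the 0.5 pt threshold are assumptions for illustration, not how any particular PDF library works.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in points
    width: float  # advance width of the glyph, in points


def layout(text: str, width: float = 6.0, letter_spacing: float = 0.0,
           start: float = 0.0) -> list[Glyph]:
    """Give every character a fixed position, as a PDF producer would."""
    glyphs, x = [], start
    for ch in text:
        glyphs.append(Glyph(ch, x, width))
        x += width + letter_spacing
    return glyphs


def glyphs_to_text(glyphs: list[Glyph], space_threshold: float = 0.5) -> str:
    """Rebuild text, inserting a space wherever the gap between glyphs is large."""
    if not glyphs:
        return ""
    parts = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > space_threshold:
            parts.append(" ")
        parts.append(cur.char)
    return "".join(parts)


print(glyphs_to_text(layout("INTRODUCTION")))                      # INTRODUCTION
print(glyphs_to_text(layout("INTRODUCTION", letter_spacing=1.0)))  # I N T R O D U C T I O N
```

No threshold fixes this in general: a gap caused by letter-spacing and a gap caused by a real word space differ only in size.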

Open vs. secret specifications

Successful data conversion requires thorough knowledge of how both the source and target formats work. When the specification of a format is unknown, it has to be reverse-engineered before conversion can be carried out. Reverse engineering can closely approximate the original specification, but errors and missing features can still result. The binary formats of Microsoft Office documents (DOC, XLS, PPT and the rest) were undocumented for many years, so anyone seeking interoperability with them had to reverse-engineer them. Such efforts have been fairly successful: most Microsoft Word files open without ill effect in competing word processors such as OpenOffice.org Writer, but the few that don't, usually very complex files that use more obscure features of the DOC format, show the limits of reverse engineering.

Image Scanning and OCR

Another kind of data conversion is document scanning or image scanning. It is the action or process of converting text and graphic paper documents, photographic film, photographic paper or other files to digital images.

The images can be converted into editable text by using OCR technology. The accurate recognition of typewritten text is now considered largely a solved problem, but recognition of hand printing, cursive handwriting, and even printed text in some other scripts (especially those with a very large number of characters) is still the subject of active research.


Data entry keyers usually input lists of items, numbers, or other data into computers or complete forms that appear on a computer screen. They also may manipulate existing data, edit current information, or proofread new entries into a database for accuracy. Some examples of data sources include customers’ personal information, medical records, various registration forms, and membership lists. Usually, this information is used internally by a company and may be reformatted before other departments or customers utilize it.


In addition to being affected by technology, employment of data entry and information processing workers will be adversely affected by businesses that are increasingly contracting out their work. Many organizations have reduced or even eliminated permanent in-house staff, for example in favor of temporary employment and staffing services firms. Some large data entry and information processing firms increasingly employ workers in countries with relatively lower wages, such as China. As international trade barriers continue to fall and telecommunications technology improves, this transfer of jobs will mean reduced demand for data entry keyers in developed countries such as the United States, the UK and Australia.



Data processing means manipulation of data by a computer. It includes the conversion of raw data to machine-readable form, flow of data through the CPU and memory to output devices, and formatting or transformation of output. Any use of computers to perform defined operations on data can be included under data processing. In the commercial world, data processing refers to the processing of data required to run organizations and businesses efficiently.

Used specifically, data processing may refer to a discrete step in the information processing cycle in which data is acquired, entered, validated, processed, stored, and output, either in response to queries or in the form of routine reports; the processing is the step that organizes the information in order to form the desired output. Used in a more general sense, data processing may also refer to the act of recording or otherwise handling one or more sets of data, and is often performed with the use of computers. The word data is commonly used to mean "information" and often suggests large amounts of information in a standardized format. Data may consist of letters, numbers, equations, dates, images, and other material, but does not usually include entire words.

Data Processing Keywords

Data Processing vs Information Processing

One of the key points about the term data processing is that some people use it synonymously with the term information processing, while others distinguish between the two. Where a distinction is made, data means the raw, unprocessed material, while information means data that has been organized or altered in some way: the two are treated as distinct, with data preceding information.

Data to Information Processing


In cryptography, encryption is the process of obscuring information to make it unreadable without special knowledge. Encryption has been used to protect communications for centuries, but for most of that time only organizations and individuals with an extraordinary need for secrecy made use of it. In the mid-1970s, strong encryption emerged from the sole preserve of secretive government agencies into the public domain, and it is now used to protect widely used systems such as Internet e-commerce, mobile telephone networks and bank automatic teller machines.

Encryption can be used to ensure secrecy, but other techniques are still needed to make communications secure, particularly to verify the integrity and authenticity of a message; for example, a message authentication code (MAC) or digital signatures. Another consideration is protection against traffic analysis.
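As a sketch of the integrity check mentioned above, Python's standard `hmac` module computes a message authentication code from a key shared by sender and recipient; any change to the message produces a different tag. The helper names and the choice of SHA-256 are illustrative.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute an HMAC-SHA256 tag to send along with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recipient side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), received_tag)


key = b"shared-secret-key"
tag = make_tag(key, b"Pay invoice 1024")
print(verify(key, b"Pay invoice 1024", tag))  # True
print(verify(key, b"Pay invoice 9024", tag))  # False: the message was altered
```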

In cryptography, a cipher (or cypher) is an algorithm for performing encryption and decryption — a series of well-defined steps that can be followed as a procedure. An alternative term is encipherment. In most cases, that procedure is varied depending on a key which changes the detailed operation of the algorithm. In non-technical usage, a "cipher" is the same thing as a "code"; however, the concepts are distinct in cryptography. In classical cryptography, ciphers were distinguished from codes, which operated by substituting according to a large codebook.

The original information is known as plaintext, and the encrypted form as ciphertext. The ciphertext message contains all the information of the plaintext message, but is not in a format readable by a human or computer without the proper mechanism to decrypt it; it should resemble random gibberish to those not intended to read it. The operation of a cipher usually depends on a piece of auxiliary information, called a key or, in traditional NSA parlance, a cryptovariable. The encrypting procedure is varied depending on the key, which changes the detailed operation of the algorithm. A key must be selected before using a cipher to encrypt a message. Without knowledge of the key, it should be difficult, if not impossible, to decrypt the resulting ciphertext into readable plaintext.
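How a key "varies the detailed operation" of a cipher can be seen in the classical Vigenère cipher, sketched below in Python. It is long broken and is shown only to illustrate plaintext, ciphertext and key; it is not what modern tools such as PGP use. The sketch assumes the key contains letters only and passes non-letters through unchanged.

```python
from itertools import cycle


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of text by the corresponding letter of the repeating key."""
    shifts = cycle(ord(k) - ord("A") for k in key.upper())
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            s = next(shifts)
            if decrypt:
                s = -s
            out.append(chr((ord(ch) - ord("A") + s) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                  # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```

The same plaintext encrypted under a different key gives a different ciphertext, which is exactly the role the key plays.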

Encryption Workflow


In the course of our lives we fill out hundreds of forms - application forms, questionnaires, insurance claims, etc. At the same time computers have become indispensable for collecting and managing information, making the task of extracting data from printed documents even more pressing.

Forms processing is a process whereby information entered into data fields is converted into electronic form:

* entered data are “captured” from their respective fields

* forms themselves are digitized and saved as images.

Forms Processing WorkFlow


Intelligent Character Recognition (ICR) is an advanced version of OCR that allows fonts and different styles of handwriting to be learned during processing, improving recognition accuracy.

Most good ICR software has a self-learning system, referred to as a neural network, which automatically updates the recognition database with new handwriting patterns.

ICR extends the usefulness of scanning devices for document processing from printed character recognition (a function of OCR) to the recognition of handwritten matter.

Because the process involves recognising handwriting, accuracy may be poor in some circumstances, but rates of 97% or more can be achieved when reading handwriting in structured forms.

To achieve these high recognition rates, the software often uses several read engines, and each is given voting rights in determining the true reading of each character.

In numeric fields, engines designed to read numbers take preference, while in alpha fields, engines designed to read handwritten letters have greater voting rights. When used with a bespoke interface hub, handwritten data can be populated automatically into a back-office system, avoiding laborious manual keying, and can be more accurate than traditional human data entry.
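The weighted voting described above might be sketched as follows. The three engine names, their per-field weights and the confidence scores are hypothetical; real ICR products use their own, usually undisclosed, schemes. The example shows the same ambiguous glyph being read as the digit 0 in a numeric field and as the letter O in an alpha field.

```python
from collections import defaultdict

# Hypothetical vote weights per field type for three hypothetical read engines.
WEIGHTS = {
    "numeric": {"digit_engine": 3.0, "letter_engine": 1.0, "general_engine": 1.5},
    "alpha": {"digit_engine": 1.0, "letter_engine": 3.0, "general_engine": 1.5},
}

Reading = tuple[str, float]  # (character read, engine confidence from 0 to 1)


def vote(readings: dict[str, Reading], field_type: str) -> str:
    """Return the character with the highest total of weight * confidence."""
    weights = WEIGHTS[field_type]
    scores: defaultdict[str, float] = defaultdict(float)
    for engine, (char, confidence) in readings.items():
        scores[char] += weights[engine] * confidence
    return max(scores, key=scores.__getitem__)


def read_field(glyph_readings: list[dict[str, Reading]], field_type: str) -> str:
    """Resolve every character position in a field independently."""
    return "".join(vote(r, field_type) for r in glyph_readings)


ambiguous = {"digit_engine": ("0", 0.7), "letter_engine": ("O", 0.8), "general_engine": ("O", 0.5)}
print(vote(ambiguous, "numeric"))  # 0  (3.0*0.7 = 2.1 beats 1.0*0.8 + 1.5*0.5 = 1.55)
print(vote(ambiguous, "alpha"))    # O  (1.0*0.7 = 0.7 loses to 3.0*0.8 + 1.5*0.5 = 3.15)
```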

ICR Workflow

An important development of ICR was the invention of Automated Forms Processing in 1993. This involved a three-stage process: capturing the image of the form and preparing it so that the ICR engine gives the best results, capturing the information with the ICR engine, and finally processing the results to validate the engine's output automatically.
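The third stage, automatic validation, could look something like the sketch below: a captured field is accepted only if the engine's confidence clears a threshold and the value matches a field-specific rule; everything else is routed to an operator for manual keying. The field names, patterns and the 0.9 threshold are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # engine-reported confidence, 0 to 1


# Hypothetical per-field rules for an example application form.
RULES = {
    "postcode": re.compile(r"\d{6}"),
    "phone": re.compile(r"\d{8}"),
    "surname": re.compile(r"[A-Za-z' -]+"),
}


def validate(fields: list[FieldResult], min_confidence: float = 0.9):
    """Split captured fields into accepted values and names needing manual keying."""
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for field in fields:
        rule = RULES.get(field.name)
        format_ok = rule is None or rule.fullmatch(field.value) is not None
        if field.confidence >= min_confidence and format_ok:
            accepted[field.name] = field.value
        else:
            needs_review.append(field.name)
    return accepted, needs_review


captured = [
    FieldResult("postcode", "518000", 0.97),
    FieldResult("phone", "8765432l", 0.95),  # letter l where a digit belongs
    FieldResult("surname", "Chan", 0.62),    # low engine confidence
]
print(validate(captured))
# ({'postcode': '518000'}, ['phone', 'surname'])
```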

This application of ICR increased the usefulness of the technology and made it applicable for use with real world forms in normal business applications.


OCR Workflow

Optical character recognition, usually abbreviated to OCR, is computer software designed to translate images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text, or to translate pictures of characters into a standard encoding scheme representing them (e.g. ASCII or Unicode).

OCR began as a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus of OCR has shifted to the implementation of proven techniques.

Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the optical character recognition term has now been broadened to cover digital character recognition as well.

Early systems required "training" (essentially, the provision of known samples of each character) to read a specific font. Today, "intelligent" systems that can recognize most fonts with a high degree of accuracy are common.

Some systems are even capable of reproducing formatted output that closely approximates the original scanned page including images, columns and other non-textual components.

The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem.

Recognition of hand printing, cursive handwriting, and even printed text in some other scripts (especially those with a very large number of characters) is still the subject of active research.


PGP (Pretty Good Privacy) is a computer program that provides cryptographic privacy and authentication. To the best of publicly available information, there is no known method for any entity to break PGP encryption by cryptographic or computational means, regardless of the version being employed.

PGP email encryption uses asymmetric-key algorithms based on the recipient's linked key pair: a public key and a private key. The sender uses the recipient's public key to encrypt a shared key (also called a secret key or conventional key) for a symmetric cipher algorithm.

That key is used, finally, to encrypt the plaintext of a message. Many PGP users' public keys are available to all from the many PGP key servers around the world which act as mirror sites for each other.

The recipient of a PGP-encrypted email message decrypts it using the session key for the symmetric algorithm. That session key is included in the message in encrypted form, and the recipient first decrypts it using their private key.

Use of two ciphers in this way is sensible because of the very considerable difference in operating speed between asymmetric key and symmetric key ciphers (the differences are often 1000+ times).

This operation is completely automated in current PGP desktop client products.
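The hybrid scheme described above (a fast symmetric cipher for the message, a slow asymmetric cipher only for the short session key) can be sketched in pure Python. This is a toy under loudly stated assumptions: the RSA key is built from two well-known Mersenne primes, the "symmetric cipher" is an XOR with a SHA-256 counter-mode keystream, and there is no padding or authentication. None of this is what PGP actually uses; it only shows how the pieces fit together, and real code should use a vetted OpenPGP implementation.

```python
import hashlib
import secrets

# Toy RSA key pair from two known Mersenne primes. INSECURE, illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus (about 150 bits)
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter-mode keystream.
    Encryption and decryption are the same operation."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt(plaintext: bytes) -> tuple[int, bytes]:
    session_key = secrets.token_bytes(16)                     # fresh one-off key
    wrapped = pow(int.from_bytes(session_key, "big"), E, N)   # slow step, tiny input
    return wrapped, keystream_xor(session_key, plaintext)     # fast step, bulk data


def decrypt(wrapped: int, ciphertext: bytes) -> bytes:
    session_key = pow(wrapped, D, N).to_bytes(16, "big")      # unwrap with private key
    return keystream_xor(session_key, ciphertext)


wrapped_key, body = encrypt(b"Meet at noon")
print(decrypt(wrapped_key, body))  # b'Meet at noon'
```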

PGP Workflow

Official website:


In public key cryptography, the private key is generally kept secret, while the public key may be widely distributed. In a sense, one key locks a lock, while the other is required to unlock it. It should not be possible to deduce the private key of a pair from the public key.

There are many forms of public key cryptography, including:

* public key encryption — keeping a message secret from anyone that does not possess a specific private key.

* public key digital signature — allowing anyone to verify that a message was created with a specific private key.

* key agreement — generally, allowing two parties that may not initially share a secret key to agree on one.
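Key agreement, the last item above, can be illustrated with a toy Diffie-Hellman exchange in Python: each party publishes only `pow(G, private, P)`, yet both compute the same shared secret. The parameters (the Mersenne prime 2**127 - 1 and base 3) are assumptions chosen for readability and are far too weak for real use; production systems use standardized groups or elliptic curves through a vetted library.

```python
import secrets

# Toy parameters, INSECURE: a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3


def make_keypair() -> tuple[int, int]:
    """Return (private, public); only the public value is ever sent."""
    private = secrets.randbelow(P - 3) + 2  # random exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    """(G^b)^a mod P equals (G^a)^b mod P, so both sides get the same number."""
    return pow(their_public, my_private, P)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()
print(shared_secret(alice_private, bob_public) == shared_secret(bob_private, alice_public))  # True
```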

Typically, public key techniques are much more computationally intensive than purely symmetric algorithms, but the judicious use of these techniques enables a wide variety of applications. One analogy is a locked storefront door with a mail slot. The mail slot is exposed and accessible to the public; its location (the street address) is in essence the public key.

Anyone knowing the street address can go to the door and drop a written message through the slot. However, only the person who possesses the matching private key, the store owner in this case, can open the door and read the message.

Public Key Encryption Workflow


In a study conducted by Jon Peddie, dual monitors produced a 42% improvement in productivity. Another study, by a University of Utah research team commissioned by NEC, found that people using two 20-inch monitors were 44 percent more productive at certain text-editing operations than people using a single 18-inch monitor.

In our production centre, some project teams use dual monitors to increase productivity. One monitor displays the source image while the other displays the data entry interface. This way, typists can concentrate on the images while typing smoothly into the interface.

Web capture jobs also sometimes require copying and pasting data from a source into a destination, in most cases a database. On a 19-inch screen, this severely limits how much space can be given to the source and destination windows. Operators cannot move between them freely, and repeatedly using the Alt+Tab shortcut to switch back to a previously opened window strains the eyes.

This hinders productivity considerably. In this case, a dual-monitor setup can save the operator a great deal of time: one monitor displays the web data while the other displays the target database or Excel file, so operators can copy and paste far more efficiently.


Data destruction software, sometimes called data sanitization software, disk wipe software, or hard drive eraser software, is a software-based method of completely erasing the data from a hard drive.

Data destruction software is just one of several ways to completely erase a hard drive.

When you delete files and then empty the Recycle Bin, you don't actually erase the information; you just delete the reference to it so the operating system can't find it. All the data is still there and, unless it's overwritten, can easily be recovered using file recovery software.

Data destruction software, however, truly does erase the data. Each data destruction program utilizes one or more data sanitization methods that can permanently overwrite the information on the drive.

If you need to remove all traces of a virus or you're planning on recycling or disposing of your hard drive, wiping your hard drive using data destruction software is the best way to protect yourself.

A data sanitization method is the specific way in which a data destruction program overwrites the data on a hard drive or other storage device.

DoD 5220.22-M is a software based data sanitization method used in various data destruction programs to overwrite existing information on a hard drive or other storage device.

Erase Data

The DoD 5220.22-M data sanitization method is usually implemented in the following way:

Pass 1: Writes a zero (0) and verifies the write

Pass 2: Writes a one (1) and verifies the write

Pass 3: Writes a random character and verifies the write

The DoD 5220.22-M sanitization method was originally defined by the US National Industrial Security Program (NISP) in the National Industrial Security Program Operating Manual (NISPOM) and is one of the most common sanitization methods used in data destruction software.

Erasing files on a hard drive using the DoD 5220.22-M data sanitization method will prevent all software-based file recovery methods from lifting information from the drive and should also prevent most, if not all, hardware-based recovery methods.
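The three passes listed above can be sketched in Python for a single file: each pass overwrites every byte, asks the operating system to push the data to the device with `os.fsync`, and verifies the write by comparing a SHA-256 hash of what was written with a hash of what is read back (possibly from the operating system's cache). "Writes a one" is interpreted here as bytes with all bits set (0xFF). This is an illustration, not certified sanitization software: overwriting a file in place may not reach every physical copy of its data on SSDs or on journaling and copy-on-write file systems, which is one reason dedicated tools wipe the whole drive.

```python
import hashlib
import os
import secrets

CHUNK = 1 << 20  # overwrite in 1 MiB chunks


def overwrite_pass(path: str, pattern) -> None:
    """Overwrite every byte of the file with pattern(n) bytes, then verify."""
    size = os.path.getsize(path)
    written = hashlib.sha256()
    with open(path, "r+b") as f:
        remaining = size
        while remaining:
            chunk = pattern(min(CHUNK, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
        f.seek(0)
        read_back = hashlib.sha256()
        for block in iter(lambda: f.read(CHUNK), b""):
            read_back.update(block)
    if read_back.digest() != written.digest():
        raise OSError(f"verification failed for {path}")


def dod_style_wipe(path: str) -> None:
    """Three verified passes (zeros, ones, random), then delete the file."""
    overwrite_pass(path, lambda n: b"\x00" * n)  # pass 1: all bits 0
    overwrite_pass(path, lambda n: b"\xff" * n)  # pass 2: all bits 1
    overwrite_pass(path, secrets.token_bytes)    # pass 3: random bytes
    os.remove(path)
```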


Data collection means gathering data from various sources, such as offline resources, websites, business directories, job postings and search engines, and then converting the gathered data into your preferred output template.

Data collection is an important facet in maintaining the financial strength of your business. With the help of data collection you can have ready access to any information you might require to help you with your decision making process.

Data collection is a time-consuming process involving both people and sophisticated software tools, so it is normally outsourced to a professional data processing company such as Keyoung.

Because we uniquely combine the latest technology with skilled data collection experts, we are able to offer dedicated data collection services at competitive rates.


The Changjie input method is a system by which Chinese characters can be entered into a computer using a standard keyboard.

Changjie is based on the graphological aspect of the characters: each basic, graphical unit is represented by a basic character component, 24 in all, each mapped to a particular letter key on a standard QWERTY keyboard. An additional "difficult character" function is mapped to the X key.

Within the keystroke-to-character representations, there are four subsections of characters: Philosophical Set (corresponding to the letters 'A' to 'G' and representing the sun, the moon, gold, wood, water, fire and the earth), Strokes Set (corresponding to the letters 'H' to 'N' and representing the brief and subtle strokes), Body-Related Set (corresponding to the letters 'O' to 'R' and representing human, heart, hand and mouth) and Shapes Set (corresponding to the letters 'S' to 'Y' and representing complex and encompassing character forms).
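The component-to-key mapping described above can be written down as a lookup table. The sketch assumes the character has already been decomposed into its Changjie components, which is the part that requires the operator's training and Changjie's decomposition rules; it only turns components into keystrokes and reports which of the four sets a key belongs to. The function names are illustrative.

```python
# Changjie base components and their keys; X is the "difficult character" key.
COMPONENT_KEYS = {
    "日": "A", "月": "B", "金": "C", "木": "D", "水": "E", "火": "F", "土": "G",
    "竹": "H", "戈": "I", "十": "J", "大": "K", "中": "L", "一": "M", "弓": "N",
    "人": "O", "心": "P", "手": "Q", "口": "R",
    "尸": "S", "廿": "T", "山": "U", "女": "V", "田": "W", "難": "X", "卜": "Y",
}

SETS = [
    ("Philosophical", "ABCDEFG"),
    ("Strokes", "HIJKLMN"),
    ("Body-Related", "OPQR"),
    ("Shapes", "STUVWXY"),
]


def keys_for(components: list[str]) -> str:
    """Turn an already-decomposed character into its Changjie keystrokes."""
    return "".join(COMPONENT_KEYS[c] for c in components)


def set_of(key: str) -> str:
    """Name the subsection a letter key belongs to."""
    for name, letters in SETS:
        if key.upper() in letters:
            return name
    raise ValueError(f"{key!r} is not a Changjie component key")


print(keys_for(["日", "月"]))  # AB, the code for 明
print(keys_for(["木", "木"]))  # DD, the code for 林
print(set_of("Q"))             # Body-Related
```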

Chinese Keyboard

Changjie - Philosophical Set

Changjie - Stroke Set

Changjie - Body Related Set

Changjie - Shapes Set


IPTC data is a method, defined by the International Press Telecommunications Council, of storing textual information in images. It was developed for press photographers who need to attach information to images when submitting them electronically, but it is useful for all photographers. It provides a standard way of storing information such as keywords, location and captions.

Because the information is stored in the image in a standard way, it can be read by other IPTC-aware applications, and the images become "searchable" by Windows Explorer (Windows) or Spotlight (Mac) wherever they are stored on the hard drive. Since photo or image tagging is time-consuming and labor-intensive when the volume is large, outsourcing this kind of job is cost-effective.
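To show what "stored in the image in a standard way" means at the byte level, here is a simplified Python sketch of IPTC IIM datasets: each one starts with the tag marker 0x1C, a record number, a dataset number and a two-byte big-endian length, followed by the data. The dataset numbers used (2:25 Keywords, 2:90 City, 2:120 Caption) belong to the IIM application record. UTF-8 text, the helper names, and leaving out the JPEG APP13/Photoshop wrapper that actually embeds the block in a file are simplifying assumptions.

```python
import struct

# A few IIM application-record (record 2) dataset numbers.
KEYWORDS = 25
CITY = 90
CAPTION = 120


def encode_iim(datasets: list[tuple[int, str]]) -> bytes:
    """Pack (dataset number, text) pairs as record-2 IIM datasets."""
    out = bytearray()
    for number, text in datasets:
        data = text.encode("utf-8")
        if len(data) >= 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        out += struct.pack(">BBBH", 0x1C, 2, number, len(data)) + data
    return bytes(out)


def decode_iim(blob: bytes) -> list[tuple[int, str]]:
    """Read back the record-2 datasets, in order."""
    result, pos = [], 0
    while pos + 5 <= len(blob):
        marker, record, number, length = struct.unpack_from(">BBBH", blob, pos)
        if marker != 0x1C:
            raise ValueError(f"bad tag marker at offset {pos}")
        pos += 5
        if record == 2:
            result.append((number, blob[pos:pos + length].decode("utf-8")))
        pos += length
    return result


tags = [(KEYWORDS, "harbour"), (KEYWORDS, "sunset"), (CITY, "Hong Kong"),
        (CAPTION, "Victoria Harbour at dusk")]
blob = encode_iim(tags)
print(decode_iim(blob) == tags)  # True
```

Keywords is a repeatable dataset, which is why it appears twice.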

IPTC Meta Data Example


BPO stands for "Business Process Outsourcing," which is simply another term for outsourcing: a company contracts an outside provider for services or business processes. This might include manufacturing or back-office functions such as data entry, data processing, credit card processing, accounting and human resources. But BPO may also include front-end services such as customer care and technical support.

"Global BPO" is another term for offshoring or outsourcing outside a company's home country or primary market.

BPO Method


Outsourcing is when a company contracts with an outside provider for services or other business processes, rather than employing staff to do these services in-house. These services may be provided on-site or off-site. Typically outsourcing is done with an eye toward efficiency and cost-saving for the company. Outsourcing could be as simple as hiring a freelancer to edit a company newsletter or as large-scale as hiring an outsourcing company to handle all credit card issuing activities in a bank.

Offshoring is a form of outsourcing. Offshoring is when a company moves business processes or services to a country other than its home country or primary marketplace. This is usually done in an effort to cut costs. Typically the new country has lower labor costs.

Off-shoring vs Outsourcing