Data Collection & Quality Control Procedures

Data collection

1. Age validation

Accurate age reporting is crucial in the study of healthy longevity, because age exaggeration can cause mortality rates at higher ages to be underestimated (Coale and Li 1991). The CLHLS uses several methods to validate each respondent’s age. First, we employed user-friendly forms to convert birth dates reported in the Chinese lunar calendar into the Western calendar. The CLHLS asks for the date of birth rather than age directly, and computes the respondent’s age after the survey by subtracting the date of birth from the date of the survey, because the Chinese system of calculating nominal age may make a directly reported age ambiguous. Second, we used other information relevant to the date of birth, such as genealogical records, ID cards, and household registration booklets, to validate the sampled elder’s age. Third, the interviewers and supervisors also checked the ages of the sampled person’s parents, siblings, and children/grandchildren, the ages at marriage and at childbirth, and so forth, to further validate age reporting. Fourth, the interviewer section of the CLHLS questionnaire includes an additional question asking each interviewer to judge the validity of the sampled person’s reported age. Fifth, if the sampled person reported an age over 105, the interviewer was instructed to obtain additional evidence or concurrence from the local residential committee and local aging committee. If any inaccuracy in the reported age or any other logical problem in the questionnaire was found, a re-interview or phone call regarding the specific questions was conducted. (Please refer to the Technical Documentation for details.)
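The subtraction step described above can be sketched as follows. This is only an illustration, assuming the birth date has already been converted to the Western calendar; the function name `completed_age` is hypothetical and is not part of any CLHLS software.

```python
from datetime import date


def completed_age(birth: date, survey: date) -> int:
    """Return age in completed years on the survey date.

    Assumes both dates are already expressed in the Western calendar.
    """
    if survey < birth:
        raise ValueError("survey date precedes birth date")
    years = survey.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the survey year.
    if (survey.month, survey.day) < (birth.month, birth.day):
        years -= 1
    return years


# Hypothetical respondent born 15 March 1900, interviewed one day before
# and then on the birthday in 1998.
age_before_birthday = completed_age(date(1900, 3, 15), date(1998, 3, 14))
age_on_birthday = completed_age(date(1900, 3, 15), date(1998, 3, 15))
```

Working from dates rather than a reported age avoids the one-to-two-year ambiguity that nominal age can introduce.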

2. Health outcomes

The CLHLS collected data on health conditions from a variety of dimensions, mainly focusing on the following domains.

(1) Activities of Daily Living (ADL)

Six items from the Katz Index (Katz et al. 1983) including bathing, dressing, indoor transferring, toileting, eating, and continence are collected for each person in each wave. These six questions are in the first part of Section E of the questionnaire for survivors. The variable names are E1 to E6 and are consistent across waves.

(2) Instrumental Activities of Daily Living (IADL)

Eight items, including the ability to visit neighbors, lift a 5 kg weight, do laundry, cook, etc., measure IADLs. These eight questions were not collected in the 1998 and 2000 waves. They are located in the rear part of Section E of the questionnaire for survivors. The variable names are E7 to E14 and are consistent in the 2002 and 2005 waves.

(3) Functional limitations

The CLHLS designed the following questions to measure functional limitations: putting a hand on the lower back, putting a hand behind the neck, standing up from sitting in a chair, picking up a book from the floor, and turning around. An additional question on lifting a hand has been included in the questionnaire since 2002. These questions are located in the middle part of Section G of the questionnaire for survivors. Except for the 1998 wave, where the variables were G10, G11, G13, and G14, all waves use the consistent variable names G8, G9, G11, and G12.

(4) Cognitive function

The CLHLS measures cognitive function using the Mini-Mental State Examination (MMSE) (Folstein, Folstein, and McHugh 1975). The MMSE covers five major domains of cognitive function: orientation, registration, calculation, recall, and language. There are 24 questions. Except for the question asking respondents to name types of food in one minute, which has a possible score of 7, each question scores one if the answer is correct and zero if it is incorrect. The total possible MMSE score is 30, with lower scores indicating poorer cognitive ability. All questions MUST be answered by the sampled elders themselves; in other words, no proxy is allowed. If the sampled elder had any difficulty answering the cognitive function questions, the interviewer had to choose a reason explaining why the sampled person was unable to answer. The corresponding variables are in Section C of the questionnaire for survivors.

The Chinese version of the MMSE is adapted to the cultural and socioeconomic conditions in China so that the questions are easily understandable and practically answerable among normally functioning oldest-old Chinese (Zeng and Vaupel 2002). Several similar versions of the Chinese MMSE, all adapted from Folstein et al. (1975), have been shown to be reliable and valid among older Chinese populations (Shyu and Yip 2001; Yu et al. 1989; Zhang et al. 2006). According to Folstein et al. (1975), cognitive function may be classified into four categories: unimpaired (a score of 24-30), slightly impaired (18-23), moderately impaired (10-17), and severely impaired (0-9). Some researchers suggest using a score of 18 as the cut-off point between cognitively unimpaired and impaired for populations with less education, such as the Chinese elderly (see Zhang 2006). The coding of cognitive function depends entirely on research goals and preference. Some researchers use a dichotomized category, others use the continuous score, and still others use ordered codes. Each classification has its pros and cons.
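The scoring and the four-category classification described above can be sketched as follows. This is a minimal illustration, not the CLHLS team's code: it assumes the 23 one-point items are already coded 1 for correct and 0 for incorrect, and that the food-naming item is scored as the number of foods named, capped at 7. The function names and the input layout are hypothetical.

```python
def mmse_total(binary_items, foods_named):
    """Sum 23 one-point items plus the food-naming item (assumed capped at 7)."""
    if len(binary_items) != 23:
        raise ValueError("expected 23 one-point items")
    if any(score not in (0, 1) for score in binary_items):
        raise ValueError("one-point items must be coded 0 or 1")
    if foods_named < 0:
        raise ValueError("foods_named must be non-negative")
    return sum(binary_items) + min(foods_named, 7)


def mmse_category(total):
    """Map a total score (0-30) to the four categories listed in the text."""
    if not 0 <= total <= 30:
        raise ValueError("MMSE total must be between 0 and 30")
    if total >= 24:
        return "unimpaired"
    if total >= 18:
        return "slightly impaired"
    if total >= 10:
        return "moderately impaired"
    return "severely impaired"


# Hypothetical respondent: 15 of 23 one-point items correct, 4 foods named.
example_total = mmse_total([1] * 15 + [0] * 8, 4)
example_category = mmse_category(example_total)
```

A dichotomized coding, such as the cut-off of 18 mentioned above, would only need a comparison on the total instead of the category function.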

(5) Self-rated health

The CLHLS asks each sampled elder a single question about his/her self-rated health with five categories: very good, good, so-so, poor, and very poor. The question is in Section B of the questionnaire for survivors. This question MUST be answered by the sampled person, and no proxy is allowed. If the sampled person was unable to answer, the interviewer recorded a response of “unable to answer”.

(6) Interviewer-rated health

The CLHLS also collects each interviewer’s rating of the health of every elder he/she visited, using four categories: surprisingly healthy (almost no obvious ailments), relatively healthy (only minor ailments), moderately ill (moderate degrees of major ailments or illnesses), and very ill (major ailments or diseases, bedridden, etc.). The question is in Section H of the questionnaire for survivors.

(7) Self-reported chronic diseases

The CLHLS collects data on nearly twenty chronic diseases from each sampled person. The number of diseases covered has grown from wave to wave. This set of questions is in Section G of the questionnaire for survivors. The variable names are not consistent across waves. Please refer to the questionnaire.

The CLHLS also collects information from the next of kin on the diseases suffered before death by those who died between survey waves. These disease questions are located in the middle part of the questionnaire for deceased persons.

As indicated in the technical reports, the surviving interviewees’ self-reported disease prevalence rates are relatively reliable as compared to other Chinese nationwide surveys.

3. Lifestyle

The CLHLS gathers various data on both current and past lifestyles, including diet, smoking, alcohol consumption, exercise, religious participation, and other leisure activities. The variables are located in Section D of the questionnaire for survivors.

4. Marriage history

The marriage history of each ever-married person in the sample was collected, including the age at each marriage transition, the quality of the marriage, and the type of event ending each marriage (i.e., divorce or widowhood). The data for those with more than four marital transitions were truncated to four, since almost nobody had more than four marital transitions.

If a previously interviewed respondent was re-interviewed in a subsequent wave, his/her marriage history data were not re-collected. The variables are consistent across waves and are located in Section F (F43 set) of the questionnaire for survivors.

5. Sibling information

The CLHLS collects information on birth order, sex, current survival status, current age (or age at death if dead), distance of residence, and frequency of visiting for respondents’ siblings. If a previously interviewed respondent was re-interviewed in a subsequent wave, his/her sibling information was not re-collected, although there might have been some changes (e.g., in survival status or residence). This is a drawback of the CLHLS (please see the technical report for details). The sibling variables are consistent across waves and are located in Section F (F92 set) of the questionnaire for survivors. (Note: the number of sibling variables differs across waves.)

6. Children and fertility history

The CLHLS gathers information on sex, current survival status, current age (if dead, the age at death is converted into current age), distance of residence, and frequency of visiting for respondents’ children. If a previously interviewed respondent was re-interviewed in a subsequent wave, his/her children’s information was not re-collected, although there might have been some changes (e.g., in survival status or residence). This is a drawback of the CLHLS (please see the technical report for details). The children’s variables are not consistent across waves and are located in Section F (F101 set for the 1998 wave and F103 set for later waves) of the questionnaire for survivors. (Note: some wordings also differ across waves; please refer to the codebooks for details.)

7. Physiological indicators

A medical student or nurse accompanies the interviewer at each in-home, face-to-face interview. The medical staff member checks blood pressure, heart rate, and heart rhythm, and performs other basic physical examinations for each sampled person. The questions are almost consistent across waves, with a few exceptions in the 1998 wave. These data are located in Section G of the questionnaire for survivors.

8. Living arrangement

The CLHLS asks whether the sampled elder is currently living alone, with family member(s), or in an institution. If the sampled elder is coresiding with family member(s), the age, sex, and relationship to the sampled elder of each member are asked so that researchers can identify the living arrangement of each sampled elder. If the sampled person is living in an institution, the date of admission and expenditure information were also collected in the 2005 survey. The living arrangement variables are consistent across waves and are located in Section A (A53 set) of the questionnaire for survivors. (Note: the education of each coresident was collected in the 2005 wave.)

The questionnaire for deceased persons also collects information on the living arrangement before death of those who died between survey waves. This information is based on a single question with several categories, and the variable is located in the first part of the questionnaire for deceased persons.

9. Care needs/costs, end of life care

In the 2005 wave, the CLHLS started to collect data on care needs/costs related to ADL for survivors, and on care needs and costs in the last year of life for persons who died between survey waves. The questions on care needs/costs are located in Section E of the questionnaire for survivors and in the last part of the questionnaire for deceased persons.
Please refer to the codebooks for details.

Data quality control

1. Procedures used to ensure high-quality data collection

Emphasis on data quality during the training process. All supervisors and interviewers were informed at the national and provincial training workshops that the quality of their data collection work would be evaluated by careful checks on internal and logical consistency. The checks were to be carried out by the enumerators themselves, supervisors, China Mainland Information Group (MIG) staff, and research team members; computer programs would check internal and logical consistency as well. The results were used to evaluate the work of each supervisor and interviewer. It was stated clearly at the training workshops that the interviewees would be re-interviewed three years later by another interviewer, and that the follow-up surveys would be used to evaluate the quality of the previous interviews.

Strict accountability system. After the completion of each interview, the interviewers were required to carefully check all recorded answers, correct any possible errors, and conduct re-questioning if necessary. The interviewers were asked to sign their names on the front page of the questionnaires to be accountable for the quality of the data collection. The supervisor responsible for the survey area carefully reviewed each filled-in questionnaire and either signed it if it passed the review or returned it to be redone if necessary.

Easily understandable questionnaire and survey instruction booklet. We devoted serious efforts to minimize possible confusion about the questionnaire. An exact definition of each question and how to perform the objective examinations, etc., were described in detail in the carefully prepared training DVD and the easily understandable survey instruction booklet.

Post-interview check and rewards. MIG headquarters in Beijing and its provincial offices called a randomly selected 50% of the interviewees to check the interviewers’ work. If there was no telephone in the interviewee’s home, the call was made to the village or street neighborhood committee office. After all of the post-interview reviews described above had been performed, CHAFS and MIG jointly issued reward certificates plus a modest bonus to the supervisors and interviewers who performed the highest-quality work. Such rewards were issued after our 1998, 2000, 2002, and 2005 surveys and proved to be powerful motivational tools.
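Drawing the random 50% call-back sample mentioned above might look like the sketch below. The source does not describe how the selection was actually made, so the function name `select_for_callback`, the identifier format, and the use of a seed are all assumptions for illustration.

```python
import math
import random


def select_for_callback(interviewee_ids, fraction=0.5, seed=None):
    """Return a simple random sample of interviewee IDs for call-back checks."""
    ids = list(interviewee_ids)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    sample_size = math.ceil(len(ids) * fraction)
    return sorted(rng.sample(ids, sample_size))


# Hypothetical IDs for ten completed interviews.
all_ids = [f"R{i:03d}" for i in range(10)]
callback_ids = select_for_callback(all_ids, seed=1)
```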

2. Data uploading/entry

Due to funding limitations, the CLHLS did not use CAPI (computer-assisted personal interviewing) technology to collect the data. Instead, we used paper questionnaires and generated the databases by entering the filled-in questionnaire data with the Epi 2000 program, which was developed by the Centers for Disease Control and Prevention (CDC). The databases were developed by research team members with rich expertise in the field. In designing the database, all possible logical relations were considered. If there is a logical error, the computer system gives a warning message so that the data entry staff are aware of the problem. If the problem is not a typo, the data entry supervisor sends a report to the China Mainland Information Group (MIG), which conducted most of the field interviews, and MIG calls the interviewee to resolve the problem. To ensure high quality, each questionnaire was entered twice by two different persons. The two records were then compared and any typos were corrected.
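The comparison of two independent entries can be sketched as follows. This is an illustration only, not the Epi 2000 workflow itself: it assumes each entry pass is held as a dictionary keyed by questionnaire ID, with a dictionary of variable values per questionnaire. The function name and the sample values are hypothetical.

```python
def find_discrepancies(entry_a, entry_b):
    """List (questionnaire_id, variable, value_a, value_b) for every mismatch."""
    discrepancies = []
    for qid in sorted(set(entry_a) | set(entry_b)):
        # A questionnaire missing from one pass shows up as all-None values.
        record_a = entry_a.get(qid, {})
        record_b = entry_b.get(qid, {})
        for variable in sorted(set(record_a) | set(record_b)):
            value_a = record_a.get(variable)
            value_b = record_b.get(variable)
            if value_a != value_b:
                discrepancies.append((qid, variable, value_a, value_b))
    return discrepancies


# Hypothetical double entry of one questionnaire; E2 was keyed differently.
first_pass = {"0001": {"E1": 1, "E2": 2}}
second_pass = {"0001": {"E1": 1, "E2": 3}}
mismatches = find_discrepancies(first_pass, second_pass)
```

Each reported mismatch would then be resolved against the paper questionnaire.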

3. Data cleaning

After data entry is complete, the CLHLS team conducts longitudinal checks across survey waves. If a logical error is found, the CLHLS team asks MIG to double-check the results and make phone calls to validate the data.
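One kind of longitudinal check is to confirm that characteristics that should not change, such as sex or year of birth, agree between waves for the same respondent. The sketch below is only an illustration of that idea: the source does not list the checks actually performed, and the function name, variable names, and data layout are hypothetical.

```python
def longitudinal_flags(wave_a, wave_b, fixed_vars=("sex", "birth_year")):
    """Flag respondents whose time-invariant values differ between two waves."""
    flags = []
    # Only respondents present in both waves can be compared.
    for rid in sorted(set(wave_a) & set(wave_b)):
        for variable in fixed_vars:
            value_a = wave_a[rid].get(variable)
            value_b = wave_b[rid].get(variable)
            if value_a != value_b:
                flags.append((rid, variable, value_a, value_b))
    return flags


# Hypothetical records: respondent 0002 has conflicting birth years.
earlier_wave = {
    "0001": {"sex": "F", "birth_year": 1905},
    "0002": {"sex": "M", "birth_year": 1910},
}
later_wave = {
    "0001": {"sex": "F", "birth_year": 1905},
    "0002": {"sex": "M", "birth_year": 1912},
}
to_verify = longitudinal_flags(earlier_wave, later_wave)
```

Flagged cases correspond to the records that would be sent back for double-checking and telephone validation.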

4. Data assessments

After data entry and cleaning, the CLHLS research team conducts an extensive assessment of the data quality of each wave. Please see the relevant technical reports.