*************************************************************************************************************************.
* Data without Boundaries - DwB.
* 2nd Training Course on EU-LFS.
* September 17th to 19th 2014, Ljubljana.
* Practical computing session.
* EU-LFS training dataset.
* based on EU-LFS 2012, yearly data, integrated file of all countries.
* subsample of 5000 cases per country.
* 21 countries: AT, BE, CH, CZ, DE, EE, ES, FR, HU, GR, IE, IT, LT, LU, NL, RO, SI, SK, PL, PT, UK.
* J.. (ADP).
*----------------------------------------------------------- Preparation -----------------------------------------------------------.
* specify the path where the EU-LFS training dataset is stored.
FILE HANDLE data_path / NAME='F:\tekoci_projekti_adp\ads\EU_LFS_training_Ljubljana\eulfs_suf\training_data_set\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path / NAME='F:\tekoci_projekti_adp\ads\EU_LFS_training_Ljubljana\eulfs_suf\training_data_set\Exercise_0'.
* open the dataset.
GET FILE='data_path/LFS_2012y_td.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
DATASET ACTIVATE DataSet1.
* include only respondents of working age.
SELECT IF age>=17 AND age<=72.
EXECUTE.
* include only respondents living in private households.
SELECT IF hhpriv=1.
EXECUTE.
*-------------------------------------------------------------------------------------------------------.
*------------------------- Exercise I. Explore the data set -------------------------.
*-------------------------------------------------------------------------------------------------------.
*------------------------- Start the analysis by checking the structure of the data file.
* check: 81623 cases in total.
FREQ country.
freq REFYEAR QUARTER.
* see the variables contained.
display sorted DICTIONARY / VARIABLES= all.
* if you omit the 'sorted' option, you get the variables in the order of the data file.
* e.g. display DICTIONARY / VARIABLES= all.
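* Optional check (a sketch added here, not part of the original session): confirm that the two SELECT IF filters above worked.
* This assumes age is stored as 5-year band codes from 17 to 72, as the RECODE of age in Exercise III suggests.
* If so, the minimum of age should be 17 and the maximum 72, and hhpriv should show the single value 1.
DESCRIPTIVES VARIABLES=age /STATISTICS=MEAN MIN MAX.
FREQUENCIES VARIABLES=hhpriv.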
* Hint: look at the distribution of some of the variables.
* e.g. freq STAPRO SUPVISOR SIZEFIRM YSTARTWK FTPT TEMP WSTATOR.
*-------------------------------------------------------------------------------------------------------.
*------------------------- Exercise II. Prepare the working data set -------------------------.
*-------------------------------------------------------------------------------------------------------.
*------------------------- a) Select the population you need to work with.
freq WSTATOR.
* You can either type the command into the syntax window or use the menu Data --> Select Cases... and then paste and execute it.
DATASET COPY working_pop.
DATASET ACTIVATE working_pop.
FILTER OFF.
USE ALL.
select if (WSTATOR = 1 or WSTATOR=2).
EXECUTE.
* Check the result.
freq WSTATOR.
freq country.
*------------------------- b) When comparing countries, you may wish to obtain an equal sample size of the selected population in each country.
* If a weight variable already exists, use it; otherwise create one.
compute COEFF=1.
* make the weight active.
WEIGHT by coeff.
* Check the result. With COEFF=1 nothing should change.
freq country.
* obtain the values for the new weight coeff that will adjust the sample size.
* For safety, some commands require the file to be sorted on key variables, therefore we sort by country.
SORT CASES BY COUNTRY.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /PRESORTED /BREAK=COUNTRY /N_BREAK=N.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /PRESORTED /N_tot=N.
freq N_tot.
cross N_BREAK by country.
* produce the weight coeff so that every country has the same sample size: N_tot / number of countries (21).
* the weighted count of each country then becomes N_BREAK * ((N_tot/21)/N_BREAK) = N_tot/21.
compute coeff= coeff*((N_tot/21)/N_BREAK).
* Check the result. All country sample sizes should now be equal.
freq country.
*-------------------------------------------------------------------------------------------------------.
*------------------------- Exercise III. Precarious employment in different countries -------------------------.
*-------------------------------------------------------------------------------------------------------.
*----------------------------- III. b) Prepare some variables for further descriptive analysis -----------------------------.
freq STARTIME temp ftpt.
* create dummy variables.
RECODE TEMP (2=1) (else =0) INTO temp_lm.
recode ftpt (1=1) (2=0) (MISSING=SYSMIS) INTO FT.
VARIABLE LABELS temp_lm 'Limited_duration_dummy'.
VARIABLE LABELS FT 'Full_time_dummy'.
EXECUTE.
* check.
cross temp by temp_lm.
cross ftpt by ft.
freq ft temp_lm.
* prepare age for descriptive analysis.
RECODE age (17 22=20) (27 =27) (32 37=35) (42 THRU 52= 47) (57 thru 72=65) INTO age5.
var lab age5 'Lifecycle - 5 groups seniority levels (recode age)'.
val lab age5 20 'up to 22 years old' 27 'up to 29 years old' 35 'up to 40 years old' 47 'up to 54 years old' 65 'up to 72 years old'.
format age5 (f2.0).
freq age5.
* create dummy variables.
recode sex (1=1) (else=0) into sex_male.
cross sex by sex_male.
freq sex_male.
*---------------------------- III. c) Use a limited set of countries for country-level exploratory analysis ---------.
*------ Note that we keep the current data set with all the countries (working_pop) for the analysis at the end of the session.
* Select the countries that have representatives among the workshop participants. It is more practical to do the exploratory analysis on a limited set of countries.
DATASET COPY country_sel.
DATASET ACTIVATE country_sel.
FILTER OFF.
USE ALL.
SELECT IF (COUNTRY= 7 | COUNTRY=15 | COUNTRY=16 | COUNTRY=18 | COUNTRY=23 | COUNTRY=25 | COUNTRY=27 | COUNTRY=29 | COUNTRY=31).
EXECUTE.
* check the selection.
freq country.
means STARTIME temp_lm ft sex_male by country /CELLS MEAN MEDIAN COUNT STDDEV /STATISTICS ANOVA.
MEANS TABLES=STARTIME by COUNTRY BY age5 sex temp ftpt /CELLS MEAN MEDIAN COUNT STDDEV.
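* Sketch of a cross-check (an addition, not part of the original exercise): temp_lm, ft and sex_male are 0/1 dummies, so their means by country are shares.
* For example, the mean of temp_lm is the share of selected workers with a limited-duration contract.
* The ELSE clause in the RECODE of TEMP also codes any missing value of TEMP as 0, so this share refers to all selected workers, not only to those with a valid TEMP answer.
* The same shares appear as row percentages in the temp_lm = 1 column; small differences are possible because CROSSTABS may round non-integer weighted cell counts.
CROSSTABS /TABLES=COUNTRY BY temp_lm /CELLS=COUNT ROW.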
* DATASET ACTIVATE country_sel.
*---------------------------- III. d) Display separate analyses by country --------.
SPLIT FILE LAYERED BY COUNTRY.
MEANS TABLES=STARTIME by age5 sex temp ftpt /CELLS MEAN MEDIAN COUNT STDDEV /STATISTICS ANOVA.
corr STARTIME with temp_lm ft sex_male.
regression VARIABLES = STARTIME sex_male age temp_lm ft /DEPENDENT STARTIME /METHOD enter sex_male age temp_lm ft.
split file off.
* we do not need this data set anymore.
DATASET CLOSE country_sel.
*-------------------------------------------------------------------------------------------------------.
*------------------------- IV. Including the macro-level variable in the explanation -------------------------.
*-------------------------------------------------------------------------------------------------------.
*----------------------------- IV. a) Aggregate the information from individual-level data to country level -----------------------------.
* pull the country unemployment rate out of the data.
* Open the original data.
DATASET ACTIVATE DataSet1.
freq ILOSTAT.
freq country.
* Create the dummy unemploy: ILOSTAT 2 (unemployed) becomes 1, ILOSTAT 1 (employed) becomes 0, all other codes become system-missing.
Recode ILOSTAT (2= 1) (1=0) (else=sysmis) into unemploy.
* check.
cross ilostat by unemploy / missing = include.
* display.
means unemploy by country /CELLS MEAN COUNT /STATISTICS ANOVA.
* put the unemployment rates into a table.
DATASET DECLARE unemploy_mean.
* if the file is not presorted, it has to be sorted first.
SORT CASES BY COUNTRY.
AGGREGATE /OUTFILE='unemploy_mean' /PRESORTED /BREAK=COUNTRY /unemploy_mean=MEAN(unemploy).
DATASET ACTIVATE unemploy_mean.
* see the result.
list.
* open the working_pop data set.
DATASET ACTIVATE working_pop.
* note that all countries are included.
freq country.
* add the values from the table at the country level.
MATCH FILES /FILE=* /TABLE='unemploy_mean' /BY COUNTRY.
EXECUTE.
* check.
means unemploy_mean by country /CELLS MEAN COUNT STDDEV /STATISTICS ANOVA.
*-------------------------------------------------------------------------------------------------------.
*----------------------------- IV. b) Perform a regression analysis that includes the macro-level variable.
*-------------------------------------------------------------------------------------------------------.
* include the macro-level variable in the regression.
regression VARIABLES = STARTIME sex_male age temp_lm ft unemploy_mean /DEPENDENT STARTIME /METHOD enter sex_male age temp_lm ft unemploy_mean.
* finish.
DATASET CLOSE working_pop.
DATASET CLOSE DataSet1.
DATASET CLOSE unemploy_mean.