Creating a regression table in Stata

In my last post, I showed you how to create a table of statistical tests using the command() option in the new and improved table command. In this post, I will show you how to gather information and create tables using the new collect suite of commands. Our goal is to fit three logistic regression models and create the table in the Adobe PDF document below.

[Figure: the finished regression table in the PDF document]

Create the basic table

Let’s begin by typing webuse nhanes2l to open the NHANES dataset and then typing describe to examine some of the variables.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe highbp age sex diabetes

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
highbp          byte    %8.0g               * High blood pressure
age             byte    %9.0g                 Age (years)
sex             byte    %9.0g      sex        Sex
diabetes        byte    %12.0g     diabetes   Diabetes status

The dataset includes age, sex, an indicator for high blood pressure (highbp), and an indicator for diabetes (diabetes).

A new strategy for building tables

We will fit three logistic regression models for the binary outcome highbp. For each model, we will use the logistic command to estimate the odds ratios and standard errors. Then we will use estat ic to compute Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC) for each model. Our final table will include information for three models from six different commands.

Given the relative complexity of our table, we are going to use a new strategy to build it. We will use collect get to gather information from each command. Then we will use collect layout to define the layout of our table. Let’s do a simple example to illustrate this strategy before we begin the full table.

Let’s type collect get: before our first logistic regression command.

. collect get: logistic highbp c.age i.sex

Logistic regression                                    Number of obs =  10,351
                                                        LR chi2(2)    = 1563.54
                                                        Prob > chi2   =  0.0000
Log likelihood = -6268.9975                             Pseudo R2     =  0.1109

------------------------------------------------------------------------------
      highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.049042   .0013945    36.02   0.000     1.046313    1.051779
             |
         sex |
     Female  |    .648767   .0280172   -10.02   0.000     .5961141    .7060706
       _cons |   .0887874   .0063561   -33.83   0.000     .0771641    .1021615
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Next let’s type collect layout to define the layout of a table with row dimension colname and column dimension result. I have used square brackets to include only the levels _r_b and _r_se from the dimension result. This will add columns for the coefficients and standard errors, respectively.

. collect layout (colname) (result[_r_b _r_se])

Collection: default
      Rows: colname
   Columns: result[_r_b _r_se]
   Table 1: 4 x 2

------------------------------------
            | Coefficient Std. error
------------+-----------------------
Age (years) |    1.049042   .0013945
Male        |           1          0
Female      |     .648767   .0280172
Intercept   |    .0887874   .0063561
------------------------------------

That was easy! We created a basic table of regression output with two commands. The output tells us that collect get created a new collection named default.

Let’s repeat this strategy and add some options. Let’s begin by typing collect clear to clear any collections from Stata’s memory. Then let’s use collect create to create a new collection named MyModels.

collect clear
collect create MyModels

Next let’s use the collect get option name() to gather the results from our logistic regression model in our collection named MyModels. Note that I have typed collect rather than collect get. The word “get” is not necessary.

collect, name(MyModels): logistic highbp c.age i.sex

I would also like to specify a new dimension and level for the results of my logistic regression model. I can do this using the tag() option. The basic syntax is tag(dimension[level]). The example below stores the results of the logistic regression model to the level (1) in the dimension model.

collect, name(MyModels)     ///
         tag(model[(1)])    ///
         : logistic highbp c.age i.sex

The example above stores all the results from the model. But we will only need the coefficients and standard errors. We can specify a list of results to be automatically reported in the table by including those results after collect. The example below collects only the coefficients (_r_b) and the standard errors (_r_se) from the logistic regression model.

collect _r_b _r_se,         ///
        name(MyModels)      ///
        tag(model[(1)])     ///
        : logistic highbp c.age i.sex

Now we can use collect layout to create a table from the results we stored in level (1) of dimension model in the collection MyModels.

. collect layout (colname#result) (model[(1)]), name(MyModels)

Collection: MyModels
      Rows: colname#result
   Columns: model[(1)]
   Table 1: 12 x 1

------------------------
              |      (1)
--------------+---------
Age (years)   |
  Coefficient | 1.049042
  Std. error  | .0013945
Male          |
  Coefficient |        1
  Std. error  |        0
Female        |
  Coefficient |  .648767
  Std. error  | .0280172
Intercept     |
  Coefficient | .0887874
  Std. error  | .0063561
------------------------

You may be wondering how I selected the row and column dimensions for collect layout. I could explain why this particular example worked. But it may not work for your tables. So let’s walk through the steps I used to figure it out.

Some details about collect layout

Let’s begin by typing collect dims to view a list of the dimensions in our collection.

. collect dims

Collection dimensions
Collection: MyModels
-----------------------------------------
                   Dimension   No. levels
-----------------------------------------
Layout, style, header, label
                      cmdset   1
                       coleq   1
                     colname   8
           colname_remainder   1
                       model   1
               program_class   1
                      result   43
                 result_type   3
                     rowname   1
                         sex   2

Style only
                border_block   4
                   cell_type   4
-----------------------------------------

The dimension result catches my eye because of the name and because it has 43 levels. We can view a list of the levels by typing collect levelsof result.

. collect levelsof result

Collection: MyModels
 Dimension: result
    Levels: N N_cdf N_cds _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z chi2
            chi2type cmd cmdline converged depvar df_m estat_cmd ic k k_dv
            k_eq k_eq_model ll ll_0 marginsnotok marginsok ml_method mns opt p
            predict properties r2_p rank rc rules technique title user vce
            which

The dimension result contains estimates of the coefficients, standard errors, and many other statistical results from our model. Let’s use collect layout to create a table for the dimension result.

. collect layout (result), name(MyModels)

Collection: MyModels
      Rows: result

Your layout specification does not uniquely match any items. Dimension colname
might help uniquely match items.

That didn’t work. But the output suggests that including the dimension colname might help. The dimension named colname has eight levels and we can view a list of the levels by typing collect levelsof colname.

. collect levelsof colname

Collection: MyModels
 Dimension: colname
    Levels: age 1.sex 2.sex c1 c2 c3 c4 _cons

The dimension colname includes the variable names, including factor variables, from our logistic regression model. It also contains levels named c1, c2, c3, and c4. Let’s add the row dimension colname and see what happens.

. collect layout (colname) (result), name(MyModels)

Collection: MyModels
      Rows: colname
   Columns: result
   Table 1: 4 x 2

------------------------------------
            | Coefficient Std. error
------------+-----------------------
Age (years) |    1.049042   .0013945
Male        |           1          0
Female      |     .648767   .0280172
Intercept   |    .0887874   .0063561
------------------------------------

That worked—we have a table! But the table raises an important question. The dimension result has 43 levels, and the dimension colname included levels like c1. Why aren’t all of those levels displayed in the table?

The answer is that collect layout only includes cells where there is a value associated with each level of both the row and the column dimensions. Recall that we requested that only the coefficients (_r_b) and standard errors (_r_se) from our model be displayed. And those coefficients and standard errors were only collected for the levels age, 1.sex, 2.sex, and _cons for the dimension colname.

Once we understand this concept, we can explore other layouts for our table. For example, we could stack the coefficients and standard errors under each variable in our model.

. collect layout (colname#result) (), name(MyModels)

Collection: MyModels
      Rows: colname#result
   Table 1: 12 x 1

------------------------
Age (years)   |
  Coefficient | 1.049042
  Std. error  | .0013945
Male          |
  Coefficient |        1
  Std. error  |        0
Female        |
  Coefficient |  .648767
  Std. error  | .0280172
Intercept     |
  Coefficient | .0887874
  Std. error  | .0063561
------------------------

We will eventually create a similar column of results for each of our three models. Recall that we created the dimension model with collect, tag(). Let’s view the levels of the dimension model by typing collect levelsof model.

. collect levelsof model

Collection: MyModels
 Dimension: model
    Levels: (1)

For now, the dimension model has one level named (1), and we can specify model as our column dimension.

. collect layout (colname#result) (model[(1)]), name(MyModels)

Collection: MyModels
      Rows: colname#result
   Columns: model[(1)]
   Table 1: 12 x 1

------------------------
              |      (1)
--------------+---------
Age (years)   |
  Coefficient | 1.049042
  Std. error  | .0013945
Male          |
  Coefficient |        1
  Std. error  |        0
Female        |
  Coefficient |  .648767
  Std. error  | .0280172
Intercept     |
  Coefficient | .0887874
  Std. error  | .0063561
------------------------

This approach to building tables in steps can be helpful if you are unsure how to begin. Start by typing collect dims to view the dimensions in the collection. Then use collect levelsof to view the levels of each dimension. Then experiment with collect layout to design your table. The output of collect layout will often provide helpful instructions.
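The exploration steps above can be sketched as one short do-file fragment. This is only a minimal illustration that reuses commands already shown in this post. It assumes you are willing to reload the data and rebuild the MyModels collection from scratch, so that the collection holds a single set of results for model (1).

```stata
* Sketch: rebuild the collection, then explore it before choosing a layout.
* Assumes it is acceptable to clear the data and all collections in memory.
webuse nhanes2l, clear
collect clear
collect create MyModels

* Collect the results of the first model into level (1) of dimension model
collect _r_b _r_se, name(MyModels) tag(model[(1)]) : logistic highbp c.age i.sex

* Step 1: list the dimensions in the current collection
collect dims

* Step 2: list the levels of the dimensions that look promising
collect levelsof result
collect levelsof colname

* Step 3: experiment with a layout and read any hints in the output
collect layout (colname) (result)
```

Because collect create makes MyModels the current collection, the exploration commands in this sketch omit the name() option.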

Collecting results from multiple commands

Recall that we would also like to include the AIC and BIC for each model in our table, and we can estimate them by typing estat ic after we fit the model.

. estat ic

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |     10,351  -7050.765  -6268.998       3      12544   12565.73
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

The estimates of the AIC and BIC are stored in a matrix named r(S).

. return list

matrices:
                  r(S) :  1 x 6

And we can view the matrix by typing matlist r(S).

. matlist r(S)

             |         N        ll0         ll         df        AIC        BIC
-------------+------------------------------------------------------------------
           . |     10351  -7050.765  -6268.998          3      12544   12565.73

We can refer to the AIC and BIC in the matrix r(S) using matrix subscripting. The general syntax to refer to an element in a matrix is matname[row,column]. Using this syntax, we can refer to the BIC as r(S)[1,6]. Column 6 is named BIC, so we can also refer to the BIC as r(S)[1,"BIC"].

. display r(S)[1,"BIC"]
12565.73
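As a quick check on what those matrix elements contain, the sketch below recomputes both criteria by hand from the other columns of the matrix, using the standard definitions AIC = -2*ll + 2*df and BIC = -2*ll + df*ln(N). It refits the first model so that it can run on its own. The matrix name S and the scalar names are hypothetical and chosen only for this illustration, and the colnumb() function is used here to look up each column by name.

```stata
* Sketch: recompute AIC and BIC from the log likelihood stored by estat ic.
* S, s_N, s_ll, and s_df are hypothetical names used only for this example.
webuse nhanes2l, clear
quietly logistic highbp c.age i.sex
quietly estat ic

* Copy r(S) to a named matrix so later commands cannot overwrite it
matrix S = r(S)

* Pull out the pieces by column name
scalar s_N  = S[1, colnumb(S, "N")]
scalar s_ll = S[1, colnumb(S, "ll")]
scalar s_df = S[1, colnumb(S, "df")]

* Compare the hand calculations with the stored values
display "AIC by hand: " -2*scalar(s_ll) + 2*scalar(s_df)
display "AIC stored:  " S[1, colnumb(S, "AIC")]
display "BIC by hand: " -2*scalar(s_ll) + scalar(s_df)*ln(scalar(s_N))
display "BIC stored:  " S[1, colnumb(S, "BIC")]
```

With the log likelihood of -6268.9975, 3 degrees of freedom, and 10,351 observations reported above, the hand calculations should agree with the AIC of about 12544 and the BIC of about 12565.73 shown by estat ic.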

Let’s collect the AIC and BIC and store them in level (1) of dimension model in our MyModels collection.

collect AIC=r(S)[1,"AIC"]   ///
        BIC=r(S)[1,"BIC"],  ///
        name(MyModels)      ///
        tag(model[(1)])     ///
        : estat ic

Then we can include them in our table by adding result[AIC BIC] to the row dimension of collect layout.

. collect layout (colname#result result[AIC BIC]) (model[(1)]), name(MyModels)

Collection: MyModels
      Rows: colname#result result[AIC BIC]
   Columns: model[(1)]
   Table 1: 14 x 1

------------------------
              |      (1)
--------------+---------
Age (years)   |
  Coefficient | 1.049042
  Std. error  | .0013945
Male          |
  Coefficient |        1
  Std. error  |        0
Female        |
  Coefficient |  .648767
  Std. error  | .0280172
Intercept     |
  Coefficient | .0887874
  Std. error  | .0063561
AIC           |    12544
BIC           | 12565.73
------------------------

Notice that I did not include the “#” operator when I added the row dimension result[AIC BIC]. This is because AIC and BIC are not nested within each level of the dimension colname. I simply wanted to add rows for AIC and BIC at the bottom of the table.

Adding more models to the table

Let’s add a second model to our table. Notice that the commands below are nearly identical to the commands I used above. There are only two differences. First, I have used factor-variable notation to add the interaction of age and sex to the logistic regression model. And second, I am storing the results to level (2) of the dimension model.

collect _r_b _r_se,         ///
        name(MyModels)      ///
        tag(model[(2)])     ///
        : logistic highbp c.age##i.sex

collect AIC=r(S)[1,"AIC"]   ///
        BIC=r(S)[1,"BIC"],  ///
        name(MyModels)      ///
        tag(model[(2)])     ///
        : estat ic

We can use collect layout to make sure that it worked.

. collect layout (colname#result result[AIC BIC]) (model), name(MyModels)

Collection: MyModels
      Rows: colname#result result[AIC BIC]
   Columns: model
   Table 1: 20 x 2

----------------------------------------
                     |      (1)      (2)
---------------------+------------------
Age (years)          |
  Coefficient        | 1.049042 1.035184
  Std. error         | .0013945 .0018459
Male                 |
  Coefficient        |        1        1
  Std. error         |        0        0
Female               |
  Coefficient        |  .648767 .1556985
  Std. error         | .0280172 .0224504
Male # Age (years)   |
  Coefficient        |                 1
  Std. error         |                 0
Female # Age (years) |
  Coefficient        |          1.028811
  Std. error         |           .002794
Intercept            |
  Coefficient        | .0887874 .1690035
  Std. error         | .0063561 .0153794
AIC                  |    12544 12434.34
BIC                  | 12565.73 12463.32
----------------------------------------

That worked, so let’s add a third model to our table. Let’s add the variable diabetes to our second model. And we will store the results to level (3) of the dimension model.

collect _r_b _r_se,         ///
        name(MyModels)      ///
        tag(model[(3)])     ///
        : logistic highbp c.age##i.sex i.diabetes

collect AIC=r(S)[1,"AIC"]   ///
        BIC=r(S)[1,"BIC"],  ///
        name(MyModels)      ///
        tag(model[(3)])     ///
        : estat ic

Let’s use collect layout again to make sure that it worked.

. collect layout (colname#result result[AIC BIC]) (model), name(MyModels)

Collection: MyModels
      Rows: colname#result result[AIC BIC]
   Columns: model
   Table 1: 26 x 3

-------------------------------------------------
                     |      (1)      (2)      (3)
---------------------+---------------------------
Age (years)          |
  Coefficient        | 1.049042 1.035184 1.034281
  Std. error         | .0013945 .0018459 .0018566
Male                 |
  Coefficient        |        1        1        1
  Std. error         |        0        0        0
Female               |
  Coefficient        |  .648767 .1556985 .1549363
  Std. error         | .0280172 .0224504 .0223461
Male # Age (years)   |
  Coefficient        |                 1        1
  Std. error         |                 0        0
Female # Age (years) |
  Coefficient        |          1.028811 1.028856
  Std. error         |           .002794 .0027958
Not diabetic         |
  Coefficient        |                          1
  Std. error         |                          0
Diabetic             |
  Coefficient        |                   1.521011
  Std. error         |                    .154103
Intercept            |
  Coefficient        | .0887874 .1690035 .1730928
  Std. error         | .0063561 .0153794 .0157789
AIC                  |    12544 12434.34 12417.74
BIC                  | 12565.73 12463.32 12453.97
-------------------------------------------------
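Because the three pairs of collect commands differ only in the predictors and the level of the dimension model, the same collection could also be built with a loop. The sketch below is one possible way to do that, not the approach used in this post. The local macros spec1, spec2, and spec3 are hypothetical names, and the fragment assumes it is run from a do-file and that you are willing to clear the data and all collections in memory first.

```stata
* Sketch: rebuild MyModels with a loop instead of three copies of the commands.
* spec1-spec3 are hypothetical local macros holding each model's predictors.
webuse nhanes2l, clear
collect clear
collect create MyModels

local spec1 "c.age i.sex"
local spec2 "c.age##i.sex"
local spec3 "c.age##i.sex i.diabetes"

forvalues m = 1/3 {
    * Odds ratios and standard errors go to level (`m') of dimension model
    collect _r_b _r_se,                 ///
            name(MyModels)              ///
            tag(model[(`m')])           ///
            : logistic highbp `spec`m''

    * AIC and BIC for the same model go to the same level
    collect AIC=r(S)[1,"AIC"]           ///
            BIC=r(S)[1,"BIC"],          ///
            name(MyModels)              ///
            tag(model[(`m')])           ///
            : estat ic
}

collect layout (colname#result result[AIC BIC]) (model), name(MyModels)
```

The loop keeps the tag() level and the model specification in step with each other, which may be convenient if you later add a fourth or fifth model.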

Now we have the basic layout of our table. All we need to do now is customize the layout and export it to an Adobe PDF document.

Use collect style to format the table

I will use collect style showbase, collect style row, collect style cell, and collect style header to customize the layout of our table. The commands in the code block below are the same commands I used in previous posts, so I won’t explain each step here. But I have included comments to refresh our memory.

// TURN OFF BASE LEVELS FOR FACTOR VARIABLES
collect style showbase off

// CHANGE THE INTERACTION DELIMITER
collect style row stack, spacer delimiter(" x ")

// REMOVE THE VERTICAL LINE
collect style cell border_block, border(right, pattern(nil))

// FORMAT THE NUMBERS
collect style cell, nformat(%5.2f)
collect style cell result[AIC BIC], nformat(%8.0f)

// PUT PARENTHESES AROUND THE STANDARD ERRORS
collect style cell result[_r_se], sformat("(%s)")

// LABEL AIC AND BIC
collect style header result[AIC BIC], level(label)

Let’s type collect preview to check our work so far.

. collect preview

-----------------------------------------
                        (1)    (2)    (3)
-----------------------------------------
Age (years)
  Coefficient          1.05   1.04   1.03
  Std. error         (0.00) (0.00) (0.00)
Female
  Coefficient          0.65   0.16   0.15
  Std. error         (0.03) (0.02) (0.02)
Female x Age (years)
  Coefficient                 1.03   1.03
  Std. error                (0.00) (0.00)
Diabetic
  Coefficient                        1.52
  Std. error                       (0.15)
Intercept
  Coefficient          0.09   0.17   0.17
  Std. error         (0.01) (0.02) (0.02)
AIC                   12544  12434  12418
BIC                   12566  12463  12454
-----------------------------------------

Next I will use some options that are unique to this table. First, I will use the collect style cell option halign() to center the items and column headers in the table.

. collect style cell cell_type[item column-header], halign(center)

. collect preview

-----------------------------------------
                      (1)    (2)    (3)
-----------------------------------------
Age (years)
  Coefficient         1.05   1.04   1.03
  Std. error         (0.00) (0.00) (0.00)
Female
  Coefficient         0.65   0.16   0.15
  Std. error         (0.03) (0.02) (0.02)
Female x Age (years)
  Coefficient                1.03   1.03
  Std. error                (0.00) (0.00)
Diabetic
  Coefficient                       1.52
  Std. error                       (0.15)
Intercept
  Coefficient         0.09   0.17   0.17
  Std. error         (0.01) (0.02) (0.02)
AIC                  12544  12434  12418
BIC                  12566  12463  12454
-----------------------------------------

Then I will use the collect style header option level() to hide the labels for the row dimension result.

. collect style header result, level(hide)

. collect preview

-----------------------------------------
                      (1)    (2)    (3)
-----------------------------------------
Age (years)           1.05   1.04   1.03
                     (0.00) (0.00) (0.00)
Female                0.65   0.16   0.15
                     (0.03) (0.02) (0.02)
Female x Age (years)         1.03   1.03
                            (0.00) (0.00)
Diabetic                            1.52
                                   (0.15)
Intercept             0.09   0.17   0.17
                     (0.01) (0.02) (0.02)
AIC                  12544  12434  12418
BIC                  12566  12463  12454
-----------------------------------------

And finally, I will use the collect style column option extraspace() to add an extra space between the columns. I think this makes it easier to read the table.

. collect style column, extraspace(1)

. collect preview

----------------------------------------------
                        (1)     (2)     (3)
----------------------------------------------
Age (years)             1.05    1.04    1.03
                       (0.00)  (0.00)  (0.00)
Female                  0.65    0.16    0.15
                       (0.03)  (0.02)  (0.02)
Female x Age (years)            1.03    1.03
                               (0.00)  (0.00)
Diabetic                                1.52
                                       (0.15)
Intercept               0.09    0.17    0.17
                       (0.01)  (0.02)  (0.02)
AIC                    12544   12434   12418
BIC                    12566   12463   12454
----------------------------------------------

We did it! We collected the results from our models and customized the layout.

Export the table to an Adobe PDF document

I showed you how to export your tables to a Microsoft Word document in my previous posts. Let’s try something new and export our table to an Adobe PDF document. Most of the putpdf commands are identical to their corresponding putdocx commands, with the obvious exception that they begin with putpdf rather than putdocx. But there are a few important differences.

First, I have replaced putdocx paragraph, style() with putpdf paragraph, font() halign(). The first instance sets the font to a 26-point Calibri Light font and centers the text horizontally on the page. The second instance sets the font to a 14-point Calibri Light font and begins the text on the left of the page. The third instance does not specify a font() or halign() option, so the default 11-point Helvetica font is used.

Second, I have replaced the collect style putdocx option layout(autofitcontents) with the collect style putpdf options width() and indent(). The width(60%) option sets the width of the table to 60% of the full width of the page. The indent(1 in) option indents the table one inch from the left side of the page.

And third, I have used the note() option with collect style putpdf to add a note to the table to tell the reader that the table displays odds ratios with standard errors in parentheses.

putpdf clear
putpdf begin

putpdf paragraph, font("Calibri Light",26) halign(center)
putpdf text ("Hypertension in the United States")

putpdf paragraph, font("Calibri Light",14) halign(left)
putpdf text ("The National Health and Nutrition Examination Survey (NHANES)")

putpdf paragraph
putpdf text ("Hypertension is a major cause of morbidity and mortality in ")
putpdf text ("the United States. This report will explore the predictors ")
putpdf text ("of hypertension using the NHANES dataset.")

collect style putpdf, width(60%) indent(1 in)                                ///
        title("Table 3: Logistic Regression Models for Hypertension Status") ///
        note("Note: Odds ratio (standard error)")
putpdf collect

putpdf save MyTable3.pdf, replace

The resulting Adobe PDF document looks like the image below.

[Figure: the resulting PDF document, MyTable3.pdf]

In this post, we learned a new strategy to create tables using only the collect suite of commands. We used collect get to collect results from Stata commands, and we used collect layout to specify the layout of our table. We learned how to name our collections and store the results from commands to specific levels of dimensions.

You have probably noticed that we have used the same set of collect style commands in these blog posts. We could continue to copy and paste them into our future do-files, but there is an easier way to reuse collect style commands. In my next post, I will show you how to use collect style save and collect style use to save styles and reuse them with other tables. And I will show you how to use collect label save and collect label use to save labels for the levels of dimensions.