Converting STATA DTA file to CSV file

I have a burgeoning interest in data science and its applications to organ transplantation, and I have gotten hold of transplant data from UNOS. With so much data available, we can use statistics to answer many questions and understand patterns in transplantation.

Unfortunately, the data come as a STATA file. STATA is an expensive piece of statistical software that I cannot afford, but there are open-source alternatives that can convert the data into formats anyone can use.


One example is Anaconda, a free, open-source, cross-platform distribution of the Python and R programming languages for scientific computing. I often turn to it to convert and munge data into more accessible formats.

In this blog post, I will walk through converting a STATA data file (*.DTA) into a CSV file (*.CSV). CSV stands for comma-separated values, and it has several advantages over the STATA format:

  • A flat, simple layout of rows and columns
  • It can be opened in any text editor
  • Most programming languages can parse CSV data
  • It is plain text, so the data are no longer tied to proprietary software
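The points about text editors and parsing are easy to see in practice. Below is a minimal sketch using only Python's built-in `csv` module on a made-up sample (the column names are hypothetical, not actual UNOS fields). It also shows that CSV records no column types: every field comes back as a string, and the reading program decides what is numeric.

```python
import csv
import io

# A tiny, made-up CSV sample; the column names are hypothetical, not real UNOS fields.
SAMPLE = """patient_id,age,organ
1001,54,liver
1002,61,kidney
"""


def parse_csv(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_csv(SAMPLE)
print(len(rows))              # 2
print(rows[0]["organ"])       # liver
print(type(rows[0]["age"]))   # <class 'str'>: CSV stores no types, so "54" is text
# Converting to a number is up to the reading program:
print(int(rows[0]["age"]) + 1)  # 55
```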

You do not need any programming experience to complete this task. If you would like to convert STATA files into CSV, follow these steps:

Install Anaconda, which is available for Windows, macOS, and Linux. Anaconda includes the software you need, such as Python, the VS Code editor, pandas, and Jupyter.
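Anaconda ships these packages, but if you want to confirm they are present, the short check below uses only Python's standard library (`importlib.util.find_spec`) to report whether each package can be imported. The package list is an assumption: `pandas` for the conversion and `notebook` for the classic Jupyter Notebook interface. You can paste it into a notebook cell once Jupyter is running, or save it as a file and run it with Anaconda's `python`.

```python
import importlib.util
import sys

# Packages assumed to be needed here: pandas for the conversion,
# notebook for the classic Jupyter Notebook interface.
REQUIRED = ["pandas", "notebook"]


def check_packages(names):
    """Return {name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])
for name, found in check_packages(REQUIRED).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```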

Once it is installed, open Anaconda Navigator and you will be greeted with the following screen:

Anaconda Navigator Screenshot

Click the ‘Launch’ button in the Jupyter box. This opens a browser-based interface at a local address such as http://localhost:8889/tree (the port number may differ on your machine).

Using the file navigator in the Jupyter interface, navigate to the directory where your *.DTA file(s) reside. You should see a screen like the following:

Jupyter Screenshot

Once inside the folder, click the ‘New’ button in the top right and select ‘Python 3’ under the Notebook subheader. You should see the following screen:

Jupyter Notebook

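This interface lets you enter Python commands in cells and run them one at a time. Enter each of the following lines in turn, changing the file name LIVER_DATA.DTA to match your STATA file, and press the Run (play) button, or Shift+Enter, after each one. Because the notebook was created in the same folder as the data, the bare file name is enough. The text after each # is an optional comment: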

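>>> import pandas as pd
>>> data = pd.read_stata('LIVER_DATA.DTA')     # read the STATA file into a pandas DataFrame
>>> data.to_csv('LIVER_DATA.CSV', index=False)  # write it out as CSV, leaving out pandas' row numbers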

These commands use the pandas library to read the DTA file and write its contents out as a CSV file. The finished notebook should look like this:

Jupyter Notebook Conversion Script
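The notebook above converts one file. If the folder you opened holds several .DTA files, a small loop can convert them all in one go. This is a minimal sketch: the function name `convert_folder` is hypothetical, and it writes each CSV next to its source file with a lowercase `.csv` extension and without pandas' row index.

```python
from pathlib import Path

import pandas as pd


def convert_folder(folder="."):
    """Convert each .DTA file in `folder` to a CSV with the same base name.

    Returns the list of CSV paths written. (Hypothetical helper, not part of pandas.)
    """
    written = []
    for dta_path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively: files may end in .DTA or .dta.
        if dta_path.is_file() and dta_path.suffix.lower() == ".dta":
            csv_path = dta_path.with_suffix(".csv")
            data = pd.read_stata(dta_path)
            # index=False omits pandas' row numbers, which were never part of the STATA data.
            data.to_csv(csv_path, index=False)
            written.append(csv_path)
    return written


# Usage, in a notebook cell opened inside the folder that holds your .DTA files:
# for path in convert_folder("."):
#     print("wrote", path)
```

By default, `pd.read_stata` replaces values that carry STATA value labels with the label text (`convert_categoricals=True`); pass `convert_categoricals=False` to keep the underlying numeric codes instead.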

The converted .CSV file now resides in my directory:

The Completed Conversion
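Before moving on, it can be worth a quick sanity check that the CSV holds as much data as the original. A minimal sketch (the helper name `check_conversion` is hypothetical) reads both files back with `pd.read_stata` and `pd.read_csv` and returns their (rows, columns) shapes. If the CSV was written with `to_csv`'s defaults, pandas' row index is saved as an extra first column, so the CSV will show one more column than the DTA file.

```python
import pandas as pd


def check_conversion(dta_path, csv_path):
    """Return the (rows, columns) shapes of the original DTA file and the written CSV.

    Hypothetical helper for a quick sanity check; it compares sizes, not values.
    """
    original = pd.read_stata(dta_path)
    converted = pd.read_csv(csv_path)
    return original.shape, converted.shape


# Usage with the tutorial's file names (adjust to your own):
# dta_shape, csv_shape = check_conversion("LIVER_DATA.DTA", "LIVER_DATA.CSV")
# print(dta_shape, csv_shape)
```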

That’s it! You can now use the CSV file in other applications, such as R, Tableau, Excel, and JMP Pro.

Please let me know if you have any questions or concerns, either by commenting on this post or by sending an email via the Contact form. In future posts, I will lay out tutorials on data munging, statistical analysis and graphical visualization. Stay tuned!
