I have a burgeoning interest in data science and its applications to organ transplantation. I have gotten hold of transplant data from UNOS. With so much data out there, we can use statistics to answer a lot of questions and understand patterns in transplantation.
Unfortunately, UNOS sends the data as a STATA file. STATA is an expensive piece of statistical software that I cannot afford. Fortunately, there are open-source alternatives that can convert the data into formats we can all use.
One example is the software package Anaconda. It is a free, open-source, cross-platform distribution of the Python and R programming languages for scientific computing. I have turned to it often to convert and munge data into formats that are more accessible.
In this blog post, I will walk through a tutorial for converting the STATA data format, *.DTA, to a *.CSV file. CSV stands for comma-separated values. It has several benefits over STATA, such as:
- Fairly flat, simple schema
- The ability to be opened by text editors
- Support in most programming languages, which can parse CSV data
- CSV is safe, plain-text data, and quoting text fields can clearly differentiate them from numeric values
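To make the last two points concrete, here is a minimal sketch using only Python's built-in `csv` module. The rows are made up for illustration (they are not UNOS data), and the column names are my own assumptions. With `csv.QUOTE_NONNUMERIC`, the writer puts quotes around text fields, and the reader turns every unquoted field back into a number.

```python
import csv
import io

# Made-up rows for illustration only; these are not real UNOS records.
rows = [
    ["patient_id", "age", "blood_type"],
    ["A001", 54, "O"],
    ["A002", 61, "AB"],
]

# Write the rows as CSV text, quoting every field that is not a number.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back: quoted fields stay text, unquoted fields become floats.
reader = csv.reader(io.StringIO(csv_text), quoting=csv.QUOTE_NONNUMERIC)
parsed = list(reader)
print(parsed[1])
```

Note that this only works when the file was written with quoted text fields. A CSV with no quoting at all stores everything as plain characters, and the program reading it has to decide which columns are numbers.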
You do not need any programming experience to complete this task. If you would like to convert STATA files into CSV, follow this tutorial:
Once Anaconda is installed, open Anaconda Navigator and you will be greeted with the following screen:
Click on the ‘Launch’ button in the Jupyter box. This loads a browser-based interface at the following URL:
Using the file navigator within the Jupyter browser-based interface, navigate to the directory where your *.DTA file(s) reside. You should see the following screen:
Once within the folder, click on the ‘New’ button in the top right and select ‘Python 3’ under the Notebook subheader. You should see the following screen:
This is an interface where you can enter Python commands line by line. Enter the following commands one line at a time, changing the file name LIVER_DATA.DTA to match your STATA file. After each line, press the play button:
>>> import pandas as pd
>>> data = pd.io.stata.read_stata('LIVER_DATA.DTA')
>>> data.to_csv('LIVER_DATA.CSV')
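If you would rather run everything in a single cell, the commands above can be sketched as follows. So that the example runs on any machine, it first builds a tiny made-up STATA file in a temporary folder; the file name SAMPLE.DTA and the column names are assumptions for illustration, not part of the UNOS data. It uses `pd.read_stata`, which is the top-level pandas name for the same reader. The `index=False` argument is my own addition: by default `to_csv` also writes the row numbers as an unnamed first column, and this option leaves them out.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Work in a temporary folder so nothing in your own directory is touched.
work_dir = Path(tempfile.mkdtemp())

# Made-up stand-in data (NOT real UNOS records), saved as a Stata file.
sample = pd.DataFrame({"age": [54, 61, 47], "wait_days": [120, 45, 300]})
dta_path = work_dir / "SAMPLE.DTA"
sample.to_stata(dta_path, write_index=False)

# The conversion itself: read the Stata file, then write it out as CSV.
data = pd.read_stata(dta_path)
csv_path = dta_path.with_suffix(".CSV")
data.to_csv(csv_path, index=False)  # index=False omits the row-number column

# Peek at the result as plain text.
csv_lines = csv_path.read_text().splitlines()
print(csv_lines[0])
```

To use this on your own data, drop the lines that create the sample file and point `dta_path` at your *.DTA file instead.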
The Jupyter interface runs these Python commands, which use the pandas library to convert the DTA file into a CSV file. The result should look like this:
The converted .CSV file now resides in my directory:
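If your folder holds several *.DTA files, the same conversion can be repeated for each of them. Below is a rough sketch under my own assumptions: `convert_folder` is a hypothetical helper name (it is not part of pandas), the demo files are made up, and the output files get a lower-case .csv extension. It has not been tried on the full UNOS files, which may be large enough to need more memory than a laptop has.

```python
import tempfile
from pathlib import Path

import pandas as pd


def convert_folder(folder):
    """Convert every Stata file in `folder` to CSV and return the new paths."""
    created = []
    # sorted() takes a snapshot of the folder before any CSV files are added.
    for path in sorted(Path(folder).iterdir()):
        # Compare in lower case so both .dta and .DTA are picked up.
        if path.suffix.lower() != ".dta":
            continue
        csv_path = path.with_suffix(".csv")
        pd.read_stata(path).to_csv(csv_path, index=False)
        created.append(csv_path)
    return created


# Demo on made-up files in a temporary folder (not real UNOS data).
demo_dir = Path(tempfile.mkdtemp())
for name in ["LIVER_DEMO.DTA", "KIDNEY_DEMO.dta"]:
    pd.DataFrame({"age": [54, 61], "wait_days": [120, 45]}).to_stata(
        demo_dir / name, write_index=False
    )

created = convert_folder(demo_dir)
print([p.name for p in created])
```

In your own notebook you would call `convert_folder(".")` to convert every STATA file in the folder the notebook lives in.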
Please let me know if you have any questions or concerns, either by commenting on this post or by sending an email via the Contact form. In future posts, I will lay out tutorials on data munging, statistical analysis, and graphical visualization. Stay tuned!