As you have probably heard, we are launching the Omics Logic Data Science Program again next week. In this program, participants will learn how data science principles apply to biomedical data challenges. Collectively, these are known as “Data Science Skills”: skills that are in high demand and that require quite a bit of practice, as well as an in-depth understanding of methods, data types, and the questions one might want to answer. In this blog, we will talk specifically about how to get started once you register for the program. One important step is to install Python 3 and the IDE we will use in the sessions.
- Python 3 – this will be covered in this document
- RStudio – you can read more about RStudio and how to get set up here:
Once you have those, you will also need the packages and libraries used in the exercises over the course of this program. Below, you will find an explanation of these and directions on how to install Python 3:
Installing Python 3
- Download Python by clicking the link. Choose a version of Python 3, open the downloaded file, and follow the instructions presented in the installation window.
- Verify that Python is installed correctly: launch the Command Prompt on Windows or the Terminal on Mac, type “python”, and hit the enter key. Then type an instruction, e.g. 2+2. If everything is installed correctly, you will get the answer (4). On some systems, often on Mac, the command is “python3” instead.
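If you would like a second check beyond 2+2, here is a minimal sketch you can type at the same Python prompt. It only uses the standard library; the helper name is_python3 is our own choice for illustration and is not part of the program materials.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version number is 3."""
    return version_info[0] == 3


# Show the full version string of the interpreter that is running this code
print(sys.version)
print("Python 3 detected:", is_python3())
```

If the last line reports False, the “python” command is probably pointing at an older Python 2 installation.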
Installing an IDE (VSCode)
Several tools exist for running and debugging Python code; these are called Integrated Development Environments, or IDEs. They typically make it easier to write code, access functions, read documentation, and spot issues in the code such as syntax mistakes, which make up about 90% of the problems you will face.
Instead of writing all our code in the terminal, we can use an IDE, which supports plugins and extra features like syntax highlighting and autocomplete. One such environment is VSCode:
- Download VSCode, install, and launch it.
- In the side menu on the left, click the Extensions icon.
- Search for Python and install the extension. This will let you use Python in VSCode.
Now you are ready to run Python code!
Writing Python code
We are going to write code that counts the bases in a short sequence and prints the result. Remember that indentation is important in Python code. Anything following a # is a comment, and text enclosed in triple quotes (""") is often used the same way; both only explain what the lines are doing and do not need to be typed.
- In a folder, create a new file, save it as a *.py file (e.g. sequence_count.py), and open it in VSCode.
- Follow and copy the code below (all the lines have comments explaining what they do).
seq = "AGTGTCCCTG"  # store the DNA strand as a string in the variable seq
print(len(seq))  # print the sequence length
- Press the play button at the top right corner of the window; this will run the code in the built-in terminal. You could also open a terminal in the same folder and run the python command followed by your file path, e.g. “python sequence_count.py”.
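The script above counts the total number of bases. A natural next step is to count each base separately. The following is a minimal sketch of one way to do that with the standard library; the function name count_bases is hypothetical and not taken from the program materials.

```python
from collections import Counter


def count_bases(seq):
    """Count how many times each DNA base occurs in a sequence."""
    counts = Counter(seq.upper())
    # Report the four DNA bases in a fixed order; a base that is absent counts as 0
    return {base: counts[base] for base in "ACGT"}


seq = "AGTGTCCCTG"
print(len(seq))          # prints 10
print(count_bases(seq))  # prints {'A': 1, 'C': 3, 'G': 3, 'T': 3}
```

Notice the indentation inside the function: the two indented lines belong to count_bases, while the unindented lines at the bottom run when the file is executed.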
Save and run code from Omics Logic Code
- Click the save button under the code block you want to save, and place any files the code needs in the same folder.
- Open the folder where the files are saved in VSCode, by clicking “File” then “Open Folder…”
- You can try to run the code, but nothing will print out in the console unless the variables you want to see are passed into the “print()” function. Add the print() call, then run the code again.
# import pandas and NumPy
import pandas as pd
import numpy as np
df = pd.read_table('Biology_Height_dataset.txt', sep='\t', header=0)
print(df)
*Note: to run this code you will need the pandas and NumPy packages installed. You can follow the instructions below to learn how to install packages.
Python packages are installed using the pip package manager.
- Open a terminal/console window
- Type “pip install” followed by the name of the package you want to install, e.g. numpy:
pip install numpy
Hit the enter key and wait until the installation completes. You can now use the package in your code, e.g. “import numpy as np” (the package name is written in lowercase). Every package is used differently, so the official documentation for the package is a good reference for getting started.
Now you are ready to test out Python code in the Omics Logic Data Science Program. Here is what we will cover in the online sessions during this training program:
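If you are not sure whether a package installed correctly, here is a minimal sketch that checks without raising an error when the package is missing. It relies only on the standard library's importlib; the helper name is_installed is our own and purely illustrative.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Check the two packages used in the example from this post
for name in ("numpy", "pandas"):
    if is_installed(name):
        print(name, "is installed")
    else:
        print(name, "is missing, try: pip install", name)
```

Run it from the same terminal or VSCode window you plan to use in the sessions, since packages are installed per Python installation.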