## **Module 1 - Basic Skills for Astronomy Research**
________________________________________________________________________

In this tutorial, you will be given a quick introduction to the terminal and command line prompts on computers, and you will learn about the Python coding language.  In particular, we will cover some of the more common mistakes and misconceptions that first-time coders experience. 


The goal of this module is to walk you through the basics, step-by-step, leaving no room for misconception. 

Take your time. **Read instructions carefully.** And always remember:

![picture](https://drive.google.com/uc?id=1C00ohA69MdPchLNR382qcykeOdJaS7pm)

Skills to be covered include how to find and navigate a terminal, basic Unix commands, resources to learn more about coding (e.g., Codecademy, DataCamp), how to set up Jupyter Notebooks and import packages, and the definition of variables and data structures.

---
**BEFORE YOU BEGIN:** 

1.   Navigate to "File" --> "Save a Copy in Drive"
2.   Rename file appending your name (e.g., "M1_basicskills_Whitaker.ipynb")

***Important Note:*** It is *not* enough to just rename the file at the top; please follow the instructions above.

---
**INTRODUCTION: Using the terminal/command prompt**
---
________________________________________________________________________


While these tutorials are designed to be completed within the Google Colaboratory Notebooks themselves, avoiding a local installation of Python, we will first cover some basic programming on your computer's command line to help orient you.

*NOTE: This introductory section contains no Python code, only command line.  This is why the answer prompts have your responses commented out with '#', as otherwise you will see an error.*

**Finding the Terminal**


In order to run code (of any language), we must first use a computer "terminal". The terminal is something local to every computer; ***we will ask you to go outside of Google Colab just for this first part*** to get to know the power of your own computer in more detail.  

If you are using a Mac, open the application **"Terminal"**.  The easiest way to find this is to search for this word in your Spotlight, i.e., the magnifying glass in the upper right hand corner. In general, a Linux/Unix computer will be similar to a Mac. 

![picture](https://drive.google.com/uc?id=1-hywhHg3KsjKspwuJWdaRap5GdRr4C6g)

If you are using Windows/PC, open instead either the program **"Command Prompt"** or **"Windows PowerShell"**. 



All of these command line terminals operate in a similar fashion, with some minor differences noted below.

**Your Home Address**

Once a terminal is open, you should be brought to your "home" directory. In order to orient yourself and learn your own personal home address, as well as to see what files or folders are contained in this directory, you can learn the 'where am I?' command. 


On a Mac, or in Windows PowerShell, type the following (in the Windows Command Prompt, type **cd** with no arguments instead):

___________________________________________________________________________
*You@YourComputer:/>* **pwd**
___________________________________________________________________________


This should tell you the current directory you are in. 

Now type the following for Mac users:
___________________________________________________________________________
*You@YourComputer:/>* **ls**
___________________________________________________________________________


Or for Windows users:

___________________________________________________________________________
You@YourComputer:/> **dir**
___________________________________________________________________________


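This should return a list of all items inside of this directory. Note that Linux also provides a *dir* command, which is equivalent to the command *ls -C -b*: the output is listed in columns, sorted vertically, with backslashed escape sequences representing special characters. The Windows *dir* command is a separate program that prints similar information (names, sizes, and dates) in its own format.  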

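Many commands have these extra options, which can be very helpful for sorting the directory contents.  For example, on a Mac, try listing the full file information (-l), sorted by modification time (-t), in reverse order (-r) so that the most recently modified item appears at the bottom of the list.  (In the Windows Command Prompt, **dir /O:D** gives the same oldest-to-newest ordering.)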

___________________________________________________________________________
*You@YourComputer:/>* **ls -ltr**
___________________________________________________________________________

**Making a New Folder and Navigation** 


Next, we will create a new folder and then navigate into it. The following code should be equivalent in Terminal and Command Prompt:

___________________________________________________________________________
You@YourComputer:/> **mkdir new**
___________________________________________________________________________

*   *This line creates a new directory with "mkdir" (make directory) and names it "new"*


___________________________________________________________________________
You@YourComputer:/> **cd new**

You@YourComputer:/new>
___________________________________________________________________________



*   We then use "cd" to change directories (cd, get it?)
*   The last line shows us that we are now in the "new" directory with the abbreviated location listed on the left in the terminal prompt. 

While you created a new directory (folder), there is not actually anything inside it.  Therefore, if you change directories into your new folder and type the "ls" command, nothing will be returned.  Do not worry!  This doesn't mean it didn't happen.  Notice that you now have this directory name listed in your terminal preamble though (e.g., "You@YourComputer:/new"). You can change directories back up one level and try "ls" again to see your new directory listed:

___________________________________________________________________________
You@YourComputer:/new> **cd ..**

You@YourComputer:/> **ls**
___________________________________________________________________________



**Take a moment to check in with your group.**  Share tips with each other about how to navigate the terminal.  Help each other out!

---


**MISSION #1:**
 

Now you have your first mission.  We want you to instead try to navigate to your 'Downloads' folder. This standard folder is usually located in your 'home' directory.   

**Type the following to change directories (i.e., cd) and return to the home folder**. This works on a Mac and in Windows PowerShell; in the Windows Command Prompt, type **cd %USERPROFILE%** instead:

___________________________________________________________________________
You@YourComputer:/> cd ~
___________________________________________________________________________

Next, see if you can figure out how to navigate into the downloads folder using the basic commands that we have covered so far to orient yourself and move around. *Hint: Sometimes it helps to first find the entire path to your Downloads folder.  An analogy here would be if I wanted to find the address of a house, the street name is meaningless if I don't know the town/state.*

Once you are in your Downloads folder, we want you to ***find the date of your most recently downloaded file***.   

*Hint: Try out the reverse chronological sorting!*

Type in both answers below (after the lines starting with ###), both the directory path that you are in (e.g., the results when you type "pwd") and the date of your most recently downloaded file.  

In [None]:
# This is a code box in Google Colab. 
# <- and this is a comment (no code is executed right here); you can tell by the hashtag that comes first.
# Comments are a really great tool for annotating your code.
#
# Here we just want you to submit your answer (which is not code), so we will use comments.
# If you removed the # below and pressed the "run" button to the left, you would get a common "invalid syntax" error. 

### Full Directory Path for Download Folder:
### Date of Last Downloaded File: 

print('this is actually code that will do something')
print('but seriously... write your answers above in the green commented out section.')
print('confused? ask for help!')

**How to "run" code:**  If you hover over the grey code box, the [ ] symbol turns into a right arrow.  Press this play button to execute the code within the box.  The result of the code, or any error message, then appears below the box.

**Take a moment to check in with your group.**  If you have previous experience navigating the terminal, share your knowledge.  Don't be afraid to ask questions.  "*Sometimes asking for help is the bravest move you can make. You don't have to go it alone.*"

---

**Terminal Navigation from within Google Colab** 


Learning how to navigate the terminal and basic shell commands is one of the best skills you can add to your astrophysics tool kit.  It turns out you can also access a terminal from within Google Colab (in fact, there are a couple of ways to do this, but we will go with the most straightforward one).  Below, access this terminal and test out a few of the commands above.  Are you on your local computer (and hence can see local files) or are you on Google Drive's directory structure?  

In [2]:
# Access the terminal from within Google Colab and explore a bit
!bash

---
# **USING PYTHON IN GOOGLE COLABORATORY**
________________________________________________________________________

Python is a widely applicable, high-level, object-oriented programming language. In a nutshell, this means that Python is designed with user-accessibility in mind. "High-level" refers to the high separation between Python and the code that communicates closely or directly with your computer's hardware. Lower level programming languages, like C, are often much more complicated to write, but in turn often execute much faster due to the near-direct communication with the "machine code" that controls your computer.

While Python can be run in your command line if properly installed, these tutorials will utilize Python "shells" created within this Google Colaboratory document. (These are the grey boxes below.)

In order to execute Python code in the Google Colaboratory Notebook, simply press the "Run" button in the top left corner of the code cell. If you'd like to reset all of the cells, go to "Runtime>Restart Runtime."

*Google Colaboratory uses IPython Notebook files (.ipynb), but these files can also be saved as regular Python (.py) files and run in a terminal if you follow the instructions appended to this module to install Python locally.* 

---
**UPLOADING FILES IN GOOGLE COLABORATORY**
________________________________________________________________________
Before we get started, we first have to get our data file into our notebook. In order to download the data file, right click (hold control and click on a Mac) the following highlighted blue link, "Data File", and select "Save link as..." to save it as a text document.  Go on, give it a try, the blue word right here: [Data File](https://www.astrowhit.com/s/3dhst_whitaker14_fall2020.txt)



**Step 1:**  Just in case you are confused about right clicking, we show below exactly what you should see.

![picture](https://drive.google.com/uc?id=1ul6TBNnJEN6hv5bsStIPSQLia_0P9trm)






**Step 2:**  Save the file, but pay attention where you save it!

![picture](https://drive.google.com/uc?id=1e0zmVtKaX9S2FI7iH5tQc91gIRLDaLK6)

**Step 3:** Now that you downloaded and saved the file, we have to load the file you just downloaded back into Google Colab (silly, really, I know). To upload a file, click on the folder icon on the left side of the screen. 

Where to look: glance up and to your left at the column of icons along the left edge of the screen.  If you are like me, the top icon (the one that looks like bullet points) is highlighted.  Click on the folder icon, three down, instead.  You should then see the upload symbol shown below. 

Then, click the "Upload to session storage" icon. Upload the file you just downloaded called "3dhst_whitaker14_fall2020.txt".

![picture](https://drive.google.com/uc?id=1JRPX7r30HVILyM4Ne2RAjG-s_sCwPTR7)

You should now see the '3dhst_whitaker14_fall2020.txt' file listed and ready to go.  Good job!

Note that if you close your Google Colab module and return to it later, you should check whether this file is still listed.  You may need to re-upload it. 

In [None]:
# You can also check if your file is uploaded using the local terminal within Google Colab
# Go on, test out those terminal commands you learned!
!bash

**Take a moment to check in with your group.**  Make sure everyone has their file uploaded.

---

---
**IMPORTING PACKAGES**
________________________________________________________________________

In order to simplify, organize, and visualize certain aspects of our code, we can import some Python packages, i.e., load them up. Packages are incredibly useful as they often contain hundreds to thousands of ready-made functions that we simply need to import into our file to use. Typically, all packages are imported at the beginning of a new Python script. In general, these packages must be downloaded to your machine before you will be able to use them. This problem is avoided when using a hosted Python notebook such as this Google Colab notebook, where the packages we need are already installed. Let us now import a math package (*numpy*), a Python astronomy library (*astropy*), and a plotting package (*matplotlib*):

In [None]:
import numpy as np
import matplotlib.pyplot as plt 
from astropy.io import ascii 
%matplotlib inline

We've imported *numpy* "as np" which means we only have to type "np" instead of *numpy* when we want to call a function from this package. We've also imported a module called "ascii" from the library *astropy*. If you haven't already, make sure to click the "run cell" button next to [1] to import the packages.  *Hint: this looks like a little 'play' button and spins when you click on it and execute the code*

---
**DATA TYPES**
________________________________________________________________________

Python has a variety of built-in data types that each have unique attributes/functions. For the purposes of this tutorial, we will only mention a few of the most commonly used data types, but we recommend the following resource if you are curious to learn more: [Python Data Types](https://www.w3schools.com/python/python_datatypes.asp). 

For example, text is stored as a "string" (*str*). Strings are denoted by quotation marks around a word or phrase within code. If we have a variable "x" which is not a string, we can also convert it to one with the following: str(x). You can also glue two strings together: 'happy'+'birthday' = 'happybirthday'.  Notice there is no space in between as neither string had a space in it.  
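
To make the gluing and conversion rules concrete, here is a minimal sketch. The variable names (`module_number`, `label`, and so on) are made up for this illustration and are not used elsewhere in the module.

```python
# Hypothetical variables, chosen only for this illustration
module_number = 1                      # an integer, not a string

glued = 'happy' + 'birthday'           # no space is added automatically
spaced = 'happy' + ' ' + 'birthday'    # add the space yourself

# An integer must be converted with str() before it can be glued to a string
label = 'Module ' + str(module_number)

print(glued)
print(spaced)
print(label)

# Gluing a string directly to an integer raises a TypeError
try:
    broken = 'Module ' + module_number
except TypeError as err:
    print('TypeError:', err)
```

The `try`/`except` lines are only there so that the error is printed instead of stopping the cell; in your own code you would simply add the `str()` conversion.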

**MISSION #2:**
 

Your second mission is here.  Go ahead and try out working with data types for yourself!

**Convert the variable "x" into a string and print it out:**

In [None]:
x = 22    # x is a variable and happens to be today's date.

# Enter your code here to turn this variable into the string 'Sept' glued to x (don't forget the space!)
# If it throws an error, don't worry, try again!  Remember:
# (1) how you glue strings, and also (2) that x needs to be converted to a string first.


print(x)  # The print function is super handy: it displays its argument below the cell when executed.

Integers (*int*) are countable, whole number values. We commonly use integers to index a list or array (it doesn't make sense to slice an array at the 0.10th index). A good analogy for discrete numbers like integers would be the floors an elevator can visit.  You can visit the first, second, third floor, etc, but you can't go to the 9.75 floor.  
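
The elevator analogy can be tried out directly. The sketch below uses a made-up list called `floors` to show that an integer works as an index while a float does not, even when the float is a whole number.

```python
# Hypothetical list, used only to illustrate indexing
floors = ['lobby', 'first', 'second', 'third']

whole = 2        # no decimal point, so Python stores this as an int
decimal = 2.0    # the decimal point makes this a float

print(type(whole))
print(type(decimal))

print(floors[whole])          # an integer index works

# A float index raises a TypeError, even though 2.0 is a whole number
try:
    floors[decimal]
except TypeError as err:
    print('TypeError:', err)

print(floors[int(decimal)])   # converting to int first fixes the problem
print(int(9.75))              # int() drops the fractional part
```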

If we define a variable as some whole number, without any decimal, it will automatically be defined as an *int*. A float or floating-point variable (*float*) is a data type that contains a decimal point. Try defining an integer, a string, and a float using two different methods each below:

In [None]:
# integer

xi1 = 10
xi2 = int(10)

# string 

xs1 = 'Goodnight'
xs2 = str("Hello World")

# float

xf1 = 10.12
xf2 = float(10)

# use "type(variable name)" to check and see if you did it correctly
# e.g., update below to check the type of xi1 by filling in:  print(type(xi1))

print( )    # fill this in
print( )    # fill this in
print( )    # fill this in


The last data type we are going to go over is called a boolean or bool (*bool*). A boolean can represent only 2 values: either True or False. We can define booleans directly, as we did above with other data types, but we can also return booleans when we want to check "truth values." For example, if we define a string, we can check to see if the string consists only of digits:


In [None]:
string = "Hello World"

# we can use the built-in string method isdigit(), which returns True only if every character is a digit

print(string.isdigit())
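
As a further illustration of where booleans come from, the sketch below compares `isdigit()` on two made-up strings and shows that comparisons also return booleans. The names `year`, `mixed`, `has_digit`, and `is_large` are assumptions made for this example.

```python
# Hypothetical strings, used only for illustration
year = "2020"
mixed = "Module 1"

print(year.isdigit())     # every character is a digit
print(mixed.isdigit())    # letters and a space are not digits

# To ask whether a string contains at least one digit, check each character
has_digit = any(character.isdigit() for character in mixed)
print(has_digit)

# Comparisons also return booleans
is_large = 10 > 3
print(is_large, type(is_large))
```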

---
**VARIABLES AND DATA STRUCTURES**
________________________________________________________________________
When using Python, we can define "variables" as certain data structures or types. The cell below shows how we can define a variable as a single integer, float, or string as well as an array. Arrays are simply ordered collections of values and can have several dimensions. (In plain Python, the built-in version written with square brackets is called a *list*.)


In [None]:
# We will first define a variable called "x" below and set it to 2

x = 2

# Now we will define an array called "y" and set it equal to a set of numbers

y = [0,1,2,3,4,5,6]  # Brackets are used to denote the array and commas are used to separate values

# Watch what happens when we define a new array "z" with the following:

z = y*int(x)
print(z)

By defining z as above, we made a new array containing 2 copies of y. Oops! What we really wanted to do was multiply every value by 2. This is where the *numpy* package comes in really handy, as it seamlessly allows you to do math with arrays.

In [None]:
# Let's define the same arrays
x = 2
y = [0,1,2,3,4,5,6]
z = np.array(y)*x   # here we are converting y into a numpy array so that we can perform some math
#print(z)

# Try rewriting how we define the variable z in a different way using np.array() 

z =                 # is there another way you can get the same result?
print(z)

As you will notice, there are often many different ways to approach a problem.  There is one last way we can introduce to manipulate arrays, which will come in handy with large data sets.  Now let's try this by using a ***for loop***:

In [None]:
z = list(y)              # Make a copy of y; writing "z = y" would make z and y two names for the same list
for i in range(7):       # This is the loop part: define i to loop through 7 consecutive numbers
    z[i] = z[i]*int(x)   # Notice how this line is indented by 4 spaces; this indicates the action inside the loop
print(z) # We get rid of our indentation when we wish to exit the for loop.  
         # What would happen if you didn't remove the indentation? (try it out!)

The "i" in z[i] is called the index of the array and allows us to access the value stored in that position of the array. In this for loop, we cycle through every index in the array and multiply the value by 2. For loops may seem a bit complicated, but it just means in words "for every value in some range given, execute some task". Here, we said for every index in range seven (i = 0, 1, 2, 3, 4, 5, and 6), multiply by 2.
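
One subtlety worth seeing for yourself: writing `z = y` does not copy a list, it only gives the same list a second name. The sketch below is a minimal illustration under that assumption; `alias` and `doubled` are names invented for this example.

```python
x = 2
y = [0, 1, 2, 3, 4, 5, 6]

z = list(y)                  # an independent copy of y
for i in range(len(z)):      # len(z) is 7, so i runs from 0 to 6
    z[i] = z[i] * x
print(y)                     # y is unchanged because z is a copy
print(z)

# A list comprehension expresses the same loop in one line
doubled = [value * x for value in y]
print(doubled == z)

# Plain assignment does NOT copy: alias and y are the same list
alias = y
alias[0] = 99
print(y[0])                  # y changed too
```

Using `range(len(z))` instead of `range(7)` means the loop keeps working if the list later changes length.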



In [None]:
arr = [[1,2,3,4],[5,6,7,8]] # This is a two dimensional array

print(arr[0][1]) # This print function is currently set to print the value in the first array [0], second position [1] (indexes start at 0)
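
For comparison, here is a short sketch of the same two dimensional data stored as a *numpy* array, which accepts both indexes inside a single pair of brackets. The name `grid` is made up for this example.

```python
import numpy as np

arr = [[1, 2, 3, 4], [5, 6, 7, 8]]   # the nested list from the cell above
grid = np.array(arr)                 # the same values as a numpy array

print(arr[0][1])     # nested list: first row, then second position
print(grid[0, 1])    # numpy array: row and column in one pair of brackets
print(grid.shape)    # (number of rows, number of columns)
print(grid[1])       # the entire second row
print(grid[:, 0])    # the first column; ":" means "every row"
```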

What if we have two separate arrays and we want to join them? We can use the *numpy* function *concatenate* to achieve this:

In [None]:
a = [1,2,3,4,5,6]
b = [7,8,9,10,11,12]

c = np.concatenate([a,b])

print(c)

If I want to know how long an array is (i.e., how many elements), I can use the following function:

In [None]:
print(len(c))
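
As a hedged aside, the sketch below contrasts joining plain lists with `+` against `np.concatenate`, and shows what `len` reports for a two dimensional array. The names `joined_list` and `two_d` are invented for this illustration.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6]
b = [7, 8, 9, 10, 11, 12]

joined_list = a + b              # "+" glues plain lists end to end
c = np.concatenate([a, b])       # numpy returns an array instead of a list

print(len(joined_list), len(c))  # both contain 12 elements
print(type(joined_list), type(c))

# For a two dimensional array, len() only counts the rows
two_d = np.array([a, b])
print(len(two_d))                # number of rows
print(two_d.shape)               # (rows, columns)
print(two_d.size)                # total number of elements
```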

**Take a moment to check in with your group.**  Discuss data types, variables, and structures together.  Try putting into words the code you have just run.  Brainstorm where/when/how this might be useful in astrophysics.

---
**READING IN AND VIEWING DATA**
________________________________________________________________________

Now we are going to read in a file containing several astrophysical measurements. In general, you will first want to identify which type of file you are reading in (this will typically be something along the lines of an ascii, .txt, .fits, or .csv file). The file we are reading in here is an ascii file (with a .txt extension) so we will use the "*ascii*" module with the function "*read*."

**MISSION #3:** Run the cell below to:

1.   define the name of the file you uploaded to Google Colab
2.   read in the file
3.   print the file



In [None]:
from astropy.table import Table
file='3dhst_whitaker14_fall2020.txt'
f=ascii.read( )  # Fill this in (i.e., insert the variable with the defined filename here)
print( )         # Print out the data structure here. Hint: what did you define the structure to be called?

In [None]:
# The type() function returns the class (type) of the object passed to it
# Use this function to determine the class type of the data structure 'f'

type( )          # Insert the name of the data structure here

Some data files may not read in properly and require further conditions to read successfully. Troubleshooting is something you will get all too familiar with in coding.  Beyond the support database provided within 'astropy', we recommend using Stack Overflow and Google as resources to debug things.  With time you will also get more familiar with reading code and understanding error messages.  When reading in data, the most common problems are: (1) the file path is wrong, so Python cannot locate your file, or (2) you need to specify a delimiter and/or data type.  
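
Both of these problems can be checked with a few lines of plain Python before calling any reader. The sketch below is only an illustration: it writes a tiny made-up file (`example_catalog.txt`, with invented values separated by `|`) into a temporary folder so that it can run anywhere, then checks the path and peeks at the first lines to spot the delimiter.

```python
import os
import tempfile

# Create a small hypothetical file so this example is self-contained
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'example_catalog.txt')
with open(path, 'w') as handle:
    handle.write('id|lmass|zmax\n')
    handle.write('1|9.5|1.0\n')
    handle.write('2|10.2|1.5\n')

# Problem (1): does the file exist at the path you typed?
print(os.path.exists(path))
print(os.path.exists(os.path.join(workdir, 'typo_catalog.txt')))
print(os.listdir(workdir))        # list what is really in the folder

# Problem (2): look at the first lines to see which character separates the columns
with open(path) as handle:
    first_lines = [handle.readline().rstrip('\n') for _ in range(2)]
print(first_lines)

header = first_lines[0].split('|')   # split the header on the delimiter we spotted
print(header)
```

Once you know the separating character, you can pass it to the reader; for example, `ascii.read` accepts a `delimiter` keyword.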



---



**MISSION #4:** Now it is time to do something with the data structure we named "f". With our knowledge of printing table data and slicing arrays, try printing out the "log(SFR)" column below.  Look at the keywords describing the columns in your print statement above.  An example below shows how you would print the "maximum redshift" column, or 'zmax'.  

*Side note: log(SFR) is the logarithm base 10 of the star formation rate (acronym of SFR), which describes the number of new stars formed per year in a galaxy.*

In [None]:
# Enter your code here

f['zmax']

# HINT: array indexes can be integers OR sometimes strings (e.g., y[41] or img['data'])
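
To illustrate the hint with plain Python, the sketch below uses a list (indexed by integers) and a dictionary (indexed by strings). The dictionary is only a stand-in for a table, with made-up column names and values; it is not the data structure that `ascii.read` returns.

```python
# Hypothetical data, used only to illustrate the two kinds of index
y = [10, 20, 30]
table = {'zmax': [1.0, 1.5, 2.0], 'lmass': [9.5, 10.2, 10.8]}

print(y[1])                  # integer index: position in the list
print(table['zmax'])         # string index: look up a column by name
print(table['zmax'][0])      # combine both: first value of that column
print(list(table.keys()))    # the names that are available

# Names are case-sensitive: 'ZMAX' is not the same key as 'zmax'
print('ZMAX' in table)
```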

You have completed your missions for today -- way to go!  We all work at different paces; some of you will be more careful about reading all directions, others might whiz through this.  Now might be a good time to go back and review the concepts covered.


**Take a moment to check in with your group.**  Discuss how you approached writing your own code above.  Explain your reasoning!

---
**Final Instructions:**  ***DOWNLOAD YOUR SAVED FILE WITH YOUR LAST NAME APPENDED*** and submit this file in Moodle 

Module 1 is due no later than September 29, 2020

---
# **If time allows...**

---



For those of you who finished with time to spare and feel like you've mastered the basic concepts covered thus far, please continue on to this section to learn how to plot the data you just read in. 

**Plotting Data**
________________________________________________________________________
This section will act as a precursor tutorial for Module 2: Reading Catalogs and Plotting.

We will now use the handy package "*matplotlib*" and its module "*pyplot*" to plot data that we've read in. 

Our data set contains lots of information about the galaxies used to create it: stellar mass, redshift, star formation rate, luminosity, etc. This data comes from the publication [K. Whitaker et al., 2014](https://ui.adsabs.harvard.edu/abs/2014ApJ...795..104W/abstract).  In short, these measurements were made from a beautiful compilation of data from the Hubble Space Telescope and Spitzer Space Telescope, as well as a wide range of ground-based telescopes covering the electromagnetic spectrum, in the common extragalactic legacy fields: COSMOS, AEGIS, GOODS-N, GOODS-S, and UDS.  The data in the table you downloaded is directly from this paper, and is a compilation of thousands of galaxies spanning billions of years of lookback time.  More information will come about the data next week, but for now let's just see what it looks like.

When you are exploring new data sets, it can be helpful to plot various columns and look for patterns and trends within the data. For the purposes of this tutorial, let’s look at star formation rate as a function of stellar mass. What do you think this relation tells us physically? 

We will first define what we want to call star formation rate and stellar mass. In the catalog, star formation rate is listed as “lSFR” and stellar mass is listed as “lmass” (column names are case-sensitive). Note that both the stellar mass and star formation rate are logarithmic base-10 measurements. In order to avoid confusion, we will define these variables as “lmass” and “lsfr” like so:

In [None]:
lmass = f['lmass'] # log stellar mass
lsfr = f['lSFR'] # log star formation rate

# Now we make a scatter plot from this data

plt.scatter(lmass,lsfr)

# Now label the axes
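# You can experiment with this below to explore why $$ are important, super and subscripts (_ and ^), and latex symbols like \odot
# The r before each string marks it as "raw", so Python passes backslashes like \odot to matplotlib unchanged

plt.xlabel(r"log(M$_*$)[M$_\odot$]")
plt.ylabel(r"log(SFR UV + IR) [M$_\odot$ yr$^{-1}$]")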

plt.show()

This graph looks good but let’s add some more detail, like a grid and title.
The commands for title and grid lines are the following:

In [None]:
plt.scatter(lmass,lsfr)

plt.xlabel(r"log(M$_*$)[M$_\odot$]")                     # r marks a "raw" string so \odot is passed through unchanged
plt.ylabel(r"log(SFR UV + IR) [M$_\odot$ yr$^{-1}$]")

plt.grid()    # This command adds the grid

plt.title("log(SFR) vs. log(Stellar Mass)")   # This command adds a title

plt.show()

Now we will create a second plot that displays three separate variables simultaneously: stellar mass, star formation rate, and redshift.
To do this, we will define an axes object, which we call "ax", using plt.subplot, and then plot onto it.

In [None]:
zmin=f['zmin']
zmax=f['zmax']
redshift = (zmin+zmax)/2     # Calculate the average redshift for each row from the min/max values
                             # Note that these columns behave like numpy arrays, so we can perform math on them!

plt.figure(figsize=(10, 10))   # Width, height in inches; set before plotting so that it applies to this figure
ax = plt.subplot(111)
ax.scatter(lmass,lsfr,s=20,c=redshift,marker='o')  # We are now color-coding by a third variable "redshift"
                                                   # s is optional and specifies the marker size (marker here is a circle)

ax.set_xlabel(r"log(M$_*$)[M$_\odot$]")
ax.set_ylabel(r"log(SFR UV + IR) [M$_\odot$ yr$^{-1}$]")

ax.set_aspect('equal','box') # This gives both axes the same scale (one unit in x spans the same length as one unit in y)
                             # and resizes the plot box to fit
                             # You'll notice that many professional publications have square plots
                             # But oddly this is never the default in Python.
plt.grid()

plt.show()

We are still creating a scatter plot of stellar mass vs SFR but we have defined a third variable redshift (c=redshift) which adds the color layer. Normally "subplot" implies separate panels, but subplot(111) creates a single panel (1 row, 1 column, panel number 1), so everything is plotted on the same axes.  A handy trick!

In Module 2, we will explore this in more detail and expand further to fit lines to the data. 

---
# **Resources**
________________________________________________________________________
**DOWNLOADING/INSTALLING PYTHON**

While these modules use Google Colaboratory as a platform to write and run Python code, it may be that you want to set up Python locally on your own computer.  Because nothing in life is ever easy, sometimes installing Python results in unique roadblocks for each user.  To not get lost in the weeds, we have opted to use Google Colaboratory, but we encourage you to also install a local version.  While we will support troubleshooting within Google Colaboratory for these activities, we recommend you seek online resources to troubleshoot personal installations of Python. 

The following instructions can be used to install Python. When this module was written (fall 2020), the newest stable release was Python 3.8.5; newer versions have been released since. Python can be installed in a few different ways. The simplest way is to go to [Install Python](https://www.python.org/) and select the version compatible with your operating system.


For Mac users, accessing Python from the terminal after installation is simple. Open the terminal and type:

___________________________________________________________________________
You@YourComputer:/~ **sudo nano /etc/paths**
___________________________________________________________________________


Then add the path where Python is installed to this list and save. Python should now be accessible from the terminal.

For Windows machines, open System Properties from the Control Panel. Then go to the 'Advanced' tab and open Environment Variables. 

![picture](https://drive.google.com/uc?id=1lHQa_BOfg5mk853fYKjTEGW4ISA8vrZA)
![picture](https://drive.google.com/uc?id=1y88s0ssqr-T9_yhc599ysGSdQaXw5Fxk)

**OTHER CODING RESOURCES**
________________________________________________________________________
Programming can be an incredibly daunting and confusing skill for beginners, but have no fear! There are many resources (like this) that are intended to slowly introduce new concepts and make learning how to code more accessible. 

In these tutorials, we will mainly be using Python (as well as some basic command line arguments). 

Here are a few free resources to help get you started:



*   [Codecademy](https://www.codecademy.com/)
*   [DataCamp](https://www.datacamp.com/)
*   [Astronomy Research Tutorials - Astrowhit](https://www.astrowhit.com/astronomy-research-tutorial-repository)



---


*This tutorial was created by K. Whitaker and T. Metivier, with contributions from L. Wright and A. Pope.  Use of this tutorial is allowed, but please retain proper credit.  Questions can be directed to <kwhitaker@astro.umass.edu>*



