URL: https://guides.nyu.edu/datascience/literate-prog

DATA SCIENCE

A guide with resources for the data science community on campus.


ASK YOUR LIBRARIAN!

Hello! I am Vicky Rampin, the Librarian for Research Data Management and
Reproducibility. I am also the liaison to computer science and data science
programs at NYU! I am here to help you navigate the resources for both at NYU
and beyond. You can set up an appointment with me or always email me at:
vs77@nyu.edu.

If you need help with a specific quantitative, GIS, or qualitative software, you
should reach out to Data Services.


RELATED SLIDE DECKS

 * Intro to Jupyter Notebooks: This class is designed for first-time and
   longer-term users of Jupyter Notebooks, a workspace for writing code. The
   class focuses on using Notebooks to facilitate sharing and publishing of
   script workflows. It aims to provide users with knowledge about shortcuts,
   plugins, and best practices for maximizing re-usability and shareability of
   Notebook contents.


CC


Original work in this LibGuide is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.


LITERATE PROGRAMMING

Donald Knuth first defined literate programming as a script, notebook, or
computational document that contains an explanation of the program logic in a
natural language (e.g. English or Mandarin), interspersed with snippets of
macros and source code, which can be compiled and rerun. You can think of it as
an executable paper!

No matter which literate programming tool you use, run the cells from top to
bottom, and only in that order. The number one cause of irreproducible Jupyter
notebooks is that the original authors ran the cells out of order, and that
state cannot be reproduced without documentation of which cells were run in
which order.
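
As a rough illustration of the point above, the sketch below reads a notebook
file's JSON and flags code cells whose stored execution count does not follow
top-to-bottom order. It assumes the nbformat 4 layout (a "cells" list whose code
cells carry an "execution_count"); the function name out_of_order_cells and the
demo notebook are hypothetical, made up for this example.

```python
import json


def out_of_order_cells(notebook_json: str) -> list:
    """Return indices of code cells whose execution count breaks 1, 2, 3, ... order."""
    nb = json.loads(notebook_json)
    problems = []
    expected = 1
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        if not source.strip():
            continue  # empty code cells are never given an execution count
        # After a fresh restart and a full top-to-bottom run, the non-empty
        # code cells are numbered 1, 2, 3, ... in document order.
        if cell.get("execution_count") != expected:
            problems.append(index)
        expected += 1
    return problems


# Hypothetical notebook in which the second code cell was run before the first.
demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Analysis"},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 2, "source": "x = 1"},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 1, "source": "print(x)"},
    ],
})
print(out_of_order_cells(demo))  # [1, 2]
```

An empty result only shows that the saved counts are consistent with a clean
run; restarting the kernel and running every cell again remains the real test.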


JUPYTER NOTEBOOKS

Jupyter Notebook is an interactive computing environment that lets users
author notebook documents that include code, interactive widgets, plots,
narrative text, equations, images, and even videos! Jupyter notebooks are
heavily used in data science, so it is worth getting comfortable with the tool.
The Jupyter name comes from three programming languages: Julia, Python, and R.
Each notebook uses one programming language, which you select by choosing a
kernel (e.g. Python, R, Go, and more; get the full list of kernels from the
wiki).

Jupyter notebooks are composed mainly of two types of cells (though plugins can
add more).

 1. Markdown Cells (for narratives): when run, a markdown cell displays the
    Markdown or HTML that you write (that means all sorts of rich content,
    including images). Essential Markdown summary:
    https://daringfireball.net/projects/markdown/syntax
 2. Code Cells (for data cleaning, analysis, visualization, etc.): executable
    code in a variety of languages, dictated by the kernel (default is Python,
    but more can be added).
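
To make the two cell types concrete, here is a minimal sketch of how they might
be stored in a notebook file, assuming the nbformat 4 JSON layout. The helper
name make_notebook and the cell contents are hypothetical, chosen only for
illustration.

```python
import json


def make_notebook(cells, kernel_name="python3", language="python"):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also store their outputs and an execution counter.
            cell["outputs"] = []
            cell["execution_count"] = None  # not run yet
        nb_cells.append(cell)
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        # The kernelspec records which kernel (and so which language) runs the code cells.
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "language": language,
                "display_name": kernel_name,
            }
        },
        "cells": nb_cells,
    }


notebook = make_notebook([
    ("markdown", "# Cleaning step\nDrop rows with missing values."),
    ("code", "rows = [1, None, 3]\nclean = [r for r in rows if r is not None]"),
])
text = json.dumps(notebook, indent=1)  # the text you would save as an .ipynb file
print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code']
```

In practice you would rarely write this JSON by hand; the Jupyter interface, or
the nbformat Python library, builds and validates it for you.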

Some key Jupyter notebook shortcuts to keep in mind while you work:

 * Use Shift + Enter to run the active cell

 * Press Esc in a selected cell to enter command mode, then press a key:
   
   * Esc, then L - toggle line numbers
   
   * Esc, then M - format cell as Markdown cell
   
   * Esc, then A - insert cell above current cell
   
   * Esc, then B - insert cell below current cell

 * List all current variables: run %whos in a code cell (%whos is an IPython
   magic, so it needs the Python kernel)
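
The %whos magic is only available in IPython. As a rough plain-Python analogue
(not how IPython implements it), the sketch below lists the name, type, and a
short preview of each variable in a namespace. The function name
summarize_variables and the example variables are hypothetical.

```python
def summarize_variables(namespace):
    """Return (name, type name, short repr) rows for the variables in a dict."""
    rows = []
    for name in sorted(namespace):
        if name.startswith("_"):
            continue  # skip private and internal names
        value = namespace[name]
        preview = repr(value)
        if len(preview) > 40:
            preview = preview[:37] + "..."  # keep long values readable
        rows.append((name, type(value).__name__, preview))
    return rows


# Hypothetical variables, standing in for what a notebook session might hold.
example = {"threshold": 0.5, "labels": ["a", "b"], "_hidden": 1}
for row in summarize_variables(example):
    print(row)
```

Passing globals() instead of the example dict would summarize the current
session, though the result may include helper names that the session itself
defines.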


RMARKDOWN

RMarkdown is another popular literate programming tool and can be considered an
extension of Markdown. Like all literate programming tools, it mixes
documentation and code, and not just R code either! You can insert code snippets
from other languages (SQL, bash, Python, etc.) into a single document.
Incorporating results directly into your documents is an important step in
reproducible research: any changes to your data set or analysis are
automatically reflected in the document the next time it is rendered.

Typically, RMarkdown files are edited from within RStudio. The R for Data
Science book contains a great chapter on RMarkdown for more information:
https://r4ds.had.co.nz/r-markdown.html.

 * Last Updated: Oct 18, 2024 11:37 AM
 * URL: https://guides.nyu.edu/datascience
Subjects: Data Science
Tags: code license, coding, data management, data license, data science, git,
instruction, jupyter, programming, reproducibility, research, teaching

