Welcome to DS10!

The course is officially listed as CS10 and STAT10, but we call it DS10 or Data Science 10. This course will meet the Harvard College requirement for Quantitative Reasoning with Data (QRD).

Instructors

  • Hanspeter Pfister, An Wang Professor of Computer Science, SEAS Harvard
  • Liberty Vittert, Professor of the Practice of Data Science at the Olin Business School, Washington University, St. Louis

Staff

  • Salma Abdel Magid, Head TF
  • Brian Chu
  • David Assaraf

Live Class & Programming Labs

Tu / Th 1:30-2:45 pm ET

[Schedule](http://datascience10.org/2021/schedule.html)

What is this class about?

This course is about combining data, statistical methods, and computation to gain insights and make useful inferences and predictions. This course will take a holistic approach to help you understand the key elements of data science, from data collection and exploratory data analysis to modeling, evaluation, communication of results, and telling a data story. You will be discussing case studies, developing Python skills for data science, and working on team projects to provide you with hands-on experience with the data science process. Throughout the course, we will emphasize critical analytical thinking skills, data ethics, and data acumen.

By the end of the course, you will be able to use data and reproducible data science methods to answer questions and guide decision-making. If you want to understand how modern data science works, this is the course for you!

Learning Outcomes

After completion of this course you will be able to:

  • Articulate core objectives and quantitative goals to be addressed using data
  • Apply a structured design process to data science projects
  • Use visualizations to effectively explore data
  • Apply powerful data science tools from machine learning and statistics
  • Identify and appreciate the strengths and limitations of data science methods
  • Communicate the outcome of the data science process with effective data stories
  • Apply basic programming skills for data science in Python
  • Judge and incorporate ethical considerations in the data science process
  • Work constructively as a member of a team to carry out a complex project

Course Format

DS10 2021 will be all online with live Zoom classes, self-guided labs, and group projects. All materials and assignments will be available on Canvas. Each of the course components is discussed in more detail below. Here is the weekly rhythm of the course:

Tuesday

  • Attend the mandatory live class (1:30-2:45 pm ET, 75 min) to participate in activities, discussions, and projects in breakout groups. Class attendance is mandatory.
  • Before each class, you will be asked to complete an online quiz that tests your understanding of the asynchronous materials for that week.

Thursday

  • Attend the mandatory programming lab (1:30-2:45 pm ET, 75 min) where you will work in pairs on self-guided lab notebooks to learn Python for data science. TFs will be available to help you. Lab attendance is mandatory.
  • Before each lab, you will be asked to complete an online quiz that tests your understanding of the programming concepts that you have learned so far.

Throughout the week

  • Prepare for next week’s labs and live classes by going through asynchronous materials (lecture videos, articles, websites, etc.) and reading one chapter each week in the Spiegelhalter textbook.
  • Work on weekly homework assignments, which include finishing your lab notebooks, programming exercises, and work for your group project.
  • Schedule office hours with the TFs to get help with the labs and homework assignments.
  • Overall, expect to spend about 10 hours each week on asynchronous materials, homework, and project work outside of labs and classes.

Sunday

  • Hand in your homework assignments, lab notebooks, and project milestones.
  • Watch a quick video (2-3 min) to get an idea of what we will cover each week.

Limited Enrollment

To provide the best possible online learning experience during live classes, we have limited enrollment in the course. If you are interested in joining this course, please fill out the enrollment survey between January 13 and January 18, 2021. We will notify you by January 20, before the course registration deadline on January 21.

Prerequisites

This course has NO prerequisites. However, please be aware that learning data science and a new programming language like Python is a time-consuming process! This course is intended for students without programming experience. The purpose of the weekly labs is to introduce basic programming concepts at a high level. By the end of the semester, students will be equipped with the programming skills required to carry out a data science project in Python using Google Colab.

Textbooks

You will be reading one chapter each week in the required textbook for this class to cover data science methodologies.

The Art of Statistics: How to Learn from Data, by David Spiegelhalter (Required). David Spiegelhalter draws on real-world examples to introduce concepts from statistics and data science. The book also gives a good overview of many of the ethical issues that come up in data science. You can purchase the book from the Coop bookstore or any other online retailer.

An optional reference book for data science programming in Python: Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (Optional). We are providing Python notebooks that will cover all the programming material that you need for this course. However, if you are interested in delving deeper into Python as a data science programming language, we recommend this book.

Course Components

DS10 has four main threads:

  • Discussions of real-life case studies that show the data science process at work.
  • Learning fundamental methods and tools of data science.
  • Weekly labs to teach you Python programming skills for data science.
  • Practicing the data science process by conducting projects from beginning to end.

Live Classes

The class meets every Tuesday during a live Zoom call for joint class activities. Attending these live sessions is mandatory and a crucial component of learning the material in this course. Please arrive on time, as we will start promptly. Before each class, you will be asked to complete an online quiz that tests your understanding of the asynchronous materials for that week. At the end of each class, we will ask you to fill out and submit a one-minute reflection to collect feedback.

Programming Labs

You will attend mandatory online programming labs on Thursdays. Labs are interactive tutorials using notebooks that give you an introduction to programming data science in Python. You are allowed to work in pairs, and TFs will be available to help you. Completed lab notebooks need to be handed in with the homework each week. Solutions to labs will be made public the week after the lab has been published. Before each lab, you will be asked to complete a quiz that tests your understanding of the programming concepts you learned so far. At the end of each lab, we will ask you to fill out and submit a one-minute reflection to collect feedback.

Asynchronous Materials

In preparation for class each week, you will work on your own through asynchronous materials (lecture videos, websites, articles, book chapters, etc.) to ensure that you are prepared for the activities in class and the programming labs. You are expected to read and watch these materials posted on Canvas each week.

Homework

Weekly homework assignments provide an opportunity to improve your data science and programming skills. Homework is due every Sunday. See the homework as an opportunity to learn, and not to “earn points”. Each student needs to hand in their own homework solution and adhere to our collaboration policy (see below). You can discuss solutions to homework assignments with your TFs during office hours; solutions will not be posted publicly.

Group Projects

A big part of the course will be group projects. You will work in teams of 2-3 students to conduct two data science projects with a series of graded milestones over the course of several weeks (see course schedule). First, you will work on a guided project in random teams assigned by us. The guided project will familiarize you with the data science process. After the guided project, you will work on a final project for which you can choose your own team. You can either choose to continue to work on your guided project, or you can propose a new data science project of your choice. A small number of projects will win a Best Project prize (Swiss chocolate) at the end of the semester.

Peer Assessments

Assessment of your participation in the projects is challenging. We will use peer evaluation to allow you to assess the members of your project teams as well as yourself. Peer assessments will count towards your project grades.

Office Hours & Piazza

The teaching fellows will provide online office hours at several different times each week for questions you may have. We will use Piazza as our discussion forum and for all announcements, so it is important that you are signed up as soon as possible. Piazza should always be your first resource for seeking answers to your questions. You can also post privately so that only the course staff sees your message.

Grading

This course can be taken for a letter grade only; there is no pass/fail option. The course grade comprises:

  • Participation (10%)
  • Exams (20%)
  • Homework Assignments (20%)
  • Guided Project (25%)
  • Final Project (25%)
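
As a rough illustration of how the weights listed above combine, here is a minimal Python sketch. The function name `course_score` and the example scores are hypothetical, and the sketch assumes each component is already expressed as a percentage between 0 and 100. It does not model dropped scores, peer-assessment adjustments, or the mapping to letter grades, none of which are specified here.

```python
# Component weights as listed in the grading breakdown (they sum to 100%).
WEIGHTS = {
    "participation": 0.10,
    "exams": 0.20,
    "homework": 0.20,
    "guided_project": 0.25,
    "final_project": 0.25,
}


def course_score(component_scores):
    """Return the weighted course score on a 0-100 scale.

    component_scores maps each component name in WEIGHTS to a
    percentage between 0 and 100. (Hypothetical helper, for
    illustration only.)
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Made-up scores for a fictional student.
example = {
    "participation": 100,
    "exams": 80,
    "homework": 90,
    "guided_project": 85,
    "final_project": 95,
}
print(round(course_score(example), 2))  # 89.0
```

Because the two projects together carry half of the total weight, a change of 10 points on either project moves the overall score by 2.5 points, while the same change in participation moves it by only 1 point.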

Homework includes lab notebooks, lab questions, project milestones, and reading quizzes. Group projects receive team grades. In general, we do not anticipate that the grades for each team member will be different. However, we reserve the right to assign different grades to each team member based on peer assessments (see above).

Your participation grade includes watching lecture videos, participating in live classes, submitting one-minute reflections, and being helpful to other students on Piazza and during labs. We will drop your lowest quiz score. Any concerns about grading errors must be clearly articulated in writing and sent to our head TF, Salma (sabdelmagid@g.harvard.edu), within one week of receiving the grade.

Course Policies

Zoom Norms

For live classes and programming labs on Zoom, please take care to:

  • work through and complete the week’s asynchronous materials and weekly quiz,
  • participate from a quiet office or similar space (and not from a car, plane, or train), and
  • participate with your camera turned on, using horizontal (not vertical) video.

Due Dates & Late Policy

No submissions (labs, homework, quizzes, project milestones, etc.) will be accepted for credit after the deadline. Homework will be posted on Mondays and will be due the following Sunday. All submissions are due at 11:59 pm ET on the due date. Your lowest exam score and homework score will be dropped.

If you have a verifiable medical condition or other special circumstances that interfere with your coursework, please let us know as soon as possible. You will need to provide a written note or email from a medical professional or your resident dean confirming your inability to participate in coursework.

Missing Class or Labs

Because of the emphasis on activities and teamwork, it is important that you attend and proactively participate during the live classes and labs each week. We understand, however, that certain factors may occasionally interfere with your ability to attend these live sessions. You can miss class or labs up to four times during the semester without any negative consequences. After that, it will affect your participation grade. Please let us know as soon as possible if you have recurring extenuating circumstances that affect your ability to attend class. We will ask you for an email confirmation from your resident dean and will try to work out an agreeable solution with you and your team.

Collaboration Policy

We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the Honor Code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board.

You may discuss your homework and labs with other students, but you are expected to be intellectually honest and give credit where credit is due. In particular:

  • if you work on labs in pairs, you may submit the same lab notebook as your partner, but you must add their name to your notebook submission;
  • you have to complete your homework and weekly quizzes entirely on your own;
  • you cannot share your homework code with anyone else, including on Piazza;
  • you may not submit the same or similar work to this course that you have submitted or will submit to another; and
  • you may not provide or make available solutions to individuals who take or may take this course in the future.

If the assignment allows it and for your projects, you may use third-party libraries and example code, so long as the material is available to all students in the class and you give proper attribution. Do not remove any original copyright notices and headers.

Accessibility

Any student receiving accommodations through the Accessible Education Office should email or present their AEO letter to our head TF, Salma (sabdelmagid@g.harvard.edu), as soon as possible. Failure to do so may prevent us from making appropriate arrangements.

Credits

The first version of this course was developed in 2019-2020 by Joe Blitzstein, Liberty Vittert, Xiao-Li Meng, Hanspeter Pfister, Allen Downey, Salma Abu Ayash, Salma Abdel Magid, Robert Haussman, and Aditya Ranganathan with generous support from the SEAS Learning INCubator (LINC). Most programming lab notebooks were developed by Allen Downey. The course design and materials greatly benefited from the students who took Harvard STAT250 in 2019. We have drawn heavily on materials and examples found online and tried our best to give credit by citing the original sources. Please contact us if you find materials where credit is missing or that you would rather have removed.