Data Visualization Project
You are required to create a project and present it to the class
at the end of the semester. You may work in teams (size to be
determined in class).
Your task is to select a dataset and tell a visual story about it
that is meaningful in whatever context you define. This includes
both exploration and explanation. You should visually explore
the dataset for possible relationships among variables,
possible trends (if applicable), or possible insights. You
should then present your conclusion, give your "sales pitch",
or make your point in a visually persuasive and effective
manner. The specific visualizations will depend upon the
nature of your data and the story that you want to tell.
Data
Consider the dataset you select for the project.
You can use almost any dataset, with my approval. However, this course isn’t
focused on the data cleaning process. I would suggest the
following criteria:
- Use relatively clean data. This means data that are complete
and correct, with few anomalies. Use data that have few
enough missing values that you can eliminate those with no effect on
your results.
- Use data with at least 1 (preferably 2) categorical
variables and at least 2 (preferably 3) numeric variables.
This gives you the flexibility to construct many different
types of visualizations.
- Use enough observations (rows) that you can see the
pattern, trend or relationships in your charts, but not so much data
that your computer runs like a turtle. The sweet spot depends
on the resources of your computer. Try some datasets and see
what works in a reasonable time for you.
- Creativity counts. Select a domain of interest
and ask some questions that can be visually captured with the dataset
that you selected. Your dataset can be drawn from the domains
of science, technology, business, politics, social justice, sports, or
practically any area, as long as you have data that support the
exploration and explanation visualization processes.
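One quick way to screen a candidate dataset against the criteria above is a short check in R. This is only a sketch: `screen_dataset` is a hypothetical helper name, and the built-in `iris` data frame stands in for whatever dataset you are considering.

```r
# Hypothetical helper: summarise how a data frame measures up to the criteria.
screen_dataset <- function(df) {
  # Treat factor and character columns as categorical
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  # Integer and double columns both count as numeric
  is_num <- vapply(df, is.numeric, logical(1))
  list(
    n_rows             = nrow(df),
    n_categorical      = sum(is_cat),
    n_numeric          = sum(is_num),
    # Share of rows that contain at least one missing value
    share_rows_with_na = mean(!complete.cases(df))
  )
}

# iris is a stand-in; replace it with your own data frame
result <- screen_dataset(iris)
str(result)
```

If the share of rows with missing values is small, dropping those rows with `na.omit()` is usually a reasonable option; if it is large, a different dataset may be the easier path.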
Include the following application requirements and technical
specifications for the project:
Assume that this is a project for an individual, a group, a business or
an organization.
Presentation requirements for the visual exploration and explanation of your dataset:
- What is the title of this presentation? The title
should capture the focus of your visualizations.
- What is the organization/individual for whom this presentation
is prepared?
- What is the mission of this organization (or, for an
individual, what is the underlying interest of the individual in
pursuing this application domain)?
- What is the purpose of this project?
- Why will this project support the mission of the
organization/individual?
- Tell us about your dataset:
- How is it related to the organization?
- How is it related to the purpose that you stated?
- How many attributes? Tell us about them.
- How many observations? Any issues, cleanup?
(It's okay if it was completely clean.)
- Walk through your exploration process. I don't just want to
see a series of slides with tables, graphs and charts. I want to see how you
used these visualizations to explore your dataset, to get a better
understanding of the data, and show us how you progressed from one
visualization to the next. What question was answered, or what
new question was raised that inspired you to try another kind of graph
or chart? Were things too cluttered, and you needed
clarification? Did you notice a trend or a relationship and decide to
explore that further? Did you want more detail? Less
detail? A different type of visualization?
- Discuss the technical details of your choices: Explain your use of color,
tick marks, annotations, titles, axis scales, or any other feature that
you included.
- After showing us your visual exploration of the dataset, visually present
some conclusions, talking points, or other explanatory visuals. Did you
make a point? Did you tell a story?
Technical requirements:
- Include at least one clustered (side-by-side) bar or column chart.
- Include at least one stacked bar or column chart.
- Include at least one scatter plot.
- Include one graph or chart using Base R graphics. (A histogram or boxplot
might work well, but you are not limited to those.)
- Include one line or area graph. It may not be appropriate for your data.
Do it anyway, and explain why it is or is not appropriate.
- Include one R dot plot or violin plot.
- Use scale_color and/or scale_fill at least once.
- Include at least one faceted graph.
- Include one Tableau chart from your in-class Marks assignment.
- Include one Tableau chart from your in-class drill-down assignment.
- Include one Tableau chart from your in-class new-feature assignment.
- For excellence points: use melt and/or reshape in R.
- For excellence points: Find some package in R that we didn't use in class. Use it.
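Several of the technical requirements above can be combined in one chart. The following is a minimal sketch, not a required approach. It assumes the `ggplot2` and `reshape2` packages are installed, uses the built-in `iris` data as a stand-in for your own dataset, and picks the "Set2" palette arbitrarily. It reshapes a wide table with `melt()`, then draws a clustered column chart with a `scale_fill` function.

```r
library(ggplot2)
library(reshape2)

# Wide format: one row per species, one column per mean measurement
wide <- aggregate(. ~ Species, data = iris, FUN = mean)

# melt() stacks the measurement columns into name/value pairs (long format)
long <- melt(wide, id.vars = "Species",
             variable.name = "measurement", value.name = "mean_cm")

# position = "dodge" places the bars side by side (clustered);
# position = "stack" would stack them instead
p <- ggplot(long, aes(x = measurement, y = mean_cm, fill = Species)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Mean iris measurements by species",
       x = NULL, y = "Mean (cm)")
print(p)
```

Adding `facet_wrap(~ Species)` to the same plot object would give a faceted version, though whether faceting or clustering reads better depends on your data and your audience.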
Some things I am looking for:
- Appropriate use of data.frames.
- Conversion to/from factors.
- Selection of rows and columns to extract the appropriate subset of data for graphing in R.
- Appropriate selection of graph for your data type.
- Correct interpretation of the graph.
- Understanding of the effectiveness of the graph.
- Creativity.
- Effective use of color for the application and for your audience.
- Adherence to Tufte, Cleveland, etc. guidelines for effective visualization.
- I am interested in the process! You may show the same graph
through several iterations, along with how you refined it to make it better.
- Your PPT presentation.
- A zip file with your entire R application.
- Your Tableau .twbx file(s)
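The data-handling items listed above (data frames, conversion to and from factors, and selecting rows and columns) might look like the following in practice. This is an illustrative sketch only: it uses the built-in `mtcars` data, and the `hp > 100` filter and the level labels are arbitrary choices made for the example.

```r
# Select rows (hp > 100) and three columns of interest from a data frame
cars <- mtcars[mtcars$hp > 100, c("mpg", "hp", "cyl")]

# cyl is stored as a number but behaves like a category, so convert it to a factor
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4 cyl", "6 cyl", "8 cyl"))

# Base R graphics: one box of mpg values per factor level
boxplot(mpg ~ cyl, data = cars,
        main = "Fuel economy by cylinder count (hp > 100)",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Converting back to numbers goes through character, so the labels
# (not the factor's internal integer codes) are what get converted
cyl_num <- as.numeric(sub(" cyl", "", as.character(cars$cyl)))
```

Calling `as.numeric()` directly on a factor returns the internal codes (1, 2, 3 here) rather than the original values, which is a common source of silently wrong charts.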