 # What is Data Visualization

According to Wikipedia, data visualization is an interdisciplinary field that deals with the graphic representation of data. It is a particularly efficient way of communicating when the data is numerous, as in a time series. From an academic point of view, this representation can be considered a mapping between the original data (usually numerical) and graphic elements (for example, lines or points in a chart). The mapping determines how the attributes of these elements vary according to the data.
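
The idea of a mapping from data to graphic attributes can be illustrated with a small sketch. The function name `linear_scale`, the sales figures, and the 0 to 200 pixel range below are all hypothetical, chosen only to show one simple way a numeric value could determine the length of a bar.

```python
def linear_scale(value, domain, pixel_range):
    """Map a data value from `domain` (min, max) onto `pixel_range` (min, max)."""
    d_min, d_max = domain
    r_min, r_max = pixel_range
    if d_max == d_min:
        raise ValueError("domain must have a non-zero extent")
    fraction = (value - d_min) / (d_max - d_min)
    return r_min + fraction * (r_max - r_min)


# Hypothetical sales figures mapped to bar lengths between 0 and 200 pixels.
sales = [120, 300, 210]
bar_lengths = [linear_scale(v, (0, 300), (0, 200)) for v in sales]
print(bar_lengths)
```

Here the graphic attribute is bar length; the same kind of function could just as well drive position, size, or a color ramp.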

## Visual perception and Data Visualization

A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as “pre-attentive attributes”. For example, it may require significant time and effort (“attentive processing”) to identify the number of times the digit “5” appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing.

Effective graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than in surface area, it may be more effective to use a bar chart (which uses line length to show comparison) rather than a pie chart (which uses surface area to show comparison).

## Examples of diagrams used for data visualization

1. Bar chart

Dimensions:

• length/count
• category
• color

Description / Example usages

• Presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
• A bar graph shows comparisons among discrete categories. One axis of the chart shows the specific categories being compared, and the other axis represents a measured value.
• Some bar graphs present bars clustered in groups of more than one, showing the values of more than one measured variable. These clustered groups can be differentiated using color.
• For example, comparing values such as sales performance for several people or businesses in a single time period.

2. Histogram

Dimensions:

• bin limits
• count/length
• color

Description / Example usages

• An approximate representation of the distribution of numerical data. To build one, divide the entire range of values into a series of intervals and then count how many values fall into each interval; this is called binning. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but not necessarily) of equal size.
• For example, determining the frequency of annual stock market percentage returns within particular ranges (bins) such as 0-10%, 10-20%, etc. The height of each bar represents the number of observations (years) with a return in the range represented by the respective bin.
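
The binning step described above can be sketched as follows. The helper name `histogram_counts` and the return values are hypothetical (they are not real market data), and the sketch assumes half-open bins, so a value equal to an edge falls into the bin that starts there.

```python
from bisect import bisect_right


def histogram_counts(values, edges):
    """Count values per bin; bin i covers edges[i] <= v < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the overall range are ignored in this sketch.
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


# Hypothetical annual returns in percent.
returns = [4.2, 12.5, 7.9, 18.3, 9.1, 25.0, 3.3]
edges = [0, 10, 20, 30]
counts = histogram_counts(returns, edges)
print(counts)
```

Each count would become the height of one bar in the histogram.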

3. Scatter plot

Dimensions:

• x position
• y position
• symbol/glyph
• color
• size

Description / Example usages

• Uses Cartesian coordinates to display the values of, typically, two variables in a data set.
• Points can be coded via color, shape and/or size to display additional variables.
• Each point on the plot has an associated x and y value that determine its location on the Cartesian plane.
• Scatter plots are often used to highlight correlation between variables (x and y).
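
The correlation that a scatter plot makes visible can also be quantified. The sketch below computes the Pearson correlation coefficient, one common measure of linear correlation; the function name `pearson_r` and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: y rises in step with x, so the points lie on a line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = pearson_r(xs, ys)
print(round(r, 3))
```

A value near 1 or -1 corresponds to points that fall close to a straight line, while a value near 0 corresponds to a cloud with no linear trend.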

4. Scatter plot (3D)

Dimensions:

• x position
• y position
• z position
• color
• symbol
• size

Description / Example usages

• Similar to the 2-dimensional scatter plot above, the 3-dimensional scatter plot visualizes the relationship between, typically, three variables in a data set.
• Again, points can be coded via color, shape and/or size to display additional variables.

5. Network

Dimensions:

• node size
• node color
• tie thickness
• tie color
• spatialization

Description / Example usages

• Finding clusters in the network (e.g. grouping Facebook friends into different clusters).
• Discovering bridges (information brokers or boundary spanners) between clusters in the network.
• Determining the most influential nodes in the network (e.g. a company wants to target a small group of people on Twitter for a marketing campaign).
• Finding outlier actors who do not fit into any cluster or are in the periphery of a network.
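
As a rough illustration of the "most influential nodes" use case, the sketch below counts how many ties each node has (its degree). Degree is only one of several possible influence measures, and the names and edge list here are hypothetical.

```python
from collections import Counter

# Hypothetical undirected ties between people in a small network.
edges = [
    ("ana", "ben"),
    ("ana", "cho"),
    ("ana", "dev"),
    ("ben", "cho"),
    ("dev", "eli"),
]

# Count how many ties touch each node.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

most_connected, connections = degree.most_common(1)[0]
print(most_connected, connections)
```

In a drawing of this network, the degree could be mapped to node size so that well-connected nodes stand out.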

6. Pie chart

Dimensions:

• color

Description / Example usages

• Represents one categorical variable which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents.
• For example, the proportion of native English speakers worldwide.
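
The proportionality between a quantity and its slice can be sketched by computing central angles. The function name `slice_angles` and the category values are hypothetical.

```python
def slice_angles(quantities):
    """Return the central angle in degrees for each labelled quantity."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {label: 360 * value / total for label, value in quantities.items()}


# Hypothetical shares of three categories.
shares = {"A": 50, "B": 30, "C": 20}
angles = slice_angles(shares)
print(angles)
```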

7. Line chart

Dimensions:

• x position
• y position
• symbol/glyph
• color
• size

Description / Example usages

• Represents information as a series of data points called ‘markers’ connected by straight line segments.
• Similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments.
• Often used to visualize a trend in data over intervals of time – a time series – thus the line is often drawn chronologically.
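
The difference from a scatter plot, ordering the points and joining neighbors, can be sketched like this. The function name `line_segments` and the readings are hypothetical.

```python
def line_segments(points):
    """Order (x, y) points by x and pair each point with its successor."""
    ordered = sorted(points, key=lambda p: p[0])
    return list(zip(ordered, ordered[1:]))


# Hypothetical (day, temperature) readings recorded out of order.
readings = [(3, 14.0), (1, 10.5), (2, 12.0)]
segments = line_segments(readings)
print(segments)
```

Each pair in `segments` is one straight line segment of the chart.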

8. Streamgraph

Dimensions:

• width
• color
• time (flow)

Description / Example usages

• A type of stacked area graph which is displaced around a central axis, resulting in a flowing shape.
• Unlike a traditional stacked area graph in which the layers are stacked on top of an axis, in a streamgraph the layers are positioned to minimize their “wiggle”.
• Streamgraphs display data with only positive values, and are not able to represent both negative and positive values.
• For example, the music a user listened to over the start of the year 2012.
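
The displacement around a central axis can be sketched with the simplest possible baseline: a symmetric one, where the stack at each time step is centered on zero. This is a deliberately simplified stand-in, not the wiggle-minimizing layout mentioned above, and the function name and play counts are hypothetical.

```python
def symmetric_stream_layout(layers):
    """Return, per layer, a list of (bottom, top) positions for each time step.

    The stack is centered on zero, so the baseline at each step is minus
    half of the total height at that step.
    """
    n_steps = len(layers[0])
    if any(len(layer) != n_steps for layer in layers):
        raise ValueError("all layers need the same number of time steps")
    if any(v < 0 for layer in layers for v in layer):
        raise ValueError("streamgraph values must be non-negative")
    bands = [[] for _ in layers]
    for t in range(n_steps):
        total = sum(layer[t] for layer in layers)
        bottom = -total / 2
        for i, layer in enumerate(layers):
            top = bottom + layer[t]
            bands[i].append((bottom, top))
            bottom = top
    return bands


# Hypothetical play counts for two artists over three weeks.
plays = [[2, 4, 6], [4, 4, 2]]
bands = symmetric_stream_layout(plays)
print(bands)
```

The check for negative values reflects the restriction above that streamgraphs display only positive data.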

9. Treemap

Dimensions:

• size
• color

Description / Example usages

• A method for displaying hierarchical data using nested figures, usually rectangles.
• For example, disk space by location or file type.
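
One level of such a layout can be sketched by slicing a rectangle into strips whose areas are proportional to the sizes. This is a minimal "slice" layout under assumed names and numbers; nested levels would repeat the split inside each strip, and production treemaps often use other algorithms to keep the rectangles closer to square.

```python
def slice_layout(sizes, x, y, width, height):
    """Split a rectangle into vertical strips with area proportional to size.

    Returns {name: (x, y, width, height)} for each entry in `sizes`.
    """
    total = sum(sizes.values())
    if total <= 0:
        raise ValueError("total size must be positive")
    rects = {}
    offset = x
    for name, size in sizes.items():
        strip_width = width * size / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects


# Hypothetical disk usage in gigabytes, drawn into a 100 x 40 canvas.
disk_usage = {"videos": 50, "photos": 30, "documents": 20}
rects = slice_layout(disk_usage, 0, 0, 100, 40)
print(rects)
```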

10. Gantt chart

Dimensions:

• color
• time (flow)

Description / Example usages

• A type of bar chart that illustrates a project schedule.
• Modern Gantt charts also show the dependency relationships between activities and current schedule status.
• Used, for example, in project planning.

11. Heat map

Dimensions:

• color
• categorical variable

Description / Example usages

• Represents the magnitude of a phenomenon as color in two dimensions.
• There are two categories of heat maps:
  • cluster heat map: magnitudes are laid out in a matrix of fixed cell size whose rows and columns are categorical data.
  • spatial heat map: there is no matrix of fixed cell size. For example, a heat map showing population densities displayed on a geographical map.

12. Stripe graphic

Dimensions:

• x position
• color

Description / Example usages

• Uses a series of colored stripes, chronologically ordered, to visually portray long-term temperature trends.
• Portrays a single variable, prototypically temperature over time to portray global warming.
• Deliberately minimalist, with no technical indicia, to communicate intuitively with non-scientists.
• Can be “stacked” to represent multiple series.

13. Animated spiral graphic

Dimensions:

• rotating angle (cycling through months)
• color (passing years)

Description / Example usages

• Portrays a single dependent variable, prototypically temperature over time to portray global warming.
• The dependent variable is progressively plotted along a continuous “spiral” determined as a function of (a) a constantly rotating angle (twelve months per revolution) and (b) an evolving color (the color changes over passing years).
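
The geometry of such a spiral can be sketched by turning a month index and a value into a point in the plane. The function names are hypothetical, and the sketch assumes the plotted value is non-negative so it can be used directly as the radius.

```python
from math import cos, sin, tau


def spiral_point(month_index, value):
    """Place a value on the spiral: twelve months make one full revolution.

    `month_index` counts months from the start of the series (0 = first
    month); `value` is used directly as the radius.
    """
    angle = tau * (month_index % 12) / 12
    return (value * cos(angle), value * sin(angle))


def year_offset(month_index):
    """Number of whole years elapsed, which could drive the color."""
    return month_index // 12


# The fourth month of the second year sits a quarter turn from the start.
print(spiral_point(15, 1.0), year_offset(15))
```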

14. Box and whisker plot

Dimensions:

• x axis
• y axis

Description / Example usages

• A method for graphically depicting groups of numerical data through their quartiles.
• Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles.
• Outliers may be plotted as individual points.
• The two boxes stacked on top of each other represent the middle 50% of the data: the line separating the two boxes marks the median, and the top and bottom edges of the boxes mark the 75th and 25th percentiles respectively.
• Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions about the underlying statistical distribution, and are therefore useful for getting an initial understanding of a data set. For example, comparing the distribution of ages between groups of people (e.g. males and females).
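
The numbers behind a box plot can be sketched with Python's standard `statistics` module. Two choices below are assumptions rather than something the text above prescribes: quartiles use the `"inclusive"` method, and whiskers follow the common convention of extending to the most extreme values within 1.5 times the interquartile range. The function name and the ages are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Compute quartiles, whisker ends and outliers for one group of values."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = sorted(v for v in data if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical ages in one group.
ages = [21, 22, 23, 24, 25, 26, 27, 28, 60]
stats = box_plot_stats(ages)
print(stats)
```

Running the function once per group gives the values needed to draw the boxes side by side for comparison.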

15. Flowchart

Dimensions:

• workflow or process

Description / Example usages

• Represents a workflow, process or a step-by-step approach to solving a task.
• The flowchart shows the steps as boxes of various kinds, and their order by connecting the boxes with arrows.
• For example, outlining the actions to undertake if a lamp is not working.

16. Radar chart

Dimensions:

• attributes
• value assigned to attributes

Description / Example usages

• Displays multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point.
• The relative position and angle of the axes is typically uninformative, but various heuristics, such as algorithms that plot data as the maximal total area, can be applied to sort the variables (axes) into relative positions that reveal distinct correlations, trade-offs, and a multitude of other comparative measures.
• For example, comparing attributes/skills (e.g. communication, analytical, IT skills) learnt across different university degrees (e.g. mathematics, economics, psychology).
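
For the multi-axis chart described above, commonly called a radar chart, the outline of one series can be sketched by spacing the axes evenly around a common origin and placing each value along its axis. The function name, the skill scores, and the 0 to 5 scale are hypothetical.

```python
from math import cos, sin, tau


def radar_vertices(scores, max_score):
    """Return one (x, y) vertex per attribute, scaled into the unit circle."""
    n = len(scores)
    if n < 3:
        raise ValueError("this kind of chart needs at least three variables")
    points = []
    for i, score in enumerate(scores.values()):
        angle = tau * i / n          # axes are evenly spaced around the origin
        radius = score / max_score   # distance along the axis, from 0 to 1
        points.append((radius * cos(angle), radius * sin(angle)))
    return points


# Hypothetical skill scores for one degree, on a 0 to 5 scale.
skills = {"communication": 5, "analytical": 4, "IT skills": 3, "teamwork": 2}
vertices = radar_vertices(skills, 5)
print(vertices)
```

Joining the vertices in order produces the polygon for one series; a second degree would be drawn as a second polygon on the same axes.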

17. Venn diagram

Dimensions:

• all possible logical relations between a finite collection of different sets.

Description / Example usages

• Shows all possible logical relations between a finite collection of different sets.
• These diagrams depict elements as points in the plane, and sets as regions inside closed curves.
• A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set.
• The points inside a curve labelled S represent elements of the set S, while points outside the boundary represent elements not in the set S. This lends itself to intuitive visualizations; for example, the set of all elements that are members of both sets S and T, denoted S ∩ T and read “the intersection of S and T”, is represented visually by the area of overlap of the regions S and T. In Venn diagrams, the curves are overlapped in every possible way, showing all possible relations between the sets.

Source: https://en.wikipedia.org/wiki/Data_visualization

## Visualize Data with Grepsr

Grepsr’s managed data scraping solutions will get you all the web data you need. No complicated software to use, no tools to configure! We use data extraction tools to gather required data for you and provide it to you in any form you desire. 