Onstott's Observations


A Digital Space for Data Science and Statistics.

Simulating Rolls of Fair and Weighted Dice

Objective.

The goal of this article is to demonstrate a working knowledge of basic statistics and Python programming. These skills are required for a data-focused career and often come up in technical interviews and screenings.


Identifying Dog Breeds with Convolutional Neural Networks

Motivation.

The task of correctly identifying a dog’s breed is a challenge for most people. Some breeds differ only slightly, appearing identical except for subtle distinctions in fur or shape. Other breeds have a wide range of fur colors and types, which makes the right determination harder still. Because of this low inter-class variation and high intra-class variation, the use case is ideal for practicing deep learning skills. This article will compare convolutional neural network architectures built from scratch and contrast their results with a transfer learning model built from pre-trained layers.


Time Series Analysis of Beijing Air Quality

Motivation for this post.

Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics. A time series is a collection of data points that possess a natural temporal ordering. Time series are usually recorded at successive, equally spaced points in time, which makes them discrete-time data.


The Significance of Power and Sample Size

Motivation.

Power analysis is a valuable tool for informing the design of experiments and the analysis of their results. The power of a test is the probability that it correctly rejects a false null hypothesis, so it reflects how much confidence can be placed in the conclusions drawn from an experiment. As a result, power is an indispensable consideration for successful statistical hypothesis testing. Another fundamental criterion of hypothesis testing is sample size. Determining the optimal sample size matters to organizations because it directly affects business investments of time, money, and employee resources; furthermore, it can dictate whether results are considered statistically significant or not. This post will examine the effects of power and sample size in the statistical inference process.
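
As a rough illustration of how power and sample size relate, the sketch below uses the normal approximation for a two-sided, one-sample z-test. It is not taken from the post itself: the function names `z_test_power` and `required_sample_size`, the standardized effect size, and the defaults of 5% significance and 80% power are all assumptions chosen for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for critical values and tail areas.
STD_NORMAL = NormalDist()


def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided, one-sample z-test.

    effect_size is the true mean difference divided by the (assumed known)
    standard deviation; n is the sample size. Both names are hypothetical.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # mean of the test statistic under the alternative
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def required_sample_size(effect_size, power=0.80, alpha=0.05):
    """Smallest n reaching the target power, ignoring the far rejection tail."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    n = required_sample_size(0.5)
    print(n, round(z_test_power(0.5, n), 3))
```

Under these assumptions, power rises with both the sample size and the effect size, and when the true effect is zero the function returns the significance level itself, which is the false positive rate of the test.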


A Walk Through the Monty Hall

Motivation.

The Monty Hall problem is a probability puzzle based on the television game show Let’s Make a Deal. It is named after the show's original host, Monty Hall. The problem became famous in 1990, when its appearance in a Parade Magazine column caused a great deal of confusion. Marilyn vos Savant, the author of the column and of the solution to the problem, is quoted as saying: