Step-by-Step Instructions

Gather Your Inputs and Determine Range

First, identify your complete raw dataset. From this, find the **minimum** (lowest) and **maximum** (highest) values. Then, calculate the **Range (R)** by subtracting the minimum value from the maximum value.

* **Example Dataset**: `[12, 18, 25, 30, 15, 22, 28, 35, 10, 20, 27, 32, 16, 23, 29, 38, 14, 21, 26, 31]`
* **Total Number of Data Points (N)**: 20
* **Minimum Value**: 10
* **Maximum Value**: 38
* **Range (R)**: `38 - 10 = 28`
* **Desired Number of Bins**: 5
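The inputs for this step can be reproduced with a few lines of Python. This is a minimal sketch; the variable names (`data`, `data_range`, `num_bins`) are illustrative and not part of the guide itself.

```python
# Example dataset from this guide.
data = [12, 18, 25, 30, 15, 22, 28, 35, 10, 20,
        27, 32, 16, 23, 29, 38, 14, 21, 26, 31]

n = len(data)                    # Total Number of Data Points (N)
minimum = min(data)              # lowest value
maximum = max(data)              # highest value
data_range = maximum - minimum   # Range (R)
num_bins = 5                     # desired number of bins, chosen by the analyst

print(n, minimum, maximum, data_range)  # 20 10 38 28
```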

Calculate Class Width

Next, calculate the **Class Width (CW)** for your bins. Divide the Range (R) by the desired Number of Bins. It is critical to **round this value UP** to the next whole number or a suitable decimal place. Rounding up keeps the bins wide enough for all data points, including the maximum value; if the division comes out exact, confirm coverage when you define the bins.

* **Formula**: `CW = Range / Number of Bins`
* **Example Calculation**: `CW = 28 / 5 = 5.6`
* **Rounded Up Class Width**: `CW = 6` (rounding 5.6 up to the nearest whole number).
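Rounding up, as opposed to ordinary rounding, might be sketched as follows. `math.ceil` handles the whole-number case from the example. The second part uses the standard `decimal` module to round up to one decimal place; its figures (a range of 6.25 split into 4 bins) are hypothetical and only illustrate the "suitable decimal place" option.

```python
import math
from decimal import Decimal, ROUND_CEILING

data_range = 28
num_bins = 5

raw_width = data_range / num_bins     # 5.6
class_width = math.ceil(raw_width)    # round UP to the next whole number
print(raw_width, class_width)         # 5.6 6

# Hypothetical non-integer case: round UP to one decimal place.
# Decimal is used here to sidestep binary floating-point surprises.
raw_decimal = Decimal("6.25") / Decimal(4)          # 1.5625
width_one_decimal = raw_decimal.quantize(Decimal("0.1"), rounding=ROUND_CEILING)
print(width_one_decimal)              # 1.6
```

Note that the built-in `round(5.6)` happens to give 6 as well, but `round(5.2)` gives 5, which would leave the bins too narrow; `math.ceil(5.2)` gives 6.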

Define Class Intervals (Bins)

Now, establish the lower and upper bounds for each class interval (bin). Start the first bin's lower bound with your dataset's minimum value. For subsequent bins, add the Class Width (CW) to the previous bin's lower bound to get the new lower bound. The upper bound of each bin is typically one unit less than the lower bound of the next bin (for integer data) or defined to avoid overlap (e.g., using `[lower, upper)` notation).

* **Bin 1**: Starts at the Min Value (10). Upper bound is `10 + CW - 1 = 10 + 6 - 1 = 15`. So, `[10 - 15]`
* **Bin 2**: Starts at `15 + 1 = 16`. Upper bound is `16 + CW - 1 = 16 + 6 - 1 = 21`. So, `[16 - 21]`
* **Bin 3**: Starts at `21 + 1 = 22`. Upper bound is `22 + 6 - 1 = 27`. So, `[22 - 27]`
* **Bin 4**: Starts at `27 + 1 = 28`. Upper bound is `28 + 6 - 1 = 33`. So, `[28 - 33]`
* **Bin 5**: Starts at `33 + 1 = 34`. Upper bound is `34 + 6 - 1 = 39`. So, `[34 - 39]`

*(Note: Ensure your final bin's upper bound is greater than or equal to your dataset's maximum value. Here, 39 is greater than 38, so all data is covered. If `Range / Number of Bins` is already a whole number, integer bins built this way stop one unit short of the maximum; in that case, widen the class width by one unit.)*
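The bin construction for integer data can be sketched as a short loop. The function name `integer_bins` and its parameters are assumptions made for this illustration; the bounds are inclusive on both ends, matching the `[10 - 15]` style used in this step.

```python
def integer_bins(minimum, class_width, num_bins):
    """Return inclusive (lower, upper) bounds for integer-valued data."""
    bins = []
    lower = minimum
    for _ in range(num_bins):
        upper = lower + class_width - 1   # one unit below the next lower bound
        bins.append((lower, upper))
        lower += class_width              # next bin starts one class width later
    return bins

bins = integer_bins(minimum=10, class_width=6, num_bins=5)
print(bins)  # [(10, 15), (16, 21), (22, 27), (28, 33), (34, 39)]

# Coverage check: the last upper bound must reach the dataset maximum (38).
covers_maximum = bins[-1][1] >= 38
print(covers_maximum)  # True
```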

Tally Frequencies for Each Bin

Go through your raw dataset point by point and tally which class interval each data point falls into. The count for each bin is its **Frequency (f)**. The sum of all frequencies must equal the total number of data points (N).

* **Dataset**: `[12, 18, 25, 30, 15, 22, 28, 35, 10, 20, 27, 32, 16, 23, 29, 38, 14, 21, 26, 31]`
* **Bin 1 [10 - 15]**: `10, 12, 14, 15` -> **Frequency (f) = 4**
* **Bin 2 [16 - 21]**: `16, 18, 20, 21` -> **Frequency (f) = 4**
* **Bin 3 [22 - 27]**: `22, 23, 25, 26, 27` -> **Frequency (f) = 5**
* **Bin 4 [28 - 33]**: `28, 29, 30, 31, 32` -> **Frequency (f) = 5**
* **Bin 5 [34 - 39]**: `35, 38` -> **Frequency (f) = 2**
* **Total Frequencies**: `4 + 4 + 5 + 5 + 2 = 20`. This matches the total number of data points (N=20).
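The tally can be checked mechanically. The sketch below assumes the inclusive integer bins defined earlier and simply collects the members of each bin; the variable names are illustrative.

```python
data = [12, 18, 25, 30, 15, 22, 28, 35, 10, 20,
        27, 32, 16, 23, 29, 38, 14, 21, 26, 31]
bins = [(10, 15), (16, 21), (22, 27), (28, 33), (34, 39)]

frequencies = []
for lower, upper in bins:
    # Both bounds are inclusive, so a value equal to either bound belongs here.
    members = sorted(x for x in data if lower <= x <= upper)
    frequencies.append(len(members))
    print(f"[{lower} - {upper}]: {members} -> f = {len(members)}")

print(frequencies)                       # [4, 4, 5, 5, 2]
print(sum(frequencies) == len(data))     # True
```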

Calculate Relative Frequencies

For each bin, calculate its **Relative Frequency (RF)** by dividing the bin's Frequency (f) by the Total Number of Data Points (N). This provides the proportion of data in each bin. The sum of all relative frequencies should be 1 (or very close to 1 due to rounding).

* **Formula**: `RF = Class Frequency / Total Number of Data Points (N)`
* **Total N**: 20
* **Bin 1 [10 - 15]**: `RF = 4 / 20 = 0.20`
* **Bin 2 [16 - 21]**: `RF = 4 / 20 = 0.20`
* **Bin 3 [22 - 27]**: `RF = 5 / 20 = 0.25`
* **Bin 4 [28 - 33]**: `RF = 5 / 20 = 0.25`
* **Bin 5 [34 - 39]**: `RF = 2 / 20 = 0.10`
* **Total Relative Frequencies**: `0.20 + 0.20 + 0.25 + 0.25 + 0.10 = 1.00`. A total of 1.00 shows that the relative frequencies account for every data point; it does not by itself prove that each point was tallied into the right bin.
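As a quick sketch of the same arithmetic in Python: the sum is compared with `math.isclose` rather than `==`, since floating-point addition may not land on exactly 1.0 in general.

```python
import math

frequencies = [4, 4, 5, 5, 2]
n = sum(frequencies)                       # 20

relative = [f / n for f in frequencies]    # RF = f / N for each bin
formatted = [f"{rf:.2f}" for rf in relative]
print(formatted)                           # ['0.20', '0.20', '0.25', '0.25', '0.10']

# Tolerant comparison for the "sums to 1" check.
sums_to_one = math.isclose(sum(relative), 1.0)
print(sums_to_one)                         # True
```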

Construct the Histogram (Conceptual)

With your calculated class intervals, frequencies, and relative frequencies, you now have all the necessary data to construct a histogram. On a graph, the horizontal axis (x-axis) represents the class intervals, and the vertical axis (y-axis) represents either the frequency or the relative frequency. Draw bars for each interval, with the height of each bar corresponding to its calculated frequency or relative frequency. Ensure there are no gaps between the bars, as this is characteristic of a histogram (unlike a bar chart for categorical data).
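Before plotting on paper or in a charting tool, a rough text rendering can help confirm the shape. The sketch below is only a stand-in for a real histogram: it draws the bars horizontally, one `#` per data point, so the class intervals run down the page instead of along the x-axis.

```python
bins = [(10, 15), (16, 21), (22, 27), (28, 33), (34, 39)]
frequencies = [4, 4, 5, 5, 2]

lines = []
for (lower, upper), f in zip(bins, frequencies):
    # Bar length equals the bin's frequency.
    lines.append(f"{lower}-{upper} | {'#' * f} {f}")

print("\n".join(lines))
# 10-15 | #### 4
# 16-21 | #### 4
# 22-27 | ##### 5
# 28-33 | ##### 5
# 34-39 | ## 2
```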

A histogram is a powerful graphical representation that organizes a group of data points into user-specified ranges. It visually depicts the shape of the data's distribution, showing where values are concentrated and where they are sparse. Understanding how to manually calculate the underlying data for a histogram, including class width, frequency, and relative frequency, is fundamental to interpreting statistical analyses and verifying automated results.

This guide will walk you through the process of manually preparing data for a histogram, offering a clear, step-by-step approach complete with formulas, a worked example, and common pitfalls to avoid.

Prerequisites

Before you begin, ensure you have:

Your Raw Dataset: The complete list of numerical values you wish to analyze.
Desired Number of Bins (Classes): A predetermined number of intervals you want to divide your data into. This is often chosen based on the dataset size or analytical goals (commonly between 5 and 20 bins).

Understanding Key Formulas

To construct histogram data, you'll use these core formulas:

Range (R): The difference between the highest and lowest values in your dataset. `R = Maximum Value - Minimum Value`
Class Width (CW): The size of each interval (bin). This must be consistent across all bins. `CW = Range / Number of Bins`. Crucially, always round this value UP to the next whole number or a suitable decimal place to ensure all data points are included.
Frequency (f): The count of data points that fall within a specific class interval.
Relative Frequency (RF): The proportion of data points that fall within a specific class interval, expressed as a decimal or percentage. `RF = Class Frequency / Total Number of Data Points`

Worked Example

Let's use the following dataset to illustrate the process: [12, 18, 25, 30, 15, 22, 28, 35, 10, 20, 27, 32, 16, 23, 29, 38, 14, 21, 26, 31]

Assume we want to create 5 bins for this dataset.

Common Pitfalls to Avoid

Incorrect Class Width Calculation: Forgetting to round the class width up can lead to the highest data points being excluded from the histogram. Always round up to ensure all data is captured.
Overlapping or Gaps in Bins: Ensure your class intervals are mutually exclusive (no overlaps) and collectively exhaustive (no gaps). A common convention is [lower bound, upper bound) meaning the lower bound is included, but the upper bound is not. For integer data, you might use [lower bound, upper bound], ensuring the next bin starts at upper bound + 1.
Inconsistent Bin Sizes: In this method, all bins must have the same class width. With unequal widths, bars drawn from raw frequencies distort the distribution's visual representation.
Miscounting Frequencies: Carefully tally each data point into its correct bin. Double-check your counts; the sum of all frequencies should equal the total number of data points.
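The pitfalls above lend themselves to automated checks. The function below is a hypothetical helper (`check_bins` is not from any library) that assumes inclusive integer bins and returns a list of problems, with an empty list meaning every check passed.

```python
def check_bins(bins, frequencies, n, maximum):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # All bins should span the same number of integer values.
    widths = {upper - lower + 1 for lower, upper in bins}
    if len(widths) != 1:
        problems.append("bins have different widths")
    # Each bin should start exactly one unit after the previous bin ends.
    for (_, prev_upper), (next_lower, _) in zip(bins, bins[1:]):
        if next_lower <= prev_upper:
            problems.append(f"overlap at {next_lower}")
        elif next_lower > prev_upper + 1:
            problems.append(f"gap after {prev_upper}")
    if bins[-1][1] < maximum:
        problems.append("maximum value is not covered")
    if sum(frequencies) != n:
        problems.append("frequencies do not sum to N")
    return problems

good = check_bins([(10, 15), (16, 21), (22, 27), (28, 33), (34, 39)],
                  [4, 4, 5, 5, 2], n=20, maximum=38)
print(good)  # []

# Class width rounded DOWN to 5 instead of up to 6: 35 and 38 are lost.
bad = check_bins([(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)],
                 [3, 3, 4, 5, 3], n=20, maximum=38)
print(bad)   # ['maximum value is not covered', 'frequencies do not sum to N']
```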

When to Use a Histogram Calculator

While manual calculation is excellent for understanding the mechanics and for smaller datasets, a dedicated histogram calculator offers significant advantages:

Large Datasets: For hundreds or thousands of data points, manual tallying becomes time-consuming and prone to error.
Speed and Efficiency: Generate results almost instantly, allowing for quick exploration of different bin counts.
Accuracy: Automated tools remove manual arithmetic and tallying errors, although the results still depend on correct input data and bin settings.
Visualization: Most calculators not only provide the data but also generate the histogram graph itself, saving you the effort of manual plotting.

Use manual methods for learning and small-scale analysis, but leverage calculators for efficiency and accuracy in professional settings with extensive data.
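To illustrate what "quick exploration of different bin counts" looks like in practice, here is a small sketch, not a reference implementation of any particular calculator. It assumes integer data with inclusive bins, and it widens the class width by one unit when the rounded-up width still leaves the maximum uncovered (which happens when the division is exact). The name `frequency_table` is made up for this example.

```python
import math

data = [12, 18, 25, 30, 15, 22, 28, 35, 10, 20,
        27, 32, 16, 23, 29, 38, 14, 21, 26, 31]

def frequency_table(data, num_bins):
    """Return (class_width, frequencies) for inclusive integer bins."""
    minimum, maximum = min(data), max(data)
    class_width = math.ceil((maximum - minimum) / num_bins)
    # If the last bin's upper bound falls short of the maximum, widen by one unit.
    if minimum + num_bins * class_width - 1 < maximum:
        class_width += 1
    frequencies = [0] * num_bins
    for x in data:
        # Integer division maps each value straight to its bin index.
        frequencies[(x - minimum) // class_width] += 1
    return class_width, frequencies

for k in (4, 5, 6):
    print(k, frequency_table(data, k))
# 4 (8, [5, 6, 7, 2])
# 5 (6, [4, 4, 5, 5, 2])
# 6 (5, [3, 3, 4, 5, 3, 2])
```

With 4 bins the raw class width is exactly 7, which would end the last bin at 37 and drop the value 38; the adjustment to 8 is what keeps all 20 points in the table.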

How to Calculate Histogram Data Manually: A Step-by-Step Guide