Why Tiny Fabric Defects May Vanish in Deep Layers and How to Rethink Feature Representations


Introduction

In the world of image-based anomaly detection, especially with texture-heavy datasets like AITEX, one key architectural question keeps surfacing:

“Should we prioritize high spatial resolution or deep feature channels?”

This question goes beyond model selection. It taps into the fundamental way neural networks represent visual information. If you're building systems to detect minute, barely visible anomalies, this decision could make or break your model's success.

In this post, we'll unpack the core concepts, contrast the strengths and tradeoffs of each approach, and provide practical guidance tailored for anomaly detection use cases.


The Basics: What Are Spatial Resolution and Channel Depth?

| Component | Meaning |
|---|---|
| Spatial | The height and width of a feature map (e.g., 32×32 or 8×8) |
| Channels | The number of feature descriptors per spatial location (e.g., 64, 448, 1280) |

You can think of a feature map as a grid. Each cell holds a descriptor. The spatial resolution tells you how many cells. The channels tell you how much detail each cell contains.
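If you work in PyTorch, this grid picture maps directly onto tensor axes. A minimal sketch, with random values standing in for real activations and shapes chosen to match the examples above:

```python
import torch

# A feature map in PyTorch is a 4-D tensor: [batch, channels, height, width].
fmap = torch.randn(1, 64, 32, 32)  # a 32x32 grid with a 64-dim descriptor per cell

_, c, h, w = fmap.shape
print(f"spatial resolution: {h}x{w}")  # how many cells the grid has
print(f"channel depth:      {c}")      # how long each cell's descriptor is

# The full descriptor stored at grid cell (row 5, column 7):
descriptor = fmap[0, :, 5, 7]          # shape: [64]
```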


Two Ways to Represent Information

Let's look at two architectural extremes:

1. High Spatial Resolution, Low Channels

Example: early CNN layers, such as ResNet's conv1 or layer1
Shape: [32×32×64]

  • Localized details: edges, textures, and positional patterns
  • Great for spotting where something is unusual
  • Poor at understanding what it is

2. Low Spatial Resolution, High Channels

Example: deep layers, such as ResNet34 layer4 or DenseNet121 block4
Shape: [8×8×512] (ResNet34 layer4) or [8×8×1024] (DenseNet121 block4)

  • Captures abstract and semantic concepts
  • Great for identifying high-level anomalies (wrong object, category mismatch)
  • But tiny local defects can get smoothed away
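You can see both extremes by tapping two stages of the same backbone. A minimal sketch using torchvision's feature-extraction utility; the 256×256 input size is an assumption, and pretrained weights are omitted since shapes don't depend on them:

```python
import torch
from torchvision.models import resnet34
from torchvision.models.feature_extraction import create_feature_extractor

# Tap an early and a deep stage of the same backbone to compare both extremes.
model = resnet34().eval()
extractor = create_feature_extractor(
    model, return_nodes={"layer1": "early", "layer4": "deep"}
)

x = torch.randn(1, 3, 256, 256)  # stand-in for one 256x256 RGB image
with torch.no_grad():
    feats = extractor(x)

print(feats["early"].shape)  # [1, 64, 64, 64] -> fine grid, thin descriptors
print(feats["deep"].shape)   # [1, 512, 8, 8]  -> coarse grid, rich descriptors
```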

Why This Matters for Texture-Heavy Datasets (Like AITEX)

AITEX images consist of highly repetitive textile textures with subtle, pixel-level anomalies.

That means:

  • You care a lot about precise localization
  • You don't need deep semantic abstraction (we're not classifying cats vs. dogs)

This tips the scale toward higher spatial resolution in feature extraction.
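A back-of-the-envelope calculation makes the point. Assuming a 4-pixel-wide defect (a made-up but plausible size for AITEX-like data), here is how much of a single feature-map cell it can occupy at the strides of typical CNN stages:

```python
# Each feature-map cell summarizes roughly a (stride x stride) pixel region
# (ignoring receptive-field overlap, which dilutes the signal even further).
defect_px = 4  # assumed defect width in pixels

for stride in (4, 8, 16, 32):  # typical strides of ResNet layer1..layer4
    coverage = min(defect_px, stride) ** 2 / stride**2
    print(f"stride {stride:2d}: defect covers at most {coverage:.1%} of a cell")
```

At stride 4 the defect dominates its cell; at stride 32 it contributes under 2% of the pixels being summarized, which is easy for pooling and large receptive fields to wash out.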


Are Spatial and Channel Representations Equivalent?

Short answer: No. They encode different axes of information:

| Aspect | High Spatial | High Channel |
|---|---|---|
| Focus | Where is the anomaly? | What does the region represent? |
| Granularity | Local detail | Feature richness per spot |
| Tradeoff | Weak semantic context | Loses local nuance |

While some tasks (like object recognition) favor deeper semantic layers, anomaly detection on textures typically needs features that preserve fine spatial structure, especially when anomalies are just a few pixels wide.


Practical Insight: The ResNet vs. DenseNet Case

In one of my projects, I tried two backbones for a PatchCore-based anomaly detector:

  • ResNet34 layer2 gave [32×32×128] → good anomaly visibility
  • DenseNet121 block4 gave [8×8×1024] → rich features, but spatially too coarse

Even though DenseNet had more information per patch, the tiny anomalies were not clearly encoded: they were simply too small to leave a signal in such large receptive fields.
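For reference, the two shapes above can be reproduced with plain forward hooks. This is a sketch rather than my original project code, and it again assumes a 256×256 input:

```python
import torch
from torchvision.models import resnet34, densenet121

feats = {}

def save_to(name):
    # Forward hook that records a module's output under the given name.
    def hook(module, inputs, output):
        feats[name] = output
    return hook

res, dense = resnet34().eval(), densenet121().eval()
res.layer2.register_forward_hook(save_to("resnet34.layer2"))
dense.features.denseblock4.register_forward_hook(save_to("densenet121.block4"))

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    res(x)
    dense(x)

for name, f in feats.items():
    print(name, tuple(f.shape))
# resnet34.layer2    (1, 128, 32, 32)
# densenet121.block4 (1, 1024, 8, 8)
```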


Final Thoughts: Striking a Balance

For anomaly detection on structured textures:

  • ✅ Start with early or mid-level CNN features (like ResNet layer1 + layer2)
  • ✅ Consider feature fusion (e.g., PatchCore style: fuse block1 + block4; see the sketch below)
  • ✅ Visualize feature maps to see if anomalies are distinguishable
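As a concrete illustration of the fusion bullet, here is a minimal PatchCore-style sketch: upsample the deeper features to the shallow grid, then concatenate along the channel axis, so each cell keeps its fine location but gains semantic context. The layer pair (layer2 + layer3) is illustrative, not prescriptive; pick whatever your visualizations justify.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet34
from torchvision.models.feature_extraction import create_feature_extractor

# Pretrained weights matter here: PatchCore relies on meaningful features.
model = resnet34(weights="IMAGENET1K_V1").eval()
extractor = create_feature_extractor(
    model, return_nodes={"layer2": "shallow", "layer3": "deep"}
)

x = torch.randn(1, 3, 256, 256)  # stand-in for one 256x256 RGB image
with torch.no_grad():
    f = extractor(x)

shallow, deep = f["shallow"], f["deep"]  # [1,128,32,32] and [1,256,16,16]

# Resize the coarse map up to the fine grid, then stack descriptors per cell.
deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                        align_corners=False)
fused = torch.cat([shallow, deep_up], dim=1)
print(fused.shape)  # [1, 384, 32, 32]: fine localization + semantic context
```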

If your feature map looks like an 8×8 chessboard with no visual cue of the defect, chances are your model won't see it either.


Closing Thought

Understanding the tradeoff between spatial resolution and channel richness isn't just theoretical. It directly impacts your model's ability to detect subtle flaws in images.

When in doubt, remember:

“More channels don’t compensate for lost spatial clarity.”

