The Expert Guide to Retail Clustering Methods

Strengthen your assortment planning capabilities by understanding retail clustering methods, processes, and supporting technology.

Retail clustering has increasingly gained momentum within retail executive suites and merchandising technology vendors’ future development plans. With the wider interest and use of advanced analytics, artificial intelligence (AI), and machine learning (ML) – as well as more robust tools for supporting these capabilities and increasingly demanding customers – wider attention to retail clustering for assortment planning is inevitable.

Effective clustering unlocks the full potential of assortment planning. It enables significant financial benefits in terms of sales, margin, and inventory utilization. Further, properly executed retail clustering improves customer satisfaction by strengthening retailers’ ability to provide the “right” mix of products for customers across locations and channels.

CONTENTS

What is Retail Clustering?

The Role of Clustering In Assortment Planning

Assortment Clustering vs. Customer Segmentation

Retail Clustering Systems and Tools

Ten Retail Clustering Methods

Parker Avery’s Perspective on Retail Clustering

Despite lots of conversation about clustering in general, we hear very little discussion of the specific retail clustering methods that lie at the heart of most assortment planning approaches. Parker Avery would like to help remedy that situation by examining the various clustering methodologies that we’ve encountered through working with a variety of retailers, with the aim of providing some insight into which technique or combination of techniques makes the most sense for your business model.

In this expert guide, The Parker Avery Group discusses how retail clustering for assortment planning is an intricate undertaking with a variety of approaches and elements to consider. Granted, there are simple, straightforward retail clustering methods, but these tend to have significant shortcomings, and typically fail to create assortments that drive meaningful results. Conversely, more sophisticated approaches usually require skilled resources, solid data integrity, and appropriate supporting systems to take advantage of the potential these methods can deliver.

We will explore ten different retail clustering methods in depth and highlight the advantages and disadvantages, as well as under which circumstances each should be used. This understanding, coupled with clearly defined assortment planning objectives, will help retailers understand which clustering approaches are most appropriate to employ.

What is Retail Clustering?

The term clustering refers to “the process of grouping sales channels together based on similarities or patterns in their underlying customers’ behavior.” These similarities are most often gleaned from data related to historic or forecasted sales. Utilizing forecasted sales, as opposed to historical, allows a retailer to generate more stable clusters based on expected sales performance and to avoid more frequent re-clustering once historical sales trends become unrepresentative of the current state. In addition, information that is descriptive of the customers or of the store (e.g., demographic or climatic information) can also be utilized for the purposes of clustering analysis.

Clustering is frequently accomplished using a set of statistical algorithms that group objects so that objects in the same group (called a cluster) are more like each other than they are like those in other groups. The most frequently used statistical method for developing clusters is K-Means clustering, which requires the user to specify a target number of clusters. The algorithm then creates the specified number of groupings such that the statistical distance between each object and the center of its assigned cluster is minimized, which yields compact, well-separated clusters. This procedure can be repeated with different target numbers of clusters, and the results compared, to determine the optimal number of clusters for the user’s purpose. For retail applications, clusters are typically formed by grouping stores and other sales channels (such as a website or catalog).
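To make the K-Means procedure concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a production implementation: the store IDs and sales-share profiles are hypothetical, and the deterministic "farthest-first" starting centroids are a simplification chosen for reproducibility (statistical packages typically use randomized starts). The sketch runs the algorithm for several target cluster counts and records the within-cluster sum of squares for each so the results can be compared.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def initial_centroids(points, k):
    """Deterministic farthest-first start (an assumption made here for
    reproducibility; randomized starts are more common in practice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Return (labels, centroids, within-cluster sum of squares)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each store goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Hypothetical stores: share of category sales across three attribute values.
store_profiles = {
    "S01": (0.60, 0.25, 0.15), "S02": (0.58, 0.27, 0.15), "S03": (0.62, 0.23, 0.15),
    "S04": (0.20, 0.60, 0.20), "S05": (0.22, 0.58, 0.20), "S06": (0.18, 0.62, 0.20),
    "S07": (0.15, 0.20, 0.65), "S08": (0.17, 0.18, 0.65), "S09": (0.13, 0.22, 0.65),
}
points = list(store_profiles.values())

# Repeat the procedure for several target cluster counts and compare.
wcss_by_k = {k: k_means(points, k)[2] for k in range(1, 5)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {wcss:.4f}")

labels_k3, _, _ = k_means(points, 3)
clusters_k3 = dict(zip(store_profiles, labels_k3))
print(clusters_k3)
```

In this toy data the within-cluster sum of squares falls sharply up to three clusters and only marginally after that, which is the kind of comparison an analyst might use to settle on the number of clusters.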

There are other statistical methods used for clustering, with new ones being created every year. These include neural-network approaches that can create clusters based on unobserved factors, although they make the methodology and results more difficult to explain.

Source: https://realpython.com/k-means-clustering-python/

Retail clustering methods have several applications in merchandising. Statistical grouping of sales channels can be very useful for allocation, macro-space planning/space brokering, size optimization, determining retail pricing zones, etc. For the rest of this conversation, though, we will concentrate on the application of clustering in assortment planning activities.


The Role of Clustering In Assortment Planning

Before we dig further into clustering, we need to briefly discuss retail assortment planning. Assortment planning is a term that has been in widespread use throughout the industry, yet does not have a clear, consistent definition. The meaning can vary depending on the perspective of the user and the situation. The term has been used variously to mean quantifying item-level sales and purchases, developing targeted assortments, assortment/space optimization, and more.

For purposes of this conversation, we will define assortment planning as “the practice of developing different product collections or ranges for targeted groups of customers.” There still may be other functions of the assortment plan. It may, for instance, be used to quantify purchases for each item or help determine the amounts of inventory to be distributed to each store and held back for direct sales and replenishment. Yet, for this discussion, the primary purpose of assortment planning is the development of tailored product ranges or collections.

Following this definition, retail clustering is the mechanism that is used to develop those targeted groups of customers. The ideal state of assortment planning would allow the targeting of a collection of products to each customer, based on his or her particular preferences. We may eventually be able to deliver this ideal state through omnichannel integration, but for the foreseeable future, it will not be attainable in the current environment of disjointed digital, bricks-and-mortar, and catalog channels. This is because a multitude of customers and customer types patronize any individual store location, making individual targeting impossible. Clustering seeks to overcome this challenge by grouping sales channels (stores, websites, catalog recipients, etc.) that demonstrate similarities in customer shopping behavior.


Assortment Clustering vs. Customer Segmentation

In our discussions with Parker Avery clients, we sometimes encounter confusion between assortment clusters and customer segments. Customer segmentation involves the division of a customer base into groups that are similar in ways that are more applicable to product development and marketing. Segmentation uses factors such as age, gender, interests, attitudes, and spending habits to classify customers into behavioral or psychographic groups. Examples of these groups might be “Tech-Savvy Millennials” or “Golden Age RV Enthusiasts.”

While there are many similarities between assortment clustering and segmentation approaches, most attempts to use customer segments for assortment planning do not succeed. This is due to the fact that customer segments do not map cleanly onto sales channels. Any given store will have some representation of most or all customer segments within its customer base. While it is possible to construct assortments based on customer segments, it is very difficult to determine how to assign those assortments to sales channels. As a rule of thumb, sales channels should be clustered for assortment planning, while customers should be segmented for product development and marketing purposes.

Retail Clustering Systems and Tools

Retailers use a variety of tools to create clusters for assortment planning and other uses. These tools include legacy spreadsheets, specialty statistical analysis software packages, clustering solutions tailored specifically for use by retailers, and clustering functionality integrated into a broader assortment planning system. In addition, more recent wide adoption of open-source programming languages (e.g., Python, R, Julia, Scala) allows retailers’ own analytical teams to develop clusters using a combination of custom code and built-in functions. Depending on the organization’s clustering and assortment planning philosophy, any of these approaches can work well. When considering the best toolset to deploy at your organization, it is important to consider integration and availability of clustering metadata.

Once clusters are created, the cluster assignments typically need to be made available to an assortment planning tool. Your clustering methodology should allow for easy integration of cluster data between your clustering and modern assortment planning systems. Also, the characteristics of sales channels that have been placed in the same cluster provide important clues about the product preferences of the underlying customer base to the merchant or assortment planner as they undertake the development of targeted assortments. This cluster metadata may communicate basic information such as the number of stores or geographic location of sales channels. It may also convey more complex insights, such as demographic information or product preferences of the cluster’s customer base. Your toolset should be capable of presenting this type of characteristic cluster information to end users as they make product decisions.


Ten Retail Clustering Methods

There are many approaches to assortment clustering in use today; some approaches are quite basic, while others require advanced statistical analysis capability. The approaches also may be mixed and matched to meet the assortment targeting needs of your organization. We will describe the major ones in this section.

One key consideration in determining the ideal clustering method for your company is the complexity that is added to the merchandising process. Some of these approaches require very little ongoing maintenance, while others demand that new clusters be created for each collection or floorset. Some basic methods use the same cluster structure for all categories of product. More complex approaches necessitate the development of different clusters for each category or class of products.

Single Assortment

Each sales location receives the exact same selection of items in the assortment

This approach may work well for companies that have a focused, concise product offering that represents the brand image. It also may be applicable to retailers with few locations or sites that are situated in very similar markets. Certain premium brands, such as Prada or even Apple, may thrive with this strategy.

On the other hand, retailers with broader product offerings and more diverse store bases may have great difficulty in maximizing sales and margin with this approach. We have frequently heard the lament, “How can we manage multiple assortments when we can’t get one right?” The reason they cannot get one right is that a single assortment is ill-suited to fulfill the needs of a diverse customer and store base.


Channel-Based Clusters

Each channel (brick-and-mortar, online, catalog, etc.) is a distinct cluster and has its own variant on the assortment

This represents a good preliminary approach to differentiating assortments. It allows the retailer to take advantage of the unique display characteristics of each sales channel, particularly the “endless aisle” offered by digital channels. It also increases the probability that the retailer can meet a customer’s needs by allowing fulfillment from multiple assortments across multiple channels. Typically, the online channel has the broadest offering, with stores and catalogs being culled down from there. Sometimes, retailers will have “retail only” items as well, usually in cases where products are impacted by state regulations (e.g., liquor or firearms) or have physical characteristics that make them impractical to sell online. Further, some retailers offer online-only items to provide a broader assortment without taking up expensive shelf space in the physical store.

The downside of this approach is that it can sub-optimize the brick-and-mortar channel. It is too simplistic to reflect regional and local differences in customer preference and demand. Since it allows assortments to be tailored only across channels, not within a channel, retailers following this approach are likely to suffer from slow-moving choices in some locations and excess demand in others.

Sales Volume-Based Clusters

Sales channels are classified based on forecasted sales volume, expressed in either dollars or units

This is a very common retail clustering approach, whose main benefit is that it is relatively easy to understand and implement. Frequently, some type of store volume-based attribute is already available in the location data, having been created for use by allocation or replenishment tools. Supporters will use this approach to expand the breadth of the product offering in high volume stores and edit it in low volume stores.

Unfortunately, sales volume-based clustering isn’t very useful for developing differential assortments. Store sales volume typically is driven by population density, traffic patterns, co-tenancy, local competition, and other factors not related to the product offering. Stores located in Miami and in Minneapolis coincidentally may be in the same sales volume cluster, but it would be a mistake to assume that they would require the same items. And what to do with high sales volume stores with small selling floors? This approach does not help much in tailoring assortments to those needs.

Store Capacity-Based Clusters

Stores are grouped together based on some measure of their available display space, usually expressed in selling square footage, SKU counts, or on detailed space planning data

This clustering approach is helpful in determining the number of choices to house in each cluster, as it is based on the display capacity of the location. It also has the benefit of being relatively easy to understand and execute. However, it does little to aid in determining how to make up the actual content of those choices – i.e., the assortment of products.

As with sales volume-based clusters, stores in Miami and Minneapolis (as an example) may have the same selling square footage but might require dramatically different assortments. Also, following the pure capacity-based approach does not consider the sales velocity generated by the location. This could result in sending an excessively broad assortment to a large square footage store with poor sales potential.

Sales Volume & Store Capacity-Based Clusters

Clustering is based on a combination of historic or forecasted sales and a measure of capacity

While we mentioned earlier that all of these approaches could be mixed and matched, this particular combination is common enough to merit a separate discussion. It has the benefit of considering both capacity and sales volume, so it provides a decent chance of getting the size of the assortment correct. Unfortunately, once again this approach doesn’t help much with determining the content of the assortment to be assigned to each cluster. Also, this combination of factors has the effect of multiplying the resulting number of clusters, which dramatically increases assortment planning complexity. Most retailers that follow this method end up with a cluster of high capacity / low volume stores, presenting a major challenge:

Does the retailer provide these stores with an extended assortment to help fill the display space? In so doing, they will be sending many below average performing items to a low volume store, creating markdown jeopardy.
Or do they send an abbreviated assortment, commensurate with the sales volume, but leave a significant portion of display space empty?

Neither option seems to fit the bill.
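The multiplication of clusters described above can be sketched in a few lines of Python. Everything in this example is a hypothetical assumption for illustration: the store list, the sales and square-footage figures, and the tier cut-offs. It crosses three volume tiers with three capacity tiers (up to nine clusters) and then isolates the troublesome high capacity / low volume group.

```python
from collections import Counter

# Hypothetical stores: (store_id, forecast_annual_sales_usd, selling_sq_ft)
stores = [
    ("S01", 4_200_000, 38_000),
    ("S02", 1_100_000, 36_000),
    ("S03", 3_900_000, 12_000),
    ("S04", 900_000, 11_000),
    ("S05", 2_300_000, 22_000),
    ("S06", 1_000_000, 40_000),
]

# Hypothetical tier boundaries; a retailer would set these from its own data.
VOLUME_CUTS = (1_500_000, 3_000_000)
CAPACITY_CUTS = (15_000, 30_000)


def tier(value, low_cut, high_cut):
    """Classify a value as 'low', 'mid', or 'high' against two cut-offs."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "mid"
    return "high"


# Each store's cluster is the pair (volume tier, capacity tier).
assignments = {
    store_id: (tier(sales, *VOLUME_CUTS), tier(sq_ft, *CAPACITY_CUTS))
    for store_id, sales, sq_ft in stores
}

cluster_sizes = Counter(assignments.values())
max_possible_clusters = 3 * 3  # three volume tiers x three capacity tiers

# The group that poses the dilemma: lots of display space, little sales volume.
problem_stores = sorted(
    store_id
    for store_id, (volume, capacity) in assignments.items()
    if volume == "low" and capacity == "high"
)

print(f"{len(cluster_sizes)} of {max_possible_clusters} possible clusters in use")
print("High capacity / low volume stores:", problem_stores)
```

Even with only six stores, five distinct clusters appear, and two stores land in the high capacity / low volume group that forces the trade-off described above.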


Climate-Based Clusters

Store locations are segmented based on seasonal weather patterns

This approach is frequently used by retailers who carry items with pronounced seasonality, such as swimwear, winter coats, or patio furniture. While we wouldn’t propose that winter boots appropriate for Alaska should be carried in Los Angeles, this method can be less straightforward than it seems. Parker Avery has performed multiple studies on seasonal selling patterns that have shown counterintuitive results. In one example, a national apparel chain discovered that in January, its bestselling swimsuit stores were in frigid Minnesota. In another study, no evidence could be found that sales of winter jackets spiked earlier in the North than in the South.

Before embarking on a climate-based clustering effort, we would advise performing in-depth analyses of the regional sales performance of seasonal merchandise to validate the approach. Once the underlying data confirm the validity of a climate-based scheme, these clusters can be used to tailor assortments or adjust the timing of item introductions to closely match local demand patterns.

Store Type-Based Clusters

Sales channels are grouped based on a salient characteristic of their local market

Store type-based clusters frequently arise from local store managers’ or district managers’ requests for specific types of merchandise, based on direct customer feedback or perceived market needs. Examples of store type clusters include “campus stores” (that require more back-to-school items and appropriate team merchandise) or “resort stores” (that require beach towels, sunscreen, and flip flops throughout the year).

While store types are often identified using input from the store operations organization, this information is sometimes combined with data analysis. Store type clusters tend to be created and maintained manually as a location attribute, but still may be interfaced into an assortment planning solution to allow visibility and planning by end users. This approach can be very effective at capturing some limited localized demand. On the other hand, the manual nature of this approach usually precludes deploying it on a broad scale. Also, assortment requests from store operations can be based on a few anecdotal customer interactions, which may not be representative of true underlying demand. These cases can result in a lot of effort, but ultimately drive few incremental sales.

Competition-Based Clusters

Clusters are based on the presence of specific competitors in their market area or based on a measure of competitive intensity

This method is not broadly used, but may have some application for retailers that face strong, differentiated, regional competitors. As an example, a broad-line mass merchandiser may choose to beef up their assortment of hunting, fishing, and camping gear if they compete in a market against an outdoor specialty superstore. Many retailers face a distinct set of competitors for their ecommerce channel, and may elect to use this approach to offer special or extended assortments. This approach does not provide merchants and assortment planners any information about the types of products they should add to or edit from their assortments. Instead, competitive shopping and other forms of research must be used to help determine the optimal product mix. Competition-based clusters are also quite useful for price management (but that’s a topic for another day).

Demographics-Based Clusters

Clustering is based on statistical data about the characteristics of the shopping population

This approach has some benefit, particularly if products within an assortment have a clear appeal to a particular demographic group, such as with ethnic foods or specialized products for the aged. Clusters are created based on characteristics that might include average age, ethnicity, income level, population density, educational level, and others.

There are several challenges with the demographics-based technique, however. The first is that the demographic data associated with any particular store may not represent its actual shoppers. Most demographic data available to retailers is based on U.S. Census data. It represents the characteristics of the population within a certain radius around the store, typically 5 miles. Unfortunately, the population that shops at a particular store is not necessarily representative of the population surrounding the physical address of the store.

Let’s examine a retail store located adjacent to Penn Station in New York City. The demographics-based clustering approach would suggest that the population shopping that store would resemble that of Manhattan. Yet, since Penn Station is the terminus of the Long Island Rail Road, which carries millions of commuters to the city each year, the store’s actual shoppers might more closely resemble the more middle- and working-class residents of Nassau and Suffolk counties, and even those of areas farther away that are served by NJ Transit and Amtrak.


Product Attribute-Based Clusters

Sales outlets are grouped based on sales history of meaningful product attributes of the assortment

In our opinion, product attribute-based clustering is one of the most valuable retail clustering approaches, and it is the one we most often use during client initiatives involving clustering. This approach has the benefit of the clusters being explicitly tied to the makeup of the assortment. It removes the guesswork on the part of the merchant about which products satisfy the customers in which clusters. To illustrate, an attribute that may be useful for jewelry might be “material,” with distinct values including gold, silver, stainless steel, platinum, hematite, etc. Stores are clustered based on their relative sales of products exhibiting these attributes. Should a store exhibit an affinity for silver jewelry (perhaps because it is based in the Southwestern United States), then the merchant can simply assign more silver items to the assortment for that store.

Multiple attributes can be used to describe the same assortment. For example, in addition to material, jewelry could also be described by price point, cut, or gem type. To identify the best attributes, we recommend undertaking a statistical analysis of the relationship between the available product attributes and sales to determine which attributes drive differential sales performance from location to location. The attributes with the most impact should be the ones used for clustering.
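One simple way to approximate such an analysis is sketched below. It is a hedged illustration only, not a description of the analysis Parker Avery performs: it scores each attribute by how much the sales share of its values varies from store to store (the average population standard deviation), on the assumption that an attribute whose mix barely changes across locations offers little basis for differentiated clusters. The stores, attribute values, and shares are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical share of jewelry sales by attribute value, per store.
store_shares = {
    "material": {
        "S01": {"gold": 0.70, "silver": 0.30},
        "S02": {"gold": 0.30, "silver": 0.70},
        "S03": {"gold": 0.50, "silver": 0.50},
    },
    "price point": {
        "S01": {"low": 0.50, "high": 0.50},
        "S02": {"low": 0.52, "high": 0.48},
        "S03": {"low": 0.48, "high": 0.52},
    },
}


def differentiation_score(shares_by_store):
    """Average, over attribute values, of the store-to-store standard
    deviation in sales share. Higher means the mix varies more by location."""
    attribute_values = list(next(iter(shares_by_store.values())))
    return mean(
        [
            pstdev([shares[value] for shares in shares_by_store.values()])
            for value in attribute_values
        ]
    )


scores = {attr: differentiation_score(by_store) for attr, by_store in store_shares.items()}
ranked_attributes = sorted(scores, key=scores.get, reverse=True)

for attr in ranked_attributes:
    print(f"{attr}: {scores[attr]:.3f}")
```

Here "material" scores roughly ten times higher than "price point", so it would be the stronger candidate for clustering in this toy data. A fuller analysis would also weight by sales volume and test whether the differences are statistically significant.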

To further illustrate, let’s examine a real-world example from the category of “beverages.” In this case, the most sales-impactful attribute happened to be “end use,” with attribute values including juice, energy, vitamin, tea, kid’s, soda, sparkling water, etc. Stores were clustered based on the penetration of each of these attributes in their sales history. Below are graphical representations of the penetration of each of those attribute values in two of the resulting clusters.


Sample Product-Attribute Cluster Analysis

As you can see, customers in the stores that make up Cluster 1 have a clear preference for new age, teas, vitamin, and energy drinks. These same customers are not as interested in traditional beverages, such as still water, sparkling water, and juice. Cluster 2 seems much more oriented toward thirst quenching, over-indexing on still water, soda, and isotonic (such as Gatorade), at the expense of new age, vitamin, and energy. The strength of those preferences can even be gauged by the magnitude of the index number. In Cluster 1, customers have bought 1.4X the overall average amount of energy drinks, while they have purchased half the overall average amount of sparkling water. This kind of precise preference data can directly inform the number of choices assigned to each cluster that bear each attribute.
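The index numbers discussed above can be computed with very little code. The sketch below assumes the index is a cluster's sales penetration for an attribute value divided by the chain-wide penetration for that value; the article does not spell out the exact formula, so treat that as an assumption. The unit counts are invented and were chosen so that the output echoes the 1.4 and 0.5 indices mentioned in the text.

```python
# Hypothetical unit sales by "end use" attribute value.
chain_units = {"energy": 100, "sparkling water": 100, "soda": 300, "still water": 500}
cluster_1_units = {"energy": 28, "sparkling water": 10, "soda": 62, "still water": 100}


def penetration(units_by_value):
    """Share of total units contributed by each attribute value."""
    total = sum(units_by_value.values())
    return {value: units / total for value, units in units_by_value.items()}


def preference_index(cluster_units, chain_units):
    """Cluster penetration divided by chain-wide penetration.
    An index above 1.0 means the cluster over-indexes on that value."""
    cluster_share = penetration(cluster_units)
    chain_share = penetration(chain_units)
    return {
        value: round(cluster_share[value] / chain_share[value], 2)
        for value in chain_units
    }


index = preference_index(cluster_1_units, chain_units)
for value, idx in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{value:16s} {idx:.2f}")
```

An index of 1.40 for energy drinks and 0.50 for sparkling water tells the planner to skew this cluster's choice counts toward the former and away from the latter.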

Once attribute clusters are formed, demographic data for each cluster can be analyzed to determine if there are any significant relationships between population characteristics and cluster membership. If such relationships exist, there is now some compelling insight into the makeup of the customer base of that cluster. If demographics reveal no significant population characteristics for the cluster, all of the necessary information still exists to make intelligent assortment decisions. Demographic cluster characteristics can also be used to create a model to predict which cluster a new store might fit into.

One significant drawback of this approach is that it demands the use of different clusters for each product category, which tends to increase the complexity of creating and maintaining store clusters – especially when faced with a rapidly changing store base. Another potential source of complexity comes with categories that have many different seasonal assortments, for example in apparel. If an apparel retailer has six seasons or collections a year and drops six distinct assortments with different attributes, then the clustering and assortment planning processes may have to be performed six different times.

Another shortcoming of this method is that it considers neither the display capacity of the stores within each cluster nor vendor space and placement requirements. To overcome this problem, a hybrid method could be employed that includes both the penetration of product attributes and the capacity of the store. Perhaps a preferable method would use attribute-based clustering to determine the content of a “master assortment” for each category. Items within that “master assortment” could then be ranked by sales importance and culled down to fit the available display space in each store.
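The "rank and cull" idea can be sketched as follows. The item names, forecast units, and store capacities are hypothetical, and capacity is simplified to a count of choices; a real implementation would also need to account for fixture types, facings, and vendor placement requirements.

```python
# Hypothetical master assortment for one cluster: item -> forecast units.
master_assortment = {
    "energy_16oz": 520,
    "iced_tea_20oz": 410,
    "vitamin_water": 380,
    "cola_2l": 300,
    "sparkling_lime": 150,
    "juice_box_8pk": 90,
}

# Hypothetical display capacity, expressed as the number of choices that fit.
store_capacity = {"S01": 5, "S02": 3}


def cull_to_capacity(master, capacity):
    """Rank items by forecast units (highest first) and keep the top
    `capacity` items for a store."""
    ranked = sorted(master, key=lambda item: master[item], reverse=True)
    return ranked[:capacity]


store_assortments = {
    store_id: cull_to_capacity(master_assortment, capacity)
    for store_id, capacity in store_capacity.items()
}

for store_id, items in store_assortments.items():
    print(store_id, items)
```

Both stores draw from the same attribute-driven master assortment, but the smaller store simply stops further down the ranking, which keeps the content decision and the capacity decision separate.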

Parker Avery’s Perspective on Retail Clustering

All the clustering methods we discussed earlier have their advantages and drawbacks, and there is not a single method that works best for every retailer. In many cases, combining multiple methods yields the best results. For example, an omnichannel retailer can greatly benefit from setting a single assortment for its online channel and using product attribute-based clusters to optimize the assortment for its brick-and-mortar stores. This way, the retailer optimizes the assortment of its physical stores based on the local preferences of its shoppers and simplifies the assortment for its less location-specific online channel.

We caution against combining too many clustering methods. While each added method increases the precision of assortment localization, it also multiplies the number of clusters that need to be created, analyzed, and maintained over time, which can demand a significant time investment from analytical resources.

As we have illustrated, clustering for assortment planning is a complex undertaking with many different factors to take into account. Simpler, more straightforward methods tend to have significant shortcomings for creating meaningful differentiated assortments. More sophisticated approaches bring with them increased complexity, which may require more manpower or systems resources to successfully employ. Yet, the financial rewards for getting targeted assortments right are significant. After all, every markdown dollar saved goes directly to the bottom line. It is tempting to gloss over clustering capability when creating an assortment planning strategy or selecting and implementing an assortment planning solution, as retail clustering is commonplace, highly analytical, and too often taken for granted. Don’t fall into this trap. Start by clearly defining the purpose and objectives of your assortment planning efforts. Once these are established, the best clustering approach can be identified and operationalized.
