Shopify: Tips and best practices#

Configuring advanced data collection options for Shopify#

For certain report types, Shopify datastreams provide additional configuration options to allow tailored data collection. These can be adjusted in the Configuration step of the datastream creation wizard or in the Settings tab of your Shopify datastream overview.

  • Abandoned checkouts: Select the checkout status for which to collect data.

  • Collections and products: Select specific Collection IDs to limit data collection to chosen collections; leave empty to include all.

  • Inventory items: Select locations to collect inventory data for; leave empty to include all locations.

  • Metafields: Select the owning resources for metafield collection.

  • Orders count: Configure by financial status, fulfillment status, and order status.

  • Transaction: Configure by order statuses and transaction kinds. If left unselected, all available types are included.

Collecting order data for more than the last 60 days#

To collect order records older than 60 days, update the Shopify authorization:

  1. Open the Authorizations page and select your Shopify authorization.

  2. Click Authorize, and sign in to your Shopify account.

  3. Click Update when prompted.

This process adds the read_all_orders scope, granting access to historic orders.
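As a rough illustration of when the extra scope matters, the sketch below checks whether a requested fetch start date falls outside the default 60-day window. The function names and the `granted_scopes` input are hypothetical; only the 60-day limit and the read_all_orders scope come from the text above.

```python
from datetime import date, timedelta

DEFAULT_ORDER_WINDOW_DAYS = 60  # default order history window described above


def needs_read_all_orders(fetch_start: date, today: date) -> bool:
    """Return True if fetch_start is older than the default 60-day window."""
    return fetch_start < today - timedelta(days=DEFAULT_ORDER_WINDOW_DAYS)


def can_fetch(fetch_start: date, today: date, granted_scopes: set[str]) -> bool:
    """Check whether the granted scopes (hypothetical input) cover the start date."""
    if needs_read_all_orders(fetch_start, today):
        return "read_all_orders" in granted_scopes
    return True
```

If `can_fetch` returns False for the date range you need, the authorization most likely still has to be updated as described in the steps above.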

REST API and GraphQL API usage and transition#

Shopify supports both REST and GraphQL Admin APIs. Each is suited for specific data access needs:

  • REST API:

      - Multiple endpoints, each targeting a resource (e.g., orders, products, collections).

      - Designed for standard, resource-specific data retrieval.

      - Weaker typing, less validation, and multiple calls needed for related data.

      - As of October 1, 2024, the REST API is considered legacy; critical endpoints will stop working on February 1, 2025.

  • GraphQL API:

      - Single, strongly typed endpoint for flexible data requests.

      - Supports bulk operations for efficient large dataset retrieval; only one concurrent bulk operation allowed per shop.

      - Enables fetching related entities in a single query.

      - All new integrations should use GraphQL.

  • Operational note:

      - Some reports (such as Discount Codes) have migrated to GraphQL.

      - Bulk operations for large datasets are subject to processing time and scheduling considerations (e.g., ~1M rows ≈ 1h, ~4M rows ≈ 3h).
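To make the bulk operation mechanism more concrete, here is a minimal sketch of how a GraphQL bulk query could be submitted through Shopify's `bulkOperationRunQuery` mutation. The selected order fields and the helper names are illustrative assumptions, and the datastream does not necessarily issue exactly this request.

```python
import json

# Inner query for the bulk operation; the selected fields are illustrative.
ORDERS_QUERY = """
{
  orders {
    edges {
      node {
        id
        createdAt
        updatedAt
      }
    }
  }
}
"""


def build_bulk_mutation(inner_query: str) -> str:
    """Wrap a GraphQL query in a bulkOperationRunQuery mutation."""
    return (
        "mutation {\n"
        '  bulkOperationRunQuery(query: """' + inner_query + '""") {\n'
        "    bulkOperation { id status }\n"
        "    userErrors { field message }\n"
        "  }\n"
        "}"
    )


def build_request_body(inner_query: str) -> str:
    """Serialize the mutation as the JSON body of a GraphQL POST request."""
    return json.dumps({"query": build_bulk_mutation(inner_query)})
```

Because only one bulk operation can run per shop at a time, a second submission made while the first is still running would be expected to come back with an entry in `userErrors` rather than a new operation.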

Data retrieval strategies and best practices#

  • Data uniqueness: Set the lowest data resolution (LDR) so that it matches what makes each fetched row unique.

  • Historical data: Prefer created_at as the fetch date for historical retrieval.

  • Incremental updates: Use updated_at when available to capture recent changes efficiently.

  • Schedule offsets: Apply a -1 day offset if same-day data is required.

  • Bulk operations: Schedule datastreams to avoid overlapping jobs as only one bulk operation can run per shop at a time.
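The difference between fetching by created_at and by updated_at can be sketched as a date-window filter. The helper names below are hypothetical, and the filter string follows Shopify-style search syntax; treat its exact form as an assumption rather than the query the datastream actually sends.

```python
from datetime import date, timedelta


def fetch_window(end: date, days: int) -> tuple[date, date]:
    """Return the inclusive (start, end) dates covering `days` days up to `end`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return end - timedelta(days=days - 1), end


def search_filter(field: str, start: date, end: date) -> str:
    """Build a date-range filter on created_at or updated_at."""
    if field not in ("created_at", "updated_at"):
        raise ValueError(f"unsupported date field: {field}")
    return f"{field}:>={start.isoformat()} {field}:<={end.isoformat()}"


# Historical backfill: select records by the day they were created.
backfill = search_filter("created_at", date(2023, 1, 1), date(2023, 12, 31))

# Incremental run: select records changed in the last three days.
start, end = fetch_window(date(2024, 3, 10), days=3)
incremental = search_filter("updated_at", start, end)
```

A created_at window never picks up later edits to older records, which is why updated_at is the better choice for incremental runs when the report supports it.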

Levels of granularity#

Shopify provides granularity at various levels:

  • Resource level: Products, Orders, Customers

  • Event level: Transactions, Fulfillments

  • Custom: Metafields

Collections vs Smart Collections#

  • Collections (Manual): Groups of products manually curated by store staff, ideal for “Staff Picks” or similar selections. They must be updated manually as products change.

  • Smart Collections: Product membership updates automatically based on conditions or rules (e.g., price, tag, vendor, inventory level). Useful for dynamic groupings such as “Under €50” or “New Arrivals”.
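The rule-based behaviour of Smart Collections can be pictured with a small sketch. The rule shape (column, relation, condition, plus a disjunctive flag) is modeled loosely on how Shopify describes smart collection rules; the product dictionary, the column names, and the limited set of relations are simplifying assumptions for illustration only.

```python
from typing import Any

# Comparison functions keyed by relation name (a small illustrative subset).
RELATIONS = {
    "equals": lambda value, cond: str(value) == cond,
    "less_than": lambda value, cond: float(value) < float(cond),
    "greater_than": lambda value, cond: float(value) > float(cond),
}


def matches_rule(product: dict[str, Any], rule: dict[str, str]) -> bool:
    """Evaluate one rule against a product (hypothetical product shape)."""
    value = product[rule["column"]]
    relation = RELATIONS[rule["relation"]]
    # A list-valued column such as tags matches if any element matches.
    if isinstance(value, list):
        return any(relation(v, rule["condition"]) for v in value)
    return relation(value, rule["condition"])


def in_smart_collection(
    product: dict[str, Any], rules: list[dict[str, str]], disjunctive: bool = False
) -> bool:
    """A product qualifies if all rules match, or any rule when disjunctive."""
    results = [matches_rule(product, rule) for rule in rules]
    return any(results) if disjunctive else all(results)


product = {"price": 39.9, "tags": ["new", "summer"], "vendor": "Acme"}
under_50 = [{"column": "price", "relation": "less_than", "condition": "50"}]
```

A manual collection has no equivalent of `rules`: its membership is simply the list of products that staff have added.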

Entity relationships: effects of data creation#

When a new ORDER is created:

  • Inventory levels are updated.

  • Customer records are created or updated.

  • Order and fulfillment statuses are set.

  • Financial transactions are generated (authorizations, captures, refunds).

  • Discounts and gift card balances are applied where relevant.

  • Comprehensive order metrics (total, discounts, taxes, shipping status) are updated.

  • Fulfillment orders are generated for delivery processing.

  • Shipping information and charges are updated.

  • Prior abandoned checkouts may be closed as converted to order.

When a new PRODUCT is created:

  • Inventory item entries are generated.

  • Product variants (size, color, etc.) are created as needed.

  • Product is added to manual or smart collections according to rules.

  • Metafields are set for extra product information.

  • Pricing and tax details are established.

  • Shipping parameters (dimensions, weight) are specified.

  • Gift card products gain virtual inventory and balance tracking.

  • Tags assist categorization and filtering.

  • Product availability status (active/draft/archived) is set.

  • Discounts and price rules for eligibility are determined.

Bulk operations and customer metafields#

  • Use GraphQL for customer-related fields and metafields, leveraging bulk operations for high-volume extraction.

  • Bulk fetches are indexed by updated_at and limited to approximately two weeks of historical data.

  • For best efficiency, separate customer datastreams can be used: one for standard customer fields, another for customer metafields.

  • Monitor task progress; processing begins once the bulk job completes queuing.
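Shopify delivers bulk operation results as a JSONL file in which nested records, such as metafields, appear on their own lines and point to their parent through a `__parentId` key. The sketch below groups such lines under each customer; the sample records and the `group_metafields` helper are hypothetical and only show the shape of the data.

```python
import json
from collections import defaultdict


def group_metafields(jsonl_text: str) -> dict[str, dict]:
    """Group metafield lines under their parent customer by __parentId."""
    customers: dict[str, dict] = {}
    pending: dict[str, list] = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        parent_id = record.pop("__parentId", None)
        if parent_id is None:
            # Top-level line: a customer record.
            record["metafields"] = []
            customers[record["id"]] = record
        else:
            # Child line: a metafield that belongs to a customer.
            pending[parent_id].append(record)
    for parent_id, children in pending.items():
        if parent_id in customers:
            customers[parent_id]["metafields"].extend(children)
    return customers


SAMPLE = "\n".join([
    '{"id": "gid://shopify/Customer/1", "email": "a@example.com"}',
    '{"id": "gid://shopify/Metafield/10", "key": "tier", "value": "gold", '
    '"__parentId": "gid://shopify/Customer/1"}',
    '{"id": "gid://shopify/Customer/2", "email": "b@example.com"}',
])
```

Since metafields multiply the number of lines per customer, keeping them in a separate datastream from the standard customer fields keeps each bulk job smaller.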

Troubleshooting data completeness and typical issues#

  • To avoid missing customer updates, run customer data fetches after midnight, or fetch at least three days of data per run (as volume allows).

  • When fetching by created_at, re-fetching all historical data every day ensures completeness. This may not be feasible for high-volume stores until improvements allow better selection by updated fields.

  • For full order visibility in Shipping and Fulfillment reports, include at least one fulfillments.line_items field in the configuration; not all orders have fulfillments, so some IDs may be absent in fulfillment-specific fields.
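The point about orders without fulfillments can be illustrated by flattening orders into one row per fulfilled line item. This is a sketch under assumed input shapes (the `orders` structure and the output column names are hypothetical), not the connector's actual transformation.

```python
from typing import Any, Optional


def flatten_fulfillments(orders: list[dict[str, Any]]) -> list[dict[str, Optional[int]]]:
    """Return one row per fulfilled line item, keeping unfulfilled orders."""
    rows = []
    for order in orders:
        fulfillments = order.get("fulfillments") or []
        if not fulfillments:
            # Keep the order visible even though it has no fulfillment;
            # the fulfillment-specific fields stay empty.
            rows.append(
                {"order_id": order["id"], "fulfillment_id": None, "line_item_id": None}
            )
            continue
        for fulfillment in fulfillments:
            for item in fulfillment.get("line_items", []):
                rows.append(
                    {
                        "order_id": order["id"],
                        "fulfillment_id": fulfillment["id"],
                        "line_item_id": item["id"],
                    }
                )
    return rows


orders = [
    {"id": 1, "fulfillments": [{"id": 11, "line_items": [{"id": 101}, {"id": 102}]}]},
    {"id": 2, "fulfillments": []},
]
rows = flatten_fulfillments(orders)
```

In this example, order 2 still produces a row, but its fulfillment and line item IDs are empty, which mirrors the absent IDs described above.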

Shopify-specific limitations#

  • Only one GraphQL bulk operation is allowed at a time per shop.

  • Historical fetches for customer metafields are restricted to ~2 weeks.

  • Data from Discount Codes, Price Rules, and other areas may be available only via GraphQL in the future.

  • Not all reports or resources are available in both REST and GraphQL APIs.

  • The choice between manual Collections and Smart Collections affects how product groupings are curated and automated.
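Given the limit of one bulk operation per shop, one way to plan schedules is to stagger start times using rough duration estimates, such as the ~1M rows ≈ 1h figure mentioned earlier. The sketch below is a hypothetical planning helper; the job names, durations, and buffer are assumptions, not values taken from any real datastream.

```python
from datetime import datetime, timedelta


def stagger(
    first_start: datetime,
    jobs: list[tuple[str, timedelta]],
    buffer: timedelta = timedelta(minutes=15),
) -> list[tuple[str, datetime]]:
    """Assign start times so each job begins after the previous estimate plus a buffer."""
    schedule = []
    current = first_start
    for name, estimated_duration in jobs:
        schedule.append((name, current))
        current += estimated_duration + buffer
    return schedule


plan = stagger(
    datetime(2024, 1, 1, 1, 0),
    [
        ("orders", timedelta(hours=1)),
        ("customers", timedelta(hours=3)),
        ("products", timedelta(minutes=30)),
    ],
)
```

The buffer absorbs some variation in processing time; actual durations depend on the shop's data volume and should be checked against real task runs.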