Introduction to Oyster CDP (Customer Data Platform)
Oyster is an AI- and machine learning-powered analytic customer data platform (CDP) and digital marketing analytics platform from Express Analytics. It goes beyond a traditional CDP by providing intelligent prescriptions that help brands achieve exceptional return on investment (ROI), from profitable acquisition to predictable retention. Oyster is a single platform that integrates all business data across advertising, marketing, sales, commerce, and service channels.
Integration Architecture of Oyster CDP
Currently, Oyster is available on Amazon Web Services. Later versions of Oyster will be available on Microsoft Azure and Google Cloud Platform.
Source System Connectors:
Oyster has connectors to most source systems; whether a connector is included for a particular source system depends on that system's market share. Visitors' browsing data from e-commerce sites and mobile apps is streamed in real time, whereas data from other sources arrives hourly or daily. All raw data from the sources lands in an AWS S3 bucket.
Data Processing Engine:
Oyster ingests the raw data (both structured and unstructured) from AWS S3 and processes it in multiple steps before loading it into a massively parallel columnar database on AWS. The data processing engine works like a data refinery: at each step, the quality of the data improves significantly.
Oyster's data processing involves six main steps:
- Data extraction
- Data cleansing
- Data enrichment
- Identity resolution
- Data transformation
- Data loading
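The six steps form a chain in which each step consumes the previous step's output. The Python below is a minimal sketch of that chaining under stated assumptions, not Oyster's actual engine: the pipe-delimited record layout (email, name, amount), the domain-to-segment lookup used for enrichment, the use of a normalized email address as the identity key, and the in-memory list standing in for the warehouse are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw input; in Oyster, raw data would be read from S3.
RAW_LINES = [
    "  Alice@Example.com |Alice Smith| 19.99",
    "alice@example.com|Alice Smith|5.01",
    "bob@example.com||abc",  # unparseable amount: dropped during cleansing
]

def extract(lines):
    """Step 1: split raw pipe-delimited lines into lists of fields."""
    return [line.split("|") for line in lines]

def cleanse(rows):
    """Step 2: trim whitespace, lowercase emails, drop malformed rows."""
    clean = []
    for row in rows:
        if len(row) != 3:
            continue
        email, name, amount = (field.strip() for field in row)
        try:
            value = float(amount)
        except ValueError:
            continue
        clean.append({"email": email.lower(), "name": name, "amount": value})
    return clean

def enrich(records, domain_to_segment):
    """Step 3: add an attribute from a (hypothetical) reference table."""
    for rec in records:
        domain = rec["email"].split("@")[-1]
        rec["segment"] = domain_to_segment.get(domain, "unknown")
    return records

def resolve_identities(records):
    """Step 4: group records that belong to the same person (here: same email)."""
    people = defaultdict(list)
    for rec in records:
        people[rec["email"]].append(rec)
    return people

def transform(people):
    """Step 5: reshape into one analytics-ready row per resolved person."""
    return [
        {
            "email": email,
            "name": recs[0]["name"],
            "segment": recs[0]["segment"],
            "total_spend": round(sum(r["amount"] for r in recs), 2),
        }
        for email, recs in people.items()
    ]

def load(rows, table):
    """Step 6: append rows to the target table (a list stands in for Redshift)."""
    table.extend(rows)
    return table

warehouse = []
steps = extract(RAW_LINES)
steps = cleanse(steps)
steps = enrich(steps, {"example.com": "retail"})
load(transform(resolve_identities(steps)), warehouse)
# warehouse now holds a single row for alice@example.com with total_spend 25.0
```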
Data Warehouse (AWS Redshift):
After processing, the ingestion engine loads the data into AWS Redshift, a columnar, massively parallel processing (MPP) database. This enables fast execution of even the most complex queries over petabytes of data.
Oyster User Interface (UI):
The Oyster UI is the front-end interface for Oyster users. It consists primarily of the following modules:
- Customer 360: Visualizes the data from every touchpoint where a customer interacts with the brand, from the customer profile to their browsing data, their responses to promotions, their interactions with customer service, and their social media behavior.
- Workboards and action center: Each workboard is a collection of metric leaderboards and gadgets designed to keep a particular department in focus, and its gadgets and metrics can be customized. Oyster provides predefined workboards for each of the key departments of an enterprise, namely:
  - Customer Support
  - Product Management
  - Supply Chain
  - Information Technology
  - Human Resources
- Campaign management: Every marketing campaign uses a range of channels, content, and media to target specific audiences, whether to acquire new customers or to engage and retain existing ones. The Oyster Campaign Management screen supports planning, executing, and analyzing marketing campaigns, along with the customer profiles, potential touchpoints, and marketing content involved.
- Customer segmentation: When reaching customers with a marketing message or an ad campaign, targeting the right market with the right message is essential. If you aim too broadly, your message may reach a few people who become customers, but it will also reach many people who aren't interested in your products or services. Messaging that isn't optimized for its audience wastes advertising dollars. Segmentation helps an enterprise target only the people most likely to become satisfied customers or enthusiastic consumers of its content. To segment a market, split it into groups based on common characteristics; a segment can be based on one or more qualities. Slicing an audience this way allows more precisely targeted marketing and personalized content.
- Scoring: Oyster uses several predictive machine learning models to score each customer profile based on that customer's buying behavior, with each model producing its own score. The scores help the marketing team prioritize leads, achieve higher lead qualification rates, and reduce the time it takes to qualify a lead.
- Attribution: Oyster attributes a conversion to specific interactions on specific channels. For online channels, two types of attribution models are available. Standard rule-based models are included in the basic Oyster subscription. For a more advanced approach to multi-touch attribution (MTA), Oyster's professional and enterprise versions use Markov chain and deep learning MTA models. For offline channels such as TV advertising, direct marketing, postal mail, and radio, marketing mix modeling (MMM) is available.
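Rule-based attribution is easy to illustrate. The sketch below implements three common rule-based models (first-touch, last-touch, and linear) in Python; the source does not say which rule-based models Oyster's basic subscription includes, and the channel names and journeys are made up. The Markov chain and deep learning MTA models are not shown.

```python
from collections import defaultdict

def first_touch(path):
    """All credit goes to the first channel in the journey."""
    return {path[0]: 1.0}

def last_touch(path):
    """All credit goes to the last channel before the conversion."""
    return {path[-1]: 1.0}

def linear(path):
    """Every touchpoint gets equal credit; a repeated channel accumulates it."""
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return dict(credit)

def attribute(converting_paths, model):
    """Sum per-channel credit across all converting journeys."""
    totals = defaultdict(float)
    for path in converting_paths:
        for channel, share in model(path).items():
            totals[channel] += share
    return dict(totals)

# Hypothetical journeys that each ended in one conversion.
journeys = [
    ["search", "email", "display"],
    ["email", "search"],
]
print(attribute(journeys, first_touch))  # {'search': 1.0, 'email': 1.0}
print(attribute(journeys, last_touch))   # {'display': 1.0, 'search': 1.0}
```

Whatever the rule, each conversion hands out exactly one unit of credit, so the totals always sum to the number of conversions; the models differ only in how that unit is split along the journey.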
Stage File Types And Structure
Express Analytics accepts files in a flat-file format such as:
- Pipe (|)-delimited files
These files can be columnar (with delimited columns) or in key-value format.
(Note: If exporting data from Microsoft Excel for Mac, choose the Windows Comma Separated (.csv) option. Do not use the MS-DOS or Macintosh CSV versions.)
Express Analytics strongly recommends compressing files before transferring them to Oyster: compression saves storage space and reduces both transfer time and the bandwidth the client uses. It also makes incomplete transfers easier to detect, because an incomplete file will fail to decompress.
Express Analytics recommends these compression formats:
- gzip (.gz)
- 7z (.7z)
- ZIP (.zip)
All of these can be decompressed automatically once automated ingestion is set up. Oyster's preferred compression format is gzip. Currently, Oyster cannot accept password-protected compressed files; for additional security options, see “Encryption” below. The original and compressed files should have the same name apart from the compression extension: for example, filename.txt.gz should decompress to filename.txt.
Oyster does not accept multiple files archived together for transfer. That is, any .zip or .tar.gz file should, when decompressed, result in exactly one file.
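To illustrate the same-name rule and why compression exposes incomplete transfers, here is a hedged Python sketch using the standard gzip module. The file name orders_20240101.txt is hypothetical, and the truncation step only simulates an interrupted upload.

```python
import gzip
import shutil
import tempfile
import zlib
from pathlib import Path

def compress_for_transfer(src: Path) -> Path:
    """gzip src to <name>.gz, so decompressing restores the original name."""
    dst = src.with_name(src.name + ".gz")  # filename.txt -> filename.txt.gz
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst

def is_complete_gzip(path: Path) -> bool:
    """Return False if the file is truncated or corrupt, i.e. fails to decompress."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # read 1 MiB chunks; we only care about errors
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False

# Demo with a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "orders_20240101.txt"
    original.write_text("".join(f"{i}|{i * 1.5}\n" for i in range(5000)), encoding="utf-8")
    packed = compress_for_transfer(original)
    packed_name = packed.name
    intact = is_complete_gzip(packed)
    data = packed.read_bytes()
    packed.write_bytes(data[: len(data) // 2])  # simulate an interrupted transfer
    truncated_ok = is_complete_gzip(packed)

print(packed_name, intact, truncated_ok)  # orders_20240101.txt.gz True False
```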
Files encrypted with GPG/PGP using Express Analytics' public key can be decrypted automatically once automated ingestion is set up. If necessary, an RSA key is also available, but it is not supported for automated ingestion. If you need to use the RSA key, please contact an Express Analytics representative.
If you would like to compress and encrypt your data, be sure to either:
- Use the built-in compression functionality of gpg.
- Compress the file first, then encrypt it. This produces a file extension such as .gz.gpg.
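For the compress-then-encrypt option, a client-side script might call gpg after gzipping the file. This sketch only builds and runs the command. It assumes gpg is installed and Express Analytics' public key has already been imported. The recipient value EXPRESS_ANALYTICS_KEY_ID is a hypothetical placeholder for that key's ID or email address.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical placeholder: the key ID or email of the public key you were given,
# imported beforehand with `gpg --import`.
RECIPIENT = "EXPRESS_ANALYTICS_KEY_ID"

def gpg_encrypt_command(path: Path, recipient: str = RECIPIENT) -> list[str]:
    """Build a gpg command that encrypts path to <name>.gpg for recipient."""
    out = path.with_name(path.name + ".gpg")  # data.txt.gz -> data.txt.gz.gpg
    return ["gpg", "--output", str(out), "--encrypt", "--recipient", recipient, str(path)]

def encrypt(path: Path, recipient: str = RECIPIENT) -> None:
    """Run gpg; raises subprocess.CalledProcessError if encryption fails."""
    subprocess.run(gpg_encrypt_command(path, recipient), check=True)

cmd = gpg_encrypt_command(Path("orders_20240101.txt.gz"))
print(shlex.join(cmd))
# gpg --output orders_20240101.txt.gz.gpg --encrypt --recipient EXPRESS_ANALYTICS_KEY_ID orders_20240101.txt.gz
```

Passing an uncompressed file instead corresponds to the first option, because gpg applies its own compression by default.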
Filenames should use only ASCII characters and should not contain whitespace characters or special characters such as !@#$%. Underscores “_” are permitted, but should not be the first character of the filename.
Oyster does not mandate a particular naming convention, but please include the following elements to make your data easier to tell apart and to prevent files from being overwritten:
- Type of data/description
- Date/timestamp indicating either when you created the file or the time range of the data
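A pre-transfer check can enforce the filename rules and a naming pattern that carries a description and a date range. In this Python sketch, the allowlist (ASCII letters, digits, underscores, and periods, never starting with an underscore) is a conservative reading of the rules, not an official definition. The <description>_<start>_<end>.<ext> pattern is only one hypothetical convention.

```python
import re
from datetime import date

# Conservative allowlist (an assumption): ASCII letters, digits, underscores and
# periods; the first character must be a letter or digit, never an underscore.
VALID_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.]*")

def is_valid_filename(name: str) -> bool:
    return VALID_NAME.fullmatch(name) is not None

def build_filename(description: str, start: date, end: date, ext: str = "txt") -> str:
    """Hypothetical convention: <description>_<start>_<end>.<ext>."""
    name = f"{description}_{start:%Y%m%d}_{end:%Y%m%d}.{ext}"
    if not is_valid_filename(name):
        raise ValueError(f"invalid filename: {name!r}")
    return name

print(build_filename("web_orders", date(2024, 1, 1), date(2024, 1, 31)))
# web_orders_20240101_20240131.txt
print(is_valid_filename("_orders.txt"), is_valid_filename("my orders!.txt"))  # False False
```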
Whether your data is columnar or key-value, following these general guidelines will produce the best results. For all data types, we recommend UTF-8 encoding, which is widely adopted and ensures maximum compatibility across systems.
- Put all data related to a given identifier on a single row. That is, if you have three different segments, put them all on one row either as three columns or three key-value pairs rather than listing the same identifier on three rows with one segment per row. The latter approach will make file processing take significantly longer.
- (Exception: multi-valued data. If a column holds multiple values for the same identifier, put each value on its own row along with the identifier.)
- When in doubt, use double quotes. Data containing punctuation is at risk of delimiter collision, and therefore data bleed, when the chosen delimiter (such as a comma or pipe) also appears inside a data value. This can cause us to assign data in a row to the wrong field. To avoid this, enclose each field, key, and value in double quotes, keeping delimiters such as commas, pipes, and equals signs outside the quotes. Close every quote, so that each data row contains an even number of double-quote characters. If you use quotes, best practice (though not required) is to quote all fields rather than only those at risk of delimiter collision. Empty/null fields do not need to be quoted. If a field value itself contains double-quote characters, escape each one by doubling it, which preserves the even-number-of-quotes rule:
  - LCD TV,50" becomes "LCD TV","50"""
  - "early-bird" special becomes """early-bird"" special"
- Do not use placeholders for empty values. If a given field for a particular row of data has no value, leave it blank. Do not use a placeholder such as “NULL” or “N/A”.
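Taken together, the quoting rules are: wrap each non-empty value in double quotes, double any double quote inside it, keep delimiters outside the quotes, and leave empty values blank. Here is a minimal Python sketch. The pipe default mirrors the pipe-delimited format listed earlier, and the helper names are hypothetical.

```python
DELIMITER = "|"

def quote_field(value):
    """Quote a non-empty value, doubling embedded double quotes.

    Empty or missing values stay blank: no "NULL" or "N/A" placeholders.
    """
    if value is None or value == "":
        return ""
    return '"' + str(value).replace('"', '""') + '"'

def format_row(fields, delimiter=DELIMITER):
    """Join quoted fields; delimiters always sit outside the quotes.

    Each quoted field contributes an even number of quote characters,
    so every row satisfies the even-number-of-quotes rule.
    """
    return delimiter.join(quote_field(f) for f in fields)

print(format_row(["LCD TV", '50"'], delimiter=","))        # "LCD TV","50"""
print(format_row(['"early-bird" special'], delimiter=","))  # """early-bird"" special"
print(format_row(["1001", "", "gold"]))                      # "1001"||"gold"
```

Python's built-in csv module with csv.QUOTE_ALL applies the same doubling rule, but it also writes empty fields as "", whereas the guidelines say empty fields need not be quoted.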
Express Analytics prefers and recommends column-based files, particularly when there is more than one identifier field (typically offline/PII data with a name and postal address, multiple email addresses, or some combination of these).
Include a header row in every file. By including a header, we are able to flexibly accept columns in any order and can automatically detect and map much of the data without human intervention.
In a single file, every header must be a unique label. Files with two or more columns with the same header cannot be processed.
If you do not intend to use a particular data column, do not include it in the file. Every column makes the file larger, and the added size grows with the number of rows. We also have to inspect each column for privacy compliance. The smaller the file and the fewer analysis operations Express Analytics needs to perform, the faster the file can be processed. If a column will be used neither as an identifier nor as a segment to be distributed, leave it out.
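Before export, a script can enforce the header rules and drop columns you do not need. The sketch below is hypothetical: the column names and the keep set are illustrative, and it works on rows that have already been split into fields.

```python
def check_header(header, keep):
    """Validate a header row and return the indexes of the columns to keep.

    `keep` is the (hypothetical) set of columns you plan to use as identifiers
    or segments; all other columns are dropped before the file is exported.
    """
    if not header:
        raise ValueError("every file must start with a header row")
    seen, duplicates = set(), set()
    for name in header:
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    if duplicates:
        raise ValueError(f"duplicate headers: {sorted(duplicates)}")
    return [i for i, name in enumerate(header) if name in keep]

def project(rows, indexes):
    """Keep only the selected columns, in their original order, in every row."""
    return [[row[i] for i in indexes] for row in rows]

header = ["customer_id", "email", "internal_notes", "segment"]
indexes = check_header(header, keep={"customer_id", "email", "segment"})
trimmed = project([header, ["1", "a@example.com", "call back", "gold"]], indexes)
print(trimmed)  # [['customer_id', 'email', 'segment'], ['1', 'a@example.com', 'gold']]
```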
Version 1 5 of 11
File Formatting Guide
Files must be rectangular. Every row should contain the same number of delimiters and fields. If a given field has no value for an identifier, simply leave that value empty for that row.
Express Analytics does not accept fixed-width files. Please use a delimited format.
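A rectangularity check only needs to compare each row's field count with the header's. This Python sketch relies on the standard csv module, which honors double-quoted fields, so a pipe inside quotes is not counted as a delimiter. The sample data is made up.

```python
import csv
import io

def non_rectangular_rows(text, delimiter="|"):
    """Return (line_number, field_count) for each row whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    width = len(next(reader))  # the header defines the expected number of fields
    return [(reader.line_num, len(row)) for row in reader if len(row) != width]

sample = (
    'id|email|segment\n'
    '"1"|"a@example.com"|"gold"\n'
    '"2"|"b@example.com"\n'   # one field short
    '"3"|"x|y"|""\n'          # quoted pipe is data, not a delimiter
)
print(non_rectangular_rows(sample))  # [(3, 2)]
```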
Express Analytics accepts key-value files. These are best suited to files with a single identifier field (typically online data such as cookies, or offline data tied only to a single email address or phone number).
When there is a single identifier, it should be the first field of each row and does not need to include a key:
<identifier>,k1=v1,k2=v2; not ID=<identifier>,k1=v1,k2=v2
When there are multiple identifiers, they should be in key-value format:
email=<email address>,phone=<phone number>,k1=v1,k2=v2
Separate keys and values by an equals sign (=). Please contact your Express Analytics representative if this will be a problem for you.
Separate the identifier and each key-value pair with a comma or similar delimiter. Delimiter options are the same as for columnar data.
Keys should be unique per row; that is, do not include something like <identifier>,k1=v1,k1=v2. For multi-valued data, add a new row for the identifier for each additional value of the same key.
If using double quotes, do not enquote the equals sign or delimiters. Each key and value should be separately double-quoted:
"<identifier>","k1"="v1","k2"="v2"; not "<identifier>","k1=v1","k2=v2".
If there is no value for a given key for an identifier, do not include that key in that row. That is, do not include something like “k1=”.
(Optional but recommended): Include every possible key in the first row. This allows us to quickly detect every key that might show up in the file and reduces the chance of error. This row can be all “dummy” data: a placeholder identifier and all keys set to equal “1”.
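The key-value rules for a single-identifier file fit in a small formatter. This hedged Python sketch covers only the single-identifier case. The identifier comes first with no key, and each key and value is quoted separately around the equals sign. Empty values are omitted, and extra values of a multi-valued key move to additional rows. The names kv_rows and dummy_row and the DUMMY_ID placeholder are hypothetical.

```python
def quote(text):
    """Double-quote a key or value, doubling any embedded double quotes."""
    return '"' + str(text).replace('"', '""') + '"'

def kv_rows(identifier, attributes, delimiter=","):
    """Build key-value rows for one identifier in a single-identifier file.

    `attributes` maps each key to one value or a list of values. Empty values
    are left out. Row i carries the i-th value of every key that has one, so a
    multi-valued key spills onto extra rows and no key repeats within a row.
    """
    values = {}
    for key, raw in attributes.items():
        items = raw if isinstance(raw, list) else [raw]
        values[key] = [v for v in items if v not in (None, "")]
    depth = max((len(v) for v in values.values()), default=0)
    rows = []
    for i in range(depth):
        pairs = [f"{quote(k)}={quote(v[i])}" for k, v in values.items() if i < len(v)]
        rows.append(delimiter.join([quote(identifier)] + pairs))
    return rows

def dummy_row(all_keys, placeholder="DUMMY_ID", delimiter=","):
    """Optional first row: a placeholder identifier with every key set to "1"."""
    pairs = [f"{quote(k)}={quote('1')}" for k in all_keys]
    return delimiter.join([quote(placeholder)] + pairs)

lines = [dummy_row(["segment", "state", "age"])]
lines += kv_rows("cookie123", {"segment": ["gold", "vip"], "state": "CA", "age": ""})
for line in lines:
    print(line)
# "DUMMY_ID","segment"="1","state"="1","age"="1"
# "cookie123","segment"="gold","state"="CA"
# "cookie123","segment"="vip"
```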
Express Analytics accepts both offline (personally identifiable) and online (anonymous) identifiers. Offline and online identifiers should never be in the same file.
(Note: Match data from our match partners is an exception. Contact the Express Analytics team for details.)
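A quick header check can catch files that mix the two kinds of identifier. The column names in both sets below are hypothetical; substitute the headers your files actually use.

```python
# Hypothetical header names; replace them with the headers your files use.
OFFLINE_IDS = {"email", "first_name", "last_name", "street_address_1", "zip_code", "phone_number"}
ONLINE_IDS = {"cookie_id", "mobile_ad_id"}

def identifier_kind(header):
    """Return 'offline' or 'online' for a header row; reject files that mix both."""
    columns = {name.strip().lower() for name in header}
    has_offline = bool(columns & OFFLINE_IDS)
    has_online = bool(columns & ONLINE_IDS)
    if has_offline and has_online:
        raise ValueError("offline and online identifiers must be sent in separate files")
    if not (has_offline or has_online):
        raise ValueError("no recognized identifier column")
    return "offline" if has_offline else "online"

print(identifier_kind(["Email", "Zip_Code", "segment"]))  # offline
print(identifier_kind(["cookie_id", "segment"]))          # online
```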
Files of the same type ingested into the same source should always have the same set of identifier fields, with header names consistent from file to file. For offline sources with multiple identifiers (for example, name and postal address plus email), if a subsequent file contains data tied to only some of those identifiers, still include all the headers and simply leave the missing row data blank.
Express Analytics can accept one or more offline identifiers in a file: email address, name and postal address, telephone number, or any combination of these. All offline identifiers for the same person should be on the same row.
To maximize reach and maintain accuracy, the following 14 standardized fields are highly recommended for every file of offline identifiers you upload. If you cannot supply a given field, include its header anyway and leave that column blank. Headers must be included in the first line of every import and should match the contents of the file.
In order, the standard fields are:
- Client Customer ID (should be unique and persistent)
- First Name
- Last Name
- Street Address 1
- Street Address 2
- Zip Code
- Zip Plus 4
- Phone Number1
- Phone Number2
all the way to…
The first field (“Client Customer ID”) should be unique and persistent across all audiences. This identifier allows us to de-duplicate rows if a file contains multiple rows for the same person. Please contact your support team if you are not using a Client Customer ID to identify records.
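The sketch below shows one way to emit rows in the standard field order, leave unknown fields blank, and collapse multiple records for the same Client Customer ID into one row. It includes only the nine fields listed above, because the remaining standardized fields are not reproduced here. The merge policy (first non-blank value wins) is an illustrative assumption, not Express Analytics' de-duplication logic.

```python
# The nine standard fields listed above, in order. The full list has 14 fields;
# extend this list with the remaining ones from the complete guide.
STANDARD_FIELDS = [
    "Client Customer ID", "First Name", "Last Name",
    "Street Address 1", "Street Address 2", "Zip Code", "Zip Plus 4",
    "Phone Number1", "Phone Number2",
]

def to_standard_rows(records, fields=STANDARD_FIELDS):
    """Return a header plus one row per Client Customer ID, in standard order.

    Missing fields stay blank (""). When several records share an ID, the first
    non-blank value seen for each field is kept (an illustrative merge policy).
    """
    merged = {}
    for rec in records:
        cid = rec.get("Client Customer ID", "")
        if not cid:
            raise ValueError("every record needs a Client Customer ID")
        row = merged.setdefault(cid, {f: "" for f in fields})
        for f in fields:
            if not row[f] and rec.get(f):
                row[f] = rec[f]
    return [list(fields)] + [[row[f] for f in fields] for row in merged.values()]

records = [
    {"Client Customer ID": "C001", "First Name": "Ana", "Zip Code": "94105"},
    {"Client Customer ID": "C001", "Phone Number1": "4155550100"},
    {"Client Customer ID": "C002", "Last Name": "Lee"},
]
table = to_standard_rows(records)
print(table[1])  # ['C001', 'Ana', '', '', '', '94105', '', '4155550100', '']
```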