Details about all data types discussed here can be found in the Apache Spark data types documentation.
Spotlight supports all Numeric, String, Binary, Boolean, and Datetime Spark data types. Columns with supported data can be viewed in Spotlight, annotated on the dataset detail page, and targeted by operations in the Workbench.
For convenience, all Numeric types are shown as DecimalType, which can represent numbers with arbitrary precision.
Columns with complex data types (StructType) are not supported in Spotlight. Data of these types is stored in Spotlight and can be accessed via External Tools, but it will not be displayed inside Spotlight and cannot be targeted by operations.
The Manage Columns operation can be used to remove unsupported columns from a dataset.
TimeType columns contain a time of day with no date or time zone information (e.g. "10:00 am"). If you upload Excel files with TimeType data in them, those columns will be converted into strings inside Spotlight.
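The type rules above can be checked against a schema before a dataset is uploaded. The following is a minimal sketch, not part of Spotlight: the function name `split_columns` is hypothetical, the schema is assumed to be a plain mapping of column names to Spark type names, and the concrete type names listed under each category follow the Apache Spark data types documentation rather than anything Spotlight exposes.

```python
# Spark type names, grouped by the categories this page lists as supported.
# Assumption: this grouping follows the Spark documentation, not a Spotlight API.
SUPPORTED_TYPES = {
    "ByteType", "ShortType", "IntegerType", "LongType",
    "FloatType", "DoubleType", "DecimalType",  # Numeric
    "StringType",                              # String
    "BinaryType",                              # Binary
    "BooleanType",                             # Boolean
    "DateType", "TimestampType",               # Datetime
}


def split_columns(schema):
    """Split {column name: Spark type name} into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for name, type_name in schema.items():
        if type_name in SUPPORTED_TYPES:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


schema = {
    "id": "LongType",
    "price": "DoubleType",
    "address": "StructType",   # complex type, not supported
    "created": "TimestampType",
}
supported, unsupported = split_columns(schema)
print(supported)    # ['id', 'price', 'created']
print(unsupported)  # ['address']
```

Columns that end up in the second list are candidates for removal with the Manage Columns operation.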
Names in Spotlight can contain a wide range of characters so that you are not restricted in your naming conventions. In the rare instance where data has an unsupported file or column name, you will receive an error when attempting to upload or connect it to Spotlight.
Names of files uploaded to Spotlight can contain any character except the backtick (`) and cannot begin with a space or an underscore (_).
A column name in Spotlight may contain any character except the backtick (`). Note that column names within the same dataset must be unique and are not case sensitive (e.g. column names "State" and "STATE" cannot both be used in a dataset).
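The naming rules above can be pre-checked locally so that an upload does not fail on a bad name. This is a sketch under stated assumptions: `check_file_name` and `check_column_names` are hypothetical helpers, not Spotlight functions, and case-insensitive comparison is approximated with `str.lower()`, which may differ from how Spotlight compares names internally.

```python
def check_file_name(name):
    """Return a list of problems with a file name (an empty list means it passes)."""
    problems = []
    if "`" in name:
        problems.append("contains a backtick")
    if name.startswith((" ", "_")):
        problems.append("begins with a space or underscore")
    return problems


def check_column_names(columns):
    """Return a list of problems with the column names of one dataset."""
    problems = []
    seen = {}  # lower-cased name -> first spelling encountered
    for col in columns:
        if "`" in col:
            problems.append(f"{col!r} contains a backtick")
        key = col.lower()  # assumption: simple lower-casing models "not case sensitive"
        if key in seen:
            problems.append(
                f"{col!r} duplicates {seen[key]!r} (names are not case sensitive)"
            )
        else:
            seen[key] = col
    return problems


print(check_file_name("_sales.csv"))
# ['begins with a space or underscore']
print(check_column_names(["State", "STATE", "zip"]))
# ["'STATE' duplicates 'State' (names are not case sensitive)"]
```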
Spotlight is capable of processing large volumes of data. Your local administrator can adjust the default configuration to match your organization's needs (see Cloud Admin: Guardrails, Upgrade and resize, and Performance).
Files up to 1 GB are supported by default. Larger CSV, JSON, or Parquet files can be stored on S3 and connected to Spotlight from there.
Dataset files with more than 512 columns will not upload to Spotlight.
Datasets containing more than 3 million rows are not processed by Spotlight and will display with an error indicator next to their names.
Datasets that exceed the record count limit will still appear in Spotlight search results but cannot be referenced in Workspaces or opened in external tools through Spotlight. These datasets will be flagged with an alert on search cards and on their detail pages.
If a dataset grows beyond the record count limit, it will stop feeding data to any downstream datasets that have been built from it in Spotlight.
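The default limits described in this section can be expressed as a simple pre-upload check. The sketch below is illustrative only: `check_limits` and the constant names are hypothetical, 1 GB is assumed to mean 2**30 bytes, and the defaults are passed as parameters because an administrator may have changed them for your installation.

```python
# Default limits taken from this page; an administrator may have adjusted them.
MAX_FILE_BYTES = 1024 ** 3   # assumption: "1 GB" interpreted as 2**30 bytes
MAX_COLUMNS = 512
MAX_ROWS = 3_000_000


def check_limits(file_bytes, n_columns, n_rows,
                 max_file_bytes=MAX_FILE_BYTES,
                 max_columns=MAX_COLUMNS,
                 max_rows=MAX_ROWS):
    """Return a list of limits the dataset exceeds (an empty list means it fits)."""
    exceeded = []
    if file_bytes > max_file_bytes:
        exceeded.append("file size: store the file on S3 and connect it from there")
    if n_columns > max_columns:
        exceeded.append("column count: the file will not upload")
    if n_rows > max_rows:
        exceeded.append("row count: the dataset will not be processed")
    return exceeded


# A 200 MB file with 40 columns and 5 million rows exceeds only the row limit.
print(check_limits(200 * 1024 ** 2, 40, 5_000_000))
# ['row count: the dataset will not be processed']
```

Values exactly at a limit pass the check, matching the wording "up to" and "more than" used above.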