Supported data

Spotlight uses Apache Spark for data processing and is designed to work with a wide variety of data from both connected systems and uploaded files.

Data types

Details about all data types discussed here can be found in the Apache Spark data types documentation.

Supported types

Spotlight supports all Numeric, String, Binary, Boolean, and Datetime Spark data types. Columns with supported data can be viewed in Spotlight, annotated on the dataset detail page, and targeted by operations in the Workbench.

For convenience, all Numeric types are shown as DecimalType, which can represent numbers with arbitrary precision.

Complex types

Columns with complex data types (ArrayType, MapType, and StructType) are not supported in Spotlight. Data in these types is stored in Spotlight and can be accessed via External Tools but will not be displayed inside Spotlight and cannot be targeted by operations.

The Manage Columns operation can be used to remove unsupported columns from a dataset.
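To decide which columns to remove before working in Spotlight, it can help to list the complex-typed columns up front. The following is a minimal sketch, not part of Spotlight: it assumes you already have a list of (column name, type string) pairs, the shape that PySpark's `DataFrame.dtypes` returns, where complex types appear as `array<...>`, `map<...>`, and `struct<...>`. The function name and the example schema are hypothetical.

```python
# Prefixes of Spark's simple-string names for ArrayType, MapType, and StructType.
COMPLEX_PREFIXES = ("array<", "map<", "struct<")


def find_complex_columns(dtypes):
    """Return the names of columns whose Spark type is a complex type.

    `dtypes` is a list of (column_name, type_string) pairs, for example the
    value of a PySpark DataFrame's `dtypes` attribute.
    """
    return [
        name
        for name, type_string in dtypes
        if type_string.lower().startswith(COMPLEX_PREFIXES)
    ]


# Hypothetical schema, used only for illustration.
example_dtypes = [
    ("id", "bigint"),
    ("tags", "array<string>"),
    ("attributes", "map<string,string>"),
    ("address", "struct<city:string,zip:string>"),
    ("created_at", "timestamp"),
]

print(find_complex_columns(example_dtypes))
```

The columns this reports are the ones that would not be displayed in Spotlight and are candidates for removal with Manage Columns.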

Time type

Excel supports TimeType columns that contain no date or time zone information (e.g., "10:00 am"). If you upload Excel files that contain TimeType data, those columns are converted to strings in Spotlight.

Names

Names in Spotlight can contain a wide range of characters so that you are not restricted in your naming conventions. In the rare instance where data has an unsupported file or column name, you will receive an error when attempting to upload or connect it to Spotlight.

File names

Names of files uploaded to Spotlight can contain any character except the backtick (`) and cannot begin with a space or an underscore (_).
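The file name rules can be checked locally before an upload. This is a minimal sketch under the rules stated above, not Spotlight's own validation. The function name is hypothetical, and rejecting an empty name is an added assumption.

```python
def is_valid_file_name(name: str) -> bool:
    """Check a bare file name (no directory part) against the stated rules."""
    if not name:
        # Assumption: an empty name is treated as invalid.
        return False
    if "`" in name:
        # Backticks are not allowed anywhere in the name.
        return False
    if name[0] in (" ", "_"):
        # The name cannot begin with a space or an underscore.
        return False
    return True


for candidate in ["sales 2023.csv", "_temp.csv", " leading.csv", "bad`name.csv"]:
    print(repr(candidate), is_valid_file_name(candidate))
```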

Column names

A column name in Spotlight may contain any character except the backtick (`). Column names within a dataset must be unique, and the comparison is not case sensitive (e.g., "State" and "STATE" cannot both be used in the same dataset).
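A quick local check can surface column name problems before you upload or connect a dataset. The sketch below is illustrative only: the function name is hypothetical, and using `str.lower()` for the case-insensitive comparison is an assumption about how Spotlight compares names.

```python
from collections import defaultdict


def check_column_names(names):
    """Return a list of human-readable problems found in `names`."""
    problems = []
    variants_by_key = defaultdict(list)

    for name in names:
        if "`" in name:
            problems.append(f"backtick in column name: {name!r}")
        # Group names that differ only by letter case.
        variants_by_key[name.lower()].append(name)

    for variants in variants_by_key.values():
        if len(variants) > 1:
            problems.append(
                "duplicate column names (case-insensitive): " + ", ".join(variants)
            )
    return problems


print(check_column_names(["State", "STATE", "City"]))
```

An empty result means the names pass both rules described above.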

Size

Spotlight is capable of processing large volumes of data. Your local administrator can adjust the default configuration to match your organization's needs (see Cloud Admin: Guardrails, Upgrade and resize, and Performance).

Uploads

Files up to 1 GB are supported by default. Larger CSV, JSON, or Parquet files can be stored on S3 and connected to Spotlight from there.

Dataset files with more than 512 columns will not upload to Spotlight.
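Both default upload limits can be checked on a CSV file before uploading it. The sketch below is a rough pre-check rather than Spotlight's own logic. The function names are hypothetical, the size limit is interpreted as 1024^3 bytes (an assumption), and your administrator may have configured different limits.

```python
import csv
import os

# Assumption: the default size limit is read as 1024**3 bytes.
MAX_UPLOAD_BYTES = 1024 ** 3
# Files with more than this many columns will not upload.
MAX_COLUMNS = 512


def check_upload_limits(size_bytes, column_count):
    """Return a list of limit violations for a file of the given shape."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the default upload size limit")
    if column_count > MAX_COLUMNS:
        problems.append(
            f"{column_count} columns exceeds the {MAX_COLUMNS}-column limit"
        )
    return problems


def precheck_csv(path):
    """Read a CSV file's size and header row, then apply the limits."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return check_upload_limits(os.path.getsize(path), len(header))
```

For example, `precheck_csv("orders.csv")` (a hypothetical file) returns an empty list when the file is within both limits. A file over the size limit may still be usable by storing it on S3 and connecting it from there.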

Data size

Datasets containing more than 3 million rows are not processed by Spotlight and are displayed with a yellow warning triangle next to their names.

Datasets that exceed the record count limit still appear in Spotlight search results but cannot be referenced in Workspaces or opened in external tools through Spotlight. These datasets are flagged with a yellow warning triangle on search cards and on their detail pages.

If a dataset grows beyond the record count limit, it will stop feeding data to any downstream datasets that have been built from it in Spotlight.