Skip to content

Datameer Operations#

Filter#

Use this operation to filter the columns in your dataset based on their content.

The following applies:

  • filter operations can include a single filter condition or multiple conditions
  • each filter condition runs against a single column and requires you to select one of the available operators (Equals, Not Equals, Contains, Does Not Contain, Starts With, Does Not Start With, Ends With, Does Not End With, Is Empty, Is Not Empty, In, Not In) and provide a value to filter against
  • values are case sensitive and do not accept wildcards or regular expressions
  • the filter operation can also be added by right-clicking on a column in the data preview area and selecting "filter" from the context menu; in that case, the column will be preselected in the filter operation dialog
  • to add another condition, select a column from the list, this can be any column, even one you have previously used in this operation.
  • if you configure multiple filter conditions, the resulting dataset will contain rows that satisfy all those criteria (e.g. "criteria_1 AND criteria_2").

To apply the filter operation:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Filter". The operations configuration opens.

  2. Select the column you want to apply the filter on from the drop-down.

  3. Select the "Operator" from the drop-down, enter the filter "Value" and confirm with "Apply". The filter is displayed.

  4. (If needed, add another filter operation and) Confirm with "Apply". All configured filter operations apply. The column view is updated. The applied operation is listed in the 'Recipe' list

Examples

Using the following data:

Name Type
Tom horse
Ginger dog
Max cat
Kiki cat

A filter on the Type column for Not Equals and cat would result in a dataset containing only:

Name Type
Tom horse
Ginger dog

A filter on the Name column for Contains and i would result in a dataset containing only:

Name Type
Ginger dog
Kiki cat

Combining both filters would result in a dataset containing only:

Name Type
Ginger dog

Replace#

Use this basic search and replace operation to create a modified copy of an existing column.

The following applies:

  • search and replace is case sensitive but does not accept wild cards or regular expressions
  • the name for the newly created column can be set
  • by default this name will be the name of your source column with "_replace" added to the end.

If you need to search for a string containing spaces, escape them in the 'Current Value' field with a \. You do not need to escape spaces in your replacement text.

To search and replace:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Replace". The operations configuration opens.

  2. Select the "Source Column", enter the "Current value" and the "Replacement" value, and confirm with "Apply". The configuration applies. The column view is updated. The applied operation is listed in the 'Recipe' list

Example

Searching for "m" and replacing it with "M" would transform:

Title
Ms.
mr.
Dr.

into:

Title Title_replace
Ms. Ms.
mr. Mr.
Dr. Dr.

Split#

Use this operation to copy one column into two or more columns broken up by a pattern you specify.

The following applies:

  • the split operation can also be added by right-clicking on a column in the data preview area and selecting "Split" from the context menu
  • in that case, the column will be preselected in the split operation dialog
  • the pattern for your delimiter is case sensitive but does not accept wildcards
  • regular expressions are supported if you check the 'Regular Expression' box underneath the delimiter text entry box
  • you can enter a custom name for newly created columns or accept the default, which is the the source column name appended with '_split'; each new column name, whether custom or default, is sequentially appended with a number, from 1 to the column limit
  • if your data contains multiple instances of the delimiter, you may want to increase the column 'Limit', which controls how many columns will be created; the default 'Limit' is 2, which is also the minimum, any elements from your original column beyond the column 'Limit' you specify will be dropped; if you set a higher 'Limit' than there are occurrences of your delimiter in the data, empty columns will be created.

To split a column:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Split". The operations configuration opens.

  2. Select the "Source Column", enter the "Delimiter", enter the "Limit", and confirm with "Apply". The configuration applies. The column view is updated. The applied operation is listed in the 'Recipe' list

Example

Using this data and a new column name of "IP_Address_Octet":

Address
172.54.1.234
172.224.56.2

Splitting on the . delimiter with the default 'Limit' of 2 produces:

Address IP_Address_Octet1 IP_Address_Octet2
172.54.1.234 172 54
172.224.56.2 172 224

Splitting on the . delimiter with a 'Limit' of 5 produces:

Address IP_Address_Octet1 IP_Address_Octet2 IP_Address_Octet3 IP_Address_Octet4 IP_Address_Octet5
172.54.1.234 172 54 1 234
172.224.56.2 172 224 56 2

Extract#

Use this operation to create a new column from a fixed slice of characters in one of your existing columns.

The following applies:

  • the Start Index you enter is counted from the left and specifies how many characters to skip before starting the content of your new column, to start with the first character in your column, set the Start Index to 0
  • the Length you enter is the number of characters after the start index to copy into your new column
  • you can also set the name for the newly created column; by default this name will be the name of your source column with '_extract' added to the end

To extract:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Extract". The operations configuration opens.

  2. Select the "Source Column" from the drop-down, enter the "Start Index" and "Length", and confirm with "Apply". The configuration applies. The column view is updated. The applied operation is listed in the 'Recipe' list

Example

Specifying a Start Index of 4, a Length of 2, and a new column name of "Month" would transform:

Date
20191208
20111130

into

Date Month
20191208 12
20111130 11

Blend#

Use this operation to combine with the contents of a second dataset based on shared content in one or more columns.

The following applies:

  • use the 'Sources' section to select another dataset and configure how the data from the two will be combined.
  • by default Datameer uses an 'Outer Left' blend mode which will include all information from your starting dataset and any content from the second dataset that has an entry in the column(s) you select in the 'Columns' section - more info on these options is available at W3Schools
  • once you configure the second dataset Datameer will suggest column options underneath the 'Flow' area and provide previews of the resulting combinations
  • the 'Columns' section lets you choose which columns between the two datasets overlap and should be used in combining the two datasets
  • you can select one of the column pairs Datameer suggests or configure columns manually by selecting names from the drop down menus and clicking the small "+" button to the right of these drop-down menus
  • add multiple column pairings to refine your results; only rows with content that overlaps in all your specified column pairs will be used in combining the two datasets
  • note that when system resources are being heavily used, Datameer may occasionally be unable to generate a preview of an operation

To blend:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Blend". The operations configuration opens.

  2. Select the second "Source" and the "Blend Mode". The columns view adapts and suggested columns for blending are displayed.

  3. Select the columns: Click on the "+ Use Suggested Columns" button from the recommendations field or select the "Column" from the drop-down and confirm with "Create Blend". The configuration applies. The column view is updated. The applied operation is listed in the 'Recipe' list

Aggregate#

Use this operation to summarize your data by aggregating over one or more dimensions. Furthermore you can also add measures to your aggregations or display the total count.

The following applies:

  • the order of group bys and measures can be rearranged
  • the 'Count' operation is optional
  • the 'Measures' operation is optional

To aggregate:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Aggregate". The operations configuration opens.

  2. Click on the "+" next to 'Group Bys' field. The 'Add Group Bys' dialog opens.

  3. Select at least one of the columns you want to group by and confirm with "Apply".

  4. Click on the "+" next to the 'Measures' field. The 'Add Measures' dialog opens.

  5. Select at least one of the columns you want to measure and confirm with "Apply".

  6. Select the aggregation function you want to apply to the measurement from the drop down.

  7. (Optional) Click on the 'Count' icon to count globally based on the combined resulting row groups.

  8. Confirm with "Create". The operation is proceeded. The new node appears in the Flow Area. The operation stack contains the proceeded light data preparation operation.

Manage Columns#

Use this operation to control which columns are included in the dataset.

The following applies:

  • checked names are included, un-checked one are excluded
  • all columns are checked by default
  • the 'Find columns' search box at the top of the dialog filters the list of visible columns so you can quickly locate columns in large datasets
  • un-check the 'Select all' box to uncheck all columns
  • checking 'Select all' will re-select all columns

To manage columns:

  1. Click on the "+" next to a Snowflake source or, if already performed a transformation, "+ Add to Recipe" and select "Manage Columns". The operations configuration opens.

  2. Select the columns, you want to include in the dataset and confirm with "Apply". The configuration applies. The column view is updated. The applied operation is listed in the 'Recipe' list