Spotfire Ideas Portal
Status Future Consideration
Created by Guest
Created on Aug 1, 2018

Add a dict data type to StreamBase

Currently StreamBase offers the following primitive field types:

  • bool
  • blob
  • int
  • long
  • double
  • timestamp
  • string

The primitive types can be used in higher-order structures:

  • tuple
  • list

There are also two auxiliary structures:

  • function
  • capture group

The scope of this idea is a key/value higher-order structure. It would be similar to list, but the index would be a string rather than an int.

Motivation:

With the growing role of semi-design-time model creation using loosely typed languages such as Python or R (TERR), and of flexibly structured data such as datasets obtained from SQL defined at runtime, the StreamBase tuple model is too rigid. There is a need to express bags of data addressed by labels that are unknown at design time. Similar logic can be achieved today by defining a list of tuples, each containing a name and the associated value.
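
The workaround described above can be sketched in Python as follows. This is only an illustration of the shape of the logic, not StreamBase code: the names Entry, lookup and features are hypothetical, and a Python (name, value) pair stands in for a StreamBase tuple with a name field and a value field.

```python
from typing import Any, List, Optional, Tuple

# Each entry is a (name, value) pair, standing in for a StreamBase tuple
# with a name field and a value field (hypothetical layout).
Entry = Tuple[str, Any]


def lookup(entries: List[Entry], name: str) -> Optional[Any]:
    """Return the value of the first entry called `name`, or None if absent."""
    for entry_name, value in entries:  # walks the list entry by entry
        if entry_name == name:
            return value
    return None


# Nothing stops the same name from appearing twice in the list.
features: List[Entry] = [("age", 23), ("count", 2), ("age", 41)]
first_age = lookup(features, "age")  # the second "age" entry is never reached
```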

There are major problems with this approach:

  • the cost of a lookup is linear in the list size
  • the coding effort needed to extract a value is disproportionately large given how common the use cases for this feature are
  • it is not possible to guarantee uniqueness of the key

Proposal:

The idea is to use a structure similar to the Avro map:

https://avro.apache.org/docs/1.8.1/spec.html#Maps

This structure looks like a list of key/value pairs, but offers much better runtime performance. On the wire it is indeed a list of entries, but in memory it is typically a hash table provided by the specific language binding, e.g. HashMap in Java or dict in Python.
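
The split between the wire form and the in-memory form can be sketched in Python as below. This is a simplified model, not Avro's actual binary encoding: the helper names from_entries and to_entries are hypothetical, and rejecting duplicate keys while loading is a choice made for this sketch rather than something the proposal states.

```python
from typing import Dict, List, Tuple


def from_entries(entries: List[Tuple[str, int]]) -> Dict[str, int]:
    """Build the in-memory hash table from a wire-style list of entries."""
    table: Dict[str, int] = {}
    for key, value in entries:
        if key in table:
            # Assumption of this sketch: a repeated key is treated as an error.
            raise ValueError(f"duplicate key: {key!r}")
        table[key] = value
    return table


def to_entries(table: Dict[str, int]) -> List[Tuple[str, int]]:
    """Flatten the hash table back into a list of (key, value) entries."""
    return list(table.items())


wire = [("age", 23), ("count", 2)]
in_memory = from_entries(wire)
age = in_memory["age"]  # hash lookup instead of a list scan
```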

The value type of the dictionary would be declared at design time and could be any valid StreamBase field type.

Type name: dict

Type declaration in functions: dict(<value type>), for example dict(string) or dict((a string, b int))

Access to the data: use [] operator with string argument, for example features["age"]

In-place creation: dict(key1: value1, key2: value2[...]), for example dict("age": 23, "count": 2)

Null dict declaration: dict(value) with no key, for example dict(string()) or dict(0); the latter produces a null of type dictionary of integer values
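
For readers more familiar with Python, the proposed syntax maps roughly onto the constructs below. This is an analogy only, not StreamBase syntax: the variable names are hypothetical, and modelling a null dictionary as None is an assumption of the sketch.

```python
from typing import Dict, Optional

# dict(string): a dictionary whose values are strings.
labels: Dict[str, str] = {"unit": "years"}

# dict("age": 23, "count": 2): in-place creation.
features: Dict[str, int] = {"age": 23, "count": 2}

# features["age"]: access with the [] operator and a string argument.
age = features["age"]

# dict(0): a null dictionary of integer values, modelled here as None.
no_features: Optional[Dict[str, int]] = None

# An empty but non-null dictionary is a different value from the null one.
empty_features: Dict[str, int] = {}
```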

Basic functions:

  • length(dict) - number of entries in the dictionary
  • emptyDict(value) - creates an empty, non-null dictionary of values
  • mergeDict(dict1, dict2[...]) - creates a new dictionary with keys from the input dictionaries, last one in the list wins
  • updateDict(dict, key1: value1, key2: value2[...]) - creates a new dictionary with updated values for keys; can be used to remove keys
  • zipDict(keys, values) - creates a new dictionary with keys taken from the list(string) keys and values taken from the list values, which must be the same size
  • unzipDict(dict) - creates a tuple with fields keys (list(string)) and values (list(<dict value type>))
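
The intended behaviour of these functions can be sketched with plain Python dictionaries. The snake_case names are hypothetical stand-ins for the proposed functions, not an existing API. Two points are assumptions of the sketch because the proposal does not spell them out: update_dict removes a key when its new value is None, and zip_dict raises an error when the two lists differ in size.

```python
from typing import Any, Dict, List, Optional, Tuple


def length(d: Dict[str, Any]) -> int:
    """Number of entries in the dictionary."""
    return len(d)


def empty_dict(value_type: type) -> Dict[str, Any]:
    """Empty, non-null dictionary; value_type only documents the intent."""
    return {}


def merge_dict(*dicts: Dict[str, Any]) -> Dict[str, Any]:
    """New dictionary with keys from all inputs; the last input wins."""
    merged: Dict[str, Any] = {}
    for d in dicts:
        merged.update(d)
    return merged


def update_dict(d: Dict[str, Any], updates: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """New dictionary with updated values; the input is left unchanged."""
    result = dict(d)
    for key, value in updates.items():
        if value is None:
            result.pop(key, None)  # assumption: a None value removes the key
        else:
            result[key] = value
    return result


def zip_dict(keys: List[str], values: List[Any]) -> Dict[str, Any]:
    """New dictionary pairing keys[i] with values[i]."""
    if len(keys) != len(values):
        raise ValueError("keys and values must be the same size")  # assumption
    return dict(zip(keys, values))


def unzip_dict(d: Dict[str, Any]) -> Tuple[List[str], List[Any]]:
    """Pair of lists (keys, values) in matching order."""
    return list(d.keys()), list(d.values())


base = {"age": 23, "count": 2}
updated = update_dict(base, {"age": 24, "count": None, "score": 7})
```

Every function returns a new dictionary instead of modifying its argument, which matches the "creates a new dictionary" wording in the list above.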
