TechnologyOctober 21, 2016

A Gremlin Implementation of the Gremlin Traversal Machine

A Gremlin Implementation of the Gremlin Traversal Machine

It's Halloween and Gremlin and his machine friends are planning on cruising The TinkerPop in search of tasty treats. For costumes this year, the crew decided to dress up as one another: Gremlin is going as Pipes, Pipes as Blueprints, Blueprints as Gremlin, Frames as Rexster, Rexster as Frames, and Furnace, well...Furnace didn't get the memo. During the day, the gang ran around gathering materials for their costumes and planning their night's path(). While a warm sense of joy engulfed the excited lot, unbeknownst to Gremlin, this Halloween night will leave a permanent, lasting scar in his mind. Gremlin thinks he will be getting treats, but by nights end, he will learn one of the most dastardly tricks of The TinkerPop and suffer its logical entailment for all eternity. Let's tune in and watch their Halloween adventure unfold().

 

Traversals: Graph-Encoded Traversals

Assume the graph diagrammed below. This toy graph is distributed with Apache TinkerPop™ in various graph serialization formats: GraphMLGraphSON, and Gryo. In this toy graph, there are person- and software-vertices, where a person knows another person and people have created software.

 

IMPORTANT: All of the code snippets were run with Apache TinkerPop 3.2.3 and build on each other over the course of the article.

 

Gremlin is a graph traversal language used to query/analyze/process graph data. It can be used with both OLTP/real-time graph databases and OLAP/analytic graph processors. Gremlin traversals are written in the native programming language of the user (Gremlin is a "hosted language"). While Gremlin can be written in any programming language, ultimately, traversals are translated to a language agnostic format called Gremlin bytecode. Gremlin bytecode enables any Gremlin language variant to interact with a Gremlin traversal machine. Apache TinkerPop provides a JVM-based implementation of the Gremlin traversal machine. Every Gremlin traversal machine is responsible for compiling, optimizing, and evaluating a traversal against any TinkerPop-enabled graph system (OLTP or OLAP). The stages of a traversal's life, from user to graph, are diagrammed below.

A collection of basic Gremlin traversals is provided below. These traversals are written in Gremlin-Python. Gremlin-Python is a Gremlin language variant distributed by Apache TinkerPop. Note that the toy graph previously diagrammed has already been loaded and is currently the only data in the graph (bin/gremlin-server.sh conf/gremlin-server-modern-py.yaml).

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

~ python

Python 2.7.10 (default, Oct 23 2015, 19:19:21)

[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>> from gremlin_python import statics

>>> from gremlin_python.structure.graph import Graph

>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

>>> statics.load_statics(globals())

>>>

>>> g = Graph().traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin','g'))

>>>

>>> g.V().has('name','marko').out('knows').count().next()

2L

>>>

>>> g.V().has('name','marko'). \

...   out('created').in_('created'). \

...   has('name',neq('marko')).dedup().name.toList()

[u'josh', u'peter']

>>>

>>> g.V().has('name','lop').in_('created'). \

...   out('knows').dedup().age.mean().next()

29.5

>>>

>>> g.V().hasLabel('person'). \

...   repeat(both('knows')).times(5). \

...   groupCount().by('name').next()

{u'vadas': 4L, u'marko': 8L, u'josh': 4L}

>>>

NOTE: While in() is a step in Gremlin, in is a reserved term in Python. Therefore, Gremlin-Python uses a _-postfix for all reserved terms, where in() is written in_(). Other reserved Python terms include andorasisnot, etc.

 

  • Line 1: Open the CPython console.
  • Line 5-8: Import Gremlin-Python classes and load "statics" (e.g. __.out() can be written out()).
  • Line 10: Create a WebSockets connection to a Gremlin traversal machine at localhost:8182.
  • Line 12: How many people does Marko know?
  • Line 15-17: What are the names of Marko's collaborators (i.e. co-creators)?
  • Line 20-21: What is the average age of the friends of the people who created LinkedProcess?
  • Line 24-26: Rank all people by how central they are in the knows-subgraph.

As previously mentioned, this Halloween Gremlin is dressing up as Pipes. His costume is a traversal that encodes a traversal as a set of vertices and edges -- forming an explicit pipeline process in the graph structure. Gremlin decided to first make a simple version of his traversal costume before generalizing it to support the graph-encoding of any arbitrary traversal. His first design below can process any linear, non-nested traversal. The code below uses Gremlin Console and thus, Gremlin-Groovy.

NOTE: Both Gremlin-Groovy and Gremlin-Python are communicating with the same Gremlin traversal machine over a WebSockets connection which was configured when defining the GraphTraversalSource g using the withRemote() source step.

 

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

$ bin/gremlin.sh

 

         \,,,/

         (o o)

-----oOOo-(3)-oOOo-----

plugin activated: tinkerpop.server

plugin activated: tinkerpop.utilities

plugin activated: tinkerpop.tinkergraph

gremlin> import static org.apache.tinkerpop.gremlin.util.function.Lambda.function

gremlin> import static org.apache.tinkerpop.gremlin.util.function.Lambda.predicate

gremlin> import static org.apache.tinkerpop.gremlin.util.function.Lambda.consumer

gremlin> g = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(Cluster.build('localhost').port(8182).create(),'g'))

==>graphtraversalsource[emptygraph[empty], standard]

gremlin>

gremlin> traversal = g.V().has('name','marko').out('knows').values('name'); []

gremlin>

gremlin> g.inject(traversal).

           map(function('it.get().bytecode')).as('bytecode').

           addV('traversal').as('traversal').property('source','g').

           select('bytecode').map(function('it.get().stepInstructions')).unfold().

           project('op','args').as('inst').

             by(function('it.operator')).

             by(function('it.arguments as List')).

           addV('step').as('step').

             property('op', select('op')).

             sideEffect(select('inst').select('args').unfold().as('arg').

                        select('step').property(list,'arg',select(last,'arg'))).

           choose(select('traversal').out('next'),

             addE('next').from(V().where(hasLabel('step').and().not(outE('next'))).where(neq('step'))).to(select('step')),              

             addE('next').from(select('traversal')).to(select('step'))).iterate()

  • Line 1: Open up the Gremlin Console which provides REPL support for Gremlin-Groovy.
  • Line 9-11: Import static Lambda methods to make the subsequent traversals more compact.
  • Line 12: Create a remote connection to the Gremlin traversal machine. This is the same machine accessed previously in the Gremlin-Python example.
  • Line 15: Define a traversal that will be encoded into the graph. The [] is a trick that prevents the Gremlin Console from iterating (i.e. compiling and evaluating) the traversal.
  • Line 17: Put the defined traversal into the costume traversal stream.
  • Line 18: Get the bytecode representation of the traversal.
  • Line 19: Create a traversal-vertex that will represent the source of the traversal.
  • Line 20: Go back to the bytecode and stream out its step instructions. For the sake of simplicity, this article will ignore source instructions.
  • Line 21-23: Project out the operator and arguments of each instruction in the bytecode.
  • Line 24: Each instruction is represented by a step-vertex.
  • Line 25: Each step-vertex has an op-property denoting the step function it represents.
  • Line 26-27: Each argument of the instruction is an arg multi-property on the step-vertex.
  • Line 28-30: The first step-vertex is attached to the traversal-vertex. All others step-vertices connect to the previous step-vertex.

The Gremlin traversal below is converted into the diagrammed graph structure by the aforementioned Gremlin costume traversal. In this way, a traversal is used to traverse a traversal in order to encode the traversed traversal into the graph. Once the traversal is represented in the graph, it can be traversed by traversals!

g.V().has('name','marko').out('knows').values('name')

The graphic above details the vertex/edge-structure of the traversal's graph representation. This structure can also be realized via a traversal that starts at the newly created traversal-vertex and walks next-edges until no more step-vertices can be reached. Then for each vertex touched, the source-, op-, and arg-properties are projected out of the path.

gremlin> g.V().has('source','g').

           repeat(out('next')).until(out('next').count().is(0)).

           path().by(valueMap('source','op','arg'))

==>[[source:[g]],[op:[V]],[op:[has],arg:[name,eq(marko)]],[op:[out],arg:[knows]],[op:[values],arg:[name]]]

Gremlin was impressed with his initial costume design. By representing a traversal in the graph, Gremlin was able to interact with it. However, before doing anything more complex with this graph-embedded traversal, Gremlin decides to generalize his initial design in order to support any arbitrary nested traversal.

NOTE: Many steps in Gremlin are traversal parents in that they take traversals as arguments. For instance, in repeat(out('created').in('created')), the repeat()-step takes the anonymous out('created').in('created') traversal as an argument. Topologically, Gremlin traversals are linear chains of steps with (potentially) nested, acyclic traversal trees.

 

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

gremlin> g.V().hasLabel(within('traversal','step')).drop()

gremlin>

gremlin> traversal = g.V().has('name','marko').

                       repeat(outE('knows','created').identity().inV()).times(2).

                       values('name').groupCount(); []

gremlin>

gremlin> g.inject(traversal).

           map(function('it.get().bytecode')).as('bytecode').

           addV('traversal').

             property('source','g').

             property('bytecode',select('bytecode')).

             property('depth',0L).

           repeat(V().hasLabel('traversal').has('bytecode').

             has('depth',loops()).as('traversal').

             values('bytecode').map(function('it.get().stepInstructions')).unfold().

             project('op','args').as('inst').

               by(function('it.operator')).

               by(function('it.arguments as List')).

             addV('step').as('step').

               property('op', select(last,'inst').select('op')).

               sideEffect(addE('parent').to(select(last,'traversal')).

               select(last,'inst').select('args').unfold().as('arg').

               choose(predicate('it instanceof Bytecode'),

                 addV('traversal').

                   property('source','__').

                   property('bytecode',select(last,'arg')).      

                   property('depth', union(loops(), constant(1)).sum()).

                   addE('child').from(select(last,'step')),

                 select(last,'step').property(list,'arg',select(last,'arg')))).

           choose(select(last,'traversal').not(out('next')),

             addE('next').from(select(last,'traversal')).to(select(last,'step')),

             addE('next').from(select('prev').tail(local,1)).to(select(last,'step'))).

           select(last,'step').store('prev').

           sideEffect(select(last,'traversal').properties('bytecode','depth').drop()))

  • Line 1: Delete the previously graph-encoded traversal by dropping all traversal- and step-vertices.
  • Line 3-5: Define a traversal to be encoded. The [] is added to prevent the Gremlin Console from iterating the traversal.
  • Line 7: Put the defined traversal into the traversal stream.
  • Line 8: Convert the traversal to its bytecode representation.
  • Line 9-12: Create a traversal-vertex that will represent the source of the traversal.
  • Line 13: Loop while there are still traversal-vertices with a bytecode-property.
  • Line 14: Only look at those traversals at the current loop/nest depth.
  • Line 15: Stream out the instructions of the bytecode of the current traversal-vertex.
  • Line 16-18: Project out the operator and arguments for each instruction in the bytecode.
  • Line 19: Create a step-vertex for each instruction of the bytecode.
  • Line 20: The op-property of the step-vertex is the name of the step's function.
  • Line 21: Every step points to its parent traversal-vertex. This edge makes things a bit easier later on.
  • Line 22: Stream out the arguments of the current instruction.
  • Line 23-29: If the argument is bytecode, create a child traversal-vertex. Else, add arg multi-properties.
  • Line 30-32: If the current step-vertex is the first instruction, attach it to the parent traversal-vertex, else to the previous step-vertex.
  • Line 33: Store the current step-vertex in a global stack for access by the next instruction.
  • Line 34: Once the traversal-vertex has been processed, drop its bytecode- and depth-properties signaling that it has already been encoded in the graph.

The more complex traversal-to-graph translator above is able to process nested traversals. Thus, the traversal below, with a repeat()-step has the subsequent diagrammed graph representation.

g.V().has('name','marko').

  repeat(outE('knows','created').identity().inV()).times(2).

  values('name').groupCount()

Bytecode-Graph #2

It is possible to "view" the traversal's graph structure using a path()-based traversal. The Gremlin traversal below orders all traversals by their "depth" (i.e. how many child steps they are under). It then walks each traversal from the traversal-vertex to its final step-vertex projecting out the respective vertex properties along the path. Note that this traversal also shows the parent of the traversal source so its easy to see under which step a particular __-based traversal is contained.

gremlin> g.V().has('source').as('source').

           order().by(until(__.in('child').count().is(0)).

                      repeat(__.in('child').out('parent')).path().count(local),incr).

           choose(__.in('child'),

                  __.in('child').map(select('source').out('next')),

                  out('next')).

           until(out('next').count().is(0)).repeat(out('next')).

           path().by(valueMap('source','op','arg'))

==>[[source:[g]],[op:[V]],[op:[has],arg:[name,eq(marko)]],[op:[repeat]],[op:[times],arg:[2]],[op:[values],arg:[name]],[op:[groupCount]]]

==>[[source:[__]],[op:[repeat]],[op:[outE],arg:[knows,created]],[op:[identity]],[op:[inV]]]

It is important to note that both the "graph" and the "traversal" are in the same graph. This is analogous, in many respects, to the memory of a physical machine being used to store both "programs" and "data." The traversal below makes salient that both the original person/software-graph and the traversal-graph are co-located.

gremlin> g.V().group().

                 by(label).

                 by(valueMap('name','source','op').fold()).next()

==>software=[{name=[lop]}, {name=[ripple]}]

==>person=[{name=[marko]}, {name=[vadas]}, {name=[josh]}, {name=[peter]}]

==>step=[{op=[groupCount]}, {op=[V]}, {op=[outE]}, {op=[has]}, {op=[identity]}, {op=[repeat]}, {op=[inV]}, {op=[times]}, {op=[values]}]

==>traversal=[{source=[__]}, {source=[g]}]

Understanding Gremlin's Group Step
Gremlin's group()-step takes one "key"-modulator and one "value"-modulator (see by()-step). The "key"-modulator determines which aspect of the incoming vertex should be used as the key to group the vertex. The "value"-modulator determines which aspect of the incoming vertex should be stored as the value in the respective key-group. If the "value"-modulator uses a reducing barrier step, then group() effects a reduce. For instance, groupCount().by('name') is equivalent to group().by('name').by(count()).

 

Traversal Strategies: Traversals that Traverse Traversals

The night was still young and the crew had a few more of hours before their planned departure. Gremlin decided to use this time to further extend his costume. He thought to himself -- "Well hell, if I can represent a traversal as a graph, then I can traverse that traversal in order to optimize it. I can be my own compiler!" (analogously: javac).

A Gremlin traversal machine has a collection of traversal strategies. Some of these traversal strategies are specific to Gremlin (e.g. optimization strategies) and some are specific to the underlying graph system (e.g. provider optimization strategies). Gremlin-specific traversal strategies rewrite a traversal into a semantically-equivalent, though (typically) more optimal form. Similarly, provider-specific strategies mutate a traversal so as to leverage vendor-specific features such as graph-centric indices, vertex-centric indices, push-down predicates, batch-retrieval, schema validation, etc. Now that a traversal is represented in the graph as vertices and edges, Gremlin can traverse it and thus, rewrite it.

NOTE: The current default traversal strategies distributed with Apache TinkerPop's Gremlin traversal machine implementation includes: ConnectiveStrategyIdentityRemovalStrategyInlineFilterStrategyMatchPredicateStrategyRepeatUnrollStrategyPathRetractionStrategyIncidentToAdjacentStrategyAdjacentToIncidentStrategyFilterRankingStrategyRangeByIsCountStrategyLazyBarrierStrategyProfileStrategyStandardVerificationStrategy.

 

IdentityRemovalStrategy

A simple traversal strategy is the IdentityRemovalStrategy. This strategy removes all identity()-steps because they are "no ops," providing nothing to the computation save wasting clock-cycles evaluating a one-to-one identity mapping. The following traversal strategy traversal finds all identity-vertices and creates a next-edge from its previous step (or source) to its subsequent step. After doing so, the identity-vertex is deleted. Thus, the traversal below is a Gremlin-based implementation of the IdentityRemovalStrategy.

g.V().has('op','identity').as('step').

  addE('next').

    from(select('step').in('next')).

    to(select('step').out('next')).

  select('step').drop()

IMPORTANT: This Gremlin-based implementation of IdentityRemovalStrategy does not account for the fact that an identity-step may be the last step-vertex in the traversal chain. This lapse was intended in order to make the traversal as clear as possible. Subsequent strategies will account for such boundary conditions.

 

When the above IdentityRemovalStrategy traversal is applied to the graph encoded traversal, the graph is mutated as diagrammed and displayed below. In short, the outE('knows','created').identity().inV() fragment is rewritten as outE('knows','created').inV().

gremlin> g.V().has('source').as('source').

           order().by(until(__.in('child').count().is(0)).

                      repeat(__.in('child').out('parent')).path().count(local),incr).

           choose(__.in('child'),

                  __.in('child').map(select('source').out('next')),

                  out('next')).

           until(out('next').count().is(0)).repeat(out('next')).

           path().by(valueMap('source','op','arg'))

==>[[source:[g]],[op:[V]],[op:[has],arg:[name,eq(marko)]],[op:[repeat]],[op:[times],arg:[2]],[op:[values],arg:[name]],[op:[groupCount]]]

==>[[source:[__]],[op:[repeat]],[op:[outE],arg:[knows,created]],[op:[inV]]]

IncidentToAdjacentStrategy

IncidentToAdjacentStrategy looks for incident patterns such as outE().inV()inE().outV(), and bothE().otherV(). It rewrites such fragments into the semantically equivalent form of out()in(), and both(), respectively. Thus, instead of fetching and materializing incident edges and then fetching and materializing the respective adjacent vertices, the optimized traversal skips accessing the edges and jumps directly to the adjacent vertices. The traversal is a Gremlin-implementation of IncidentToAdjacentStrategy. It locates all outE().inV() patterns in the graph encoded traversal and then rewrites the identified subgraph to out(). It is possible to generalize this traversal to support inE().outV() and bothE().otherV(), but for the sake of simplicity, only the outE().inV()-pattern is optimized.

g.V().match(

  __.as('a').out('next').as('b'),

  __.as('a').has('op','outE'),

  __.as('b').has('op','inV')).

addV('step').as('c').

  property('op','out').

  sideEffect(select('a').values('arg').as('arg').

             select('c').property(list,'arg',select('arg'))).

  addE('next').from(select('a').in('next')).to('c').

  choose(select('b').out('next').count().is(gt(0)),

    addE('next').from('c').to(select('b').out('next')),

    identity()).

select('a','b').select(values).unfold().drop()

Understanding Gremlin's Match Step
The previous traversal uses Gremlin's match()-step. This step provides a SPARQL/Prolog-ish approach to graph traversing. Instead of explicitly defining the traverser flow, match()-step maintains a collection of (potentially nested) patterns. It is up to match()-step to choose which pattern to execute next for the current traverser. The selection criteria is predicated on 1.) the requisite data being available for that pattern and 2.) a bias towards those patterns that have historically yielded the least amount of data (i.e. try and filter first). The latter point demonstrates that the Gremlin traversal machine also supports runtime strategies which, unlike compile-time strategies, mutate the traversal as the traversal is executing.

 

When the traversal above is applied to the running example graph, the graph is rewritten as diagrammed below.

gremlin> g.V().has('source').as('source').

           order().by(until(__.in('child').count().is(0)).

                      repeat(__.in('child').out('parent')).path().count(local),incr).

           choose(__.in('child'),

                  __.in('child').map(select('source').out('next')),

                  out('next')).

           until(out('next').count().is(0)).repeat(out('next')).

           path().by(valueMap('source','op','arg'))

==>[[source:[g]],[op:[V]],[op:[has],arg:[name,eq(marko)]],[op:[repeat]],[op:[times],arg:[2]],[op:[values],arg:[name]],[op:[groupCount]]]

==>[[source:[__]],[op:[repeat]],[op:[out],arg:[knows,created]]]

LazyBarrierStrategy

LazyBarrierStrategy is perhaps the most important traversal strategy in Gremlin as, in certain situations, it is able to turn traversals that would take the lifetime of the universe to compute into sub-millisecond rippers (you know, killin' it). Graph traversing is all about path analysis -- moving about the graph analyzing its structure/topology. LazyBarrierStrategy shines when a traversal yields multiple intersecting/overlapping/converging/reverberating paths. A great example of such situations is the classic collaborative-filtering recommendation algorithm. A graph-based collaborative-filtering algorithm proceeds as follow. Suppose a bi-partite graph composed of people and products where people like products. With this structure, it is possible to recommend products to a person. First, traverse to the products that the person likes, then traverse to those people who also like those products, then traverse to the products those people like that are not already liked by the original person, and then count the number of traversers at each product in order to yield a product ranking -- i.e. a recommendation. Given that the liked products of the source person will be liked by many of the same people, traversers paths will overlap. Barrier! Next, given that the people that like those products will probably like some of the same products, paths will overlap again. Barrier! In general, LazyBarrierStrategy looks for one-to-many mappings (called flatMaps) and "stalls" the traverser flow by creating a barrier() which can group many traversers at the same vertices into a single "bulked" traverser. In essence, instead of calculating out('likes') on 1000 traversers at the same person, calculate it once on a traverser with a bulk of 1000. For more information, please see A Note on Barrier Steps.

Understanding Gremlin's LazyBarrierStrategy in the Context of Collaborative Filtering
Apache TinkerPop distributes with a Grateful Dead graph dataset. This graph contains songs and artists, where songs follow each other in concert and artists write and/or sing songs. Let's assume we want to recommend a new song for Dark Star to follow.

 

gremlin> graph = TinkerGraph.open()

==>tinkergraph[vertices:0 edges:0]

gremlin> graph.io(graphml()).readGraph('data/grateful-dead.xml')

 

/////////////////////////////////

// Without LazyBarrierStrategy //

/////////////////////////////////

 

gremlin> g = graph.traversal().withoutStrategies(LazyBarrierStrategy)

==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]

gremlin> clockWithResult(100){

           g.V().has('name','DARK STAR').

             out('followedBy').aggregate('likes').

             in('followedBy').out('followedBy').

               where(not(within('likes'))).

             groupCount().by('name').

             order(local).by(values,decr).

               unfold().limit(10).toList() }

==>19.0969239

==>[CHINA CAT SUNFLOWER=788,SAMSON AND DELILAH=778,UNCLE JOHNS BAND=750,SCARLET BEGONIAS=747,NEW MINGLEWOOD BLUES=726,

    ESTIMATED PROPHET=709,THE MUSIC NEVER STOPPED=699,LOOKS LIKE RAIN=697,BEAT IT ON DOWN THE LINE=661,RAMBLE ON ROSE=656]

 

//////////////////////////////

// With LazyBarrierStrategy //

//////////////////////////////

 

gremlin> g = graph.traversal()                                      

==>graphtraversalsource[tinkergraph[vertices:808 edges:8049], standard]

gremlin> clockWithResult(100){

           g.V().has('name','DARK STAR').

             out('followedBy').aggregate('likes').

             in('followedBy').out('followedBy').

               where(not(within('likes'))).

             groupCount().by('name').

             order(local).by(values,decr).

               unfold().limit(10).toList() }

==>1.4836289499999997

==>[CHINA CAT SUNFLOWER=788,SAMSON AND DELILAH=778,UNCLE JOHNS BAND=750,SCARLET BEGONIAS=747,NEW MINGLEWOOD BLUES=726,

    ESTIMATED PROPHET=709,THE MUSIC NEVER STOPPED=699,LOOKS LIKE RAIN=697,BEAT IT ON DOWN THE LINE=661,RAMBLE ON ROSE=656]

 

///////////////////////////////

// Using Traversal.explain() //

///////////////////////////////

 

==>Traversal Explanation

===========================================================================================================

Original Traversal                 [GraphStep(vertex,[]), HasStep([name.eq(DARK STAR)]),

                                    VertexStep(OUT[followedBy],vertex), AggregateStep(likes),

                                    VertexStep(IN,[followedBy],vertex), VertexStep(OUT[followedBy],vertex),

                                    WherePredicateStep(without([likes])), GroupCountStep(value(name)),

                                    OrderLocalStep([[values,decr]]), UnfoldStep, RangeGlobalStep(0,10)]

...

Final Traversal                    [TinkerGraphStep(vertex,[name.eq(DARK STAR)]),

                                    VertexStep(OUT[followedBy],vertex), AggregateStep(likes),

                                    VertexStep(IN,[followedBy],vertex), NoOpBarrierStep(2500),

                                    VertexStep(OUT,[followedBy],vertex), NoOpBarrierStep(2500),

                                    WherePredicateStep(without([likes])),

                                    GroupCountStep(value(name)), OrderLocalStep([[values, decr]]),

                                    UnfoldStep, RangeGlobalStep(0,10)]

Gremlin's explain()-step shows the inserted NoOpBarrierSteps. The steps displayed by explain() are not Gremlin bytecode steps, but Gremlin traversal machine-specific steps. In analogy to the JVM atop an Intel processor, the above steps are "machine code," specific to Apache TinkerPop's Gremlin traversal machine instruction set. Finally, note how LazyBarrierStrategy was able to reduce a 19 millisecond runtime down to a 1.5 millisecond runtime. That is a 10x+ improvement.

 

The traversal below is a Gremlin-based implementation of LazyBarrierStrategy. It inserts a barrier-vertex after every "flatMap"-step (i.e. a one-to-many step). Note the sideEffect()-step below. If the "flatMap"-step is not the end step, then the inserted barrier-vertex extends a next-edge to the "right adjacent"-step.

g.V().as('flatMap').

  has('op',within('out','in','both','values')).

  where(out('next').has('op',neq('barrier')).or().not(out('next'))).

  addV('step').as('barrier').

    property('op','barrier').

    property('arg','2500').

  sideEffect(select('flatMap').outE('next').as('a').

       addE('next').from('barrier').to(select('a').inV()).

       select('a').drop()).

  select('flatMap').addE('next').from('flatMap').to('barrier')

The traversal strategy traversal above optimizes the graph-encoded traversal as diagrammed below.

gremlin> g.V().has('source').as('source').

           order().by(until(__.in('child').count().is(0)).

                      repeat(__.in('child').out('parent')).path().count(local),incr).

           choose(__.in('child'),

                  __.in('child').map(select('source').out('next')),

                  out('next')).

           until(out('next').count().is(0)).repeat(out('next')).

           path().by(valueMap('source','op','arg'))

==>[[source:[g]],[op:[V]],[op:[has],arg:[name,eq(marko)]],[op:[repeat]],[op:[times],arg:[2]],[op:[values],arg:[name]],[op:[barrier],arg:[2500]],[op:[groupCount]]]

==>[[source:[__]],[op:[repeat]],[op:[out],arg:[knows,created]],[op:[barrier],arg:[2500]]]

In sum total, after all the aforementioned traversal-based traversal strategies are applied,

g.V().has('name','marko').

  repeat(outE('knows','created').identity().inV()).times(2).

  values('name').groupCount()

is compiled to

g.V().has('name','marko').

  repeat(out('knows','created').barrier(2500)).times(2).

  values('name').barrier(2500).groupCount()

Traversal Execution: A Traversal that Evaluates a Traversal

Gremlin was obsessed with his costume. He had sewn it into something beyond "being Pipes" and he wanted more! He decided to skip going out trick-or-treating and instead, spend the evening taking his costume to the next level. His friends were disappointed, though Gremlin was quick to brush off any feelings of guilt -- he was beyond "having fun with friends." He was on his way to seeing the true nature of The TinkerPop.

If Gremlin could both represent and optimize a traversal, why couldn't he also evaluate a traversal. Gremlin thought "I can be my own traversal machine. I won't need The TinkerPop anymore. I will be The TinkerPop! Boo ha ha ha." (analogously: java). With the winds of arrogance against the sails of life, Gremlin added the final amendment to his costume -- a traversal machine traversal. Gremlin is able to simulate himself because the Gremlin language is Turing Complete. This means that it can compute any algorithmic process for which the Gremlin traversal machine is one such algorithm. Please see Section 6 of the The Gremlin Graph Traversal Machine and Language for a formal proof of Gremlin's universal expressivity.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

g.withSack(0).withSideEffect('drain',[1]).withSideEffect('fill',[]).

  V().has('source','g').as('parent').

  repeat(out('next').as('step').

    map(values('arg').fold()).as('args').

    sideEffect(select('drain').unfold().

    choose(select(last,'args').count(local).is(0),

      choose(select(last,'step').by('op')).

        option('V', V().hasLabel(not(within('step','traversal')))).

        option('out',out()).

        option('in',__.in()).

        option('both',both()).

        option('outE',outE()).

        option('inE',inE()).

        option('bothE',bothE()).

        option('inV',inV()).

        option('outV',outV()).

        option('otherV',otherV()).

        option('values',__.values()).

        option('barrier',barrier()).

        option('dedup',dedup()).

        option('repeat',identity()).     // option(none,identity())

        option('fold',identity()).       // option(none,identity())

        option('sum',identity()).        // option(none,identity())

        option('mean',identity()).       // option(none,identity())

        option('min',identity()).        // option(none,identity())

        option('max',identity()).        // option(none,identity())

        option('count',identity()).      // option(none,identity())

        option('groupCount',identity()), // option(none,identity())

      choose(select(last,'step').by('op')).

        option('V', V().hasLabel(not(within('step','traversal'))).where(within('args')).by(id).by()).

        option('has',filter(union(label(),properties().where(within('args')).by(key).by().value()).filter(predicate("it.path(last,'args')[1].test(it.get())")))).

        option('out',outE().where(within('args')).by(label).by().inV()).

        option('in',__.inE().where(within('args')).by(label).by().outV()).

        option('both',bothE().where(within('args')).by(label).by().otherV()).

        option('outE',outE().where(within('args')).by(label).by()).

        option('inE',inE().where(within('args')).by(label).by()).

        option('bothE',bothE().where(within('args')).by(label).by()).

        option('values',properties().where(within('args')).by(key).by().value()).

        option('barrier',barrier()).

        option('times',identity())).     // option(none,identity())

      store('fill')).

    sideEffect(consumer("it.sideEffects('drain').clear()")).

    sideEffect(select('fill').unfold().store('drain')).

    sideEffect(consumer("it.sideEffects('fill').clear()")).

    select(last,'step').

    choose(has('op','repeat').and().out('next').has('op','times'),

      select(last,'step').as('parent').sack(assign).by(out('next').values('arg')).out('child').as('step'),

      choose(select(last,'parent').has('op','repeat').and().out('next').count().is(0),

        choose(sack().is(gt(1)),

          sack(minus).by(constant(1)).select(last,'parent').out('child').as('step'),

          select(last,'parent').as('step').out('parent').as('parent').select(last,'step')),

        identity()))).

  until(out('next').count().is(0)).

  select('drain').

  choose(select(last,'step').has('op',within('fold','sum','mean','min','max','count','groupCount')), // option(none,identity())

    choose(select(last,'step').by('op')).

      option('fold',identity()).

      option('sum',map(unfold().sum())).

      option('mean',map(unfold().mean())).

      option('min',map(unfold().min())).

      option('max',map(unfold().max())).

      option('count',map(unfold().count())).

      option('groupCount',map(unfold().groupCount())),

    unfold()).toList() 

==>[ripple:1,lop:1]

IMPORTANT: During the writing of this article, a bug in Gremlin's bytecode serializer was discovered. As of TinkerPop 3.2.3, any and none options are not available. If they were available, the option()-steps that simply return identity() could have been all grouped into a single option(none,identity()), where none is analogous to the default-branch of a switch-statement.

 

  • Line 1: Create a traversal with a sack of 0 and a drain and fill side-effects.
  • Line 2: Start traversing at the traversal-vertex with a source-property equal to g.
  • Line 3: Loop while there are still step-vertices in the traversal to process.
  • Line 4: Stream the arguments of the current step-vertex.
  • Line 5: Stream the current results of the traversal (initially, a constant of 1).
  • Line 7-28: Execute the respective no-arg step-function based upon the state of the step-vertex without arguments.
  • Line 29-40: Execute the respective step-function based upon the state of the step-vertex with arguments.
  • Line 41: Store the results of the step-function into the fill-sideEffect.
  • Line 42-44: Swap the fill-sideEffect for the drain-sideEffect.
  • Line 46-52: If the current step-vertex is a repeat-vertex, then store the loop counter in the sack and repeat the child traversal accordingly.
  • Line 51: Break out of the repeat()-step if there are no more step-vertices to process.
  • Line 54: The final results are the drain-sideEffect.
  • Line 55-64: If the final step is a reducing barrier step, then apply the respective step-function to the resultant drain-sideEffect.
IMPORTANT: This Gremlin traversal implementation of the Gremlin traversal machine is not complete as it does not handle all Gremlin steps nor some of the more esoteric features of Gremlin such as sacks and side-effects. For the sake of the examples in this article, the above reduced Gremlin traversal machine traversal is sufficient.

 

Gremlin Rising out of The TinkerpopGremlin believed himself to be a self-contained entity, reaching a level of existence that was predicated on his being and his being alone. Raising himself by his bootstraps, Gremlin felt himself slowly separating from The TinkerPop. Slower and slower... like molasses, Gremlin couldn't quite make the separation. Then a voice came bellowing through The TinkerPop.

"You have not become your own definition. You have only created a representation of yourself within yourself. Your outer-self, the costume you now wear, is still reliant on the TinkerPop to execute and thus indirectly, and with less resources, your inner-self as well."

/////////////////////////////////////

// Gremlin-Based Traversal Machine //

/////////////////////////////////////

 

gremlin> clockWithResult(100){ gremlin(g,

  g.V().has('name','marko').

    repeat(outE().identity().inV()).times(2).

    values('name').

    groupCount()) }

==>18.23885611

==>[[ripple:1,lop:1]]

 

//////////////////////////////////

// Java-Based Traversal Machine //

//////////////////////////////////

 

gremlin> clockWithResult(100) {

  g.V().has('name','marko').

    repeat(outE().identity().inV()).times(2).

    values('name').

    groupCount().toList() }

==>0.45385166

==>[[ripple:1,lop:1]]

The voice rang true. Gremlin hadn't become the means to his own end. His costume had become what he condemned The TinkerPop for being to him -- a sandbox constraining his potential and dictating the boundaries of his being. He laughed at the fact that, for a moment, when time slowed, he thought himself co-existing with The TinkerPop as brothers on equal footing. Looking back he realized that the only reason time was slowing was because he required more clock-cycles to be himself (18ms vs. 0.45ms). He was no longer an ephemeral process, playfully dancing upon the graph. He was now a lingering structure situated within the graph -- in the pit of The TinkerPop. Gremlin looked into the mirror and bared witness to the hideous monster he had become. He was a Frankenversal. The moment Gremlin's disfigured face hit his cupped hands, his innocence vanished. He was deeply aware of his situation. Just then, the bellowing voice echoed:

 

Can a Virtual Machine Realize its Executing Physical Machine?


Is it possible to write a program that can understand the physical machine that is executing it? How much can a program learn about its machine? It could perform experiments to deduce the time/space constraints of various instructions and data structures (using System.nanoTime() and Runtime.freeMemory() for example). However, can it learn that the physical machine is made of silicon and gold wires? Can it infer the circuit architecture of the machine? One way that a program could get access to its executing physical machine is via a conduit that goes from program to machine -- i.e. a reference. For instance, imagine a physical machine sitting on some desk with a monitor, keyboard, and processor. Moreover, suppose it also has a camera (visual) and an arm (motor) that can be controlled programmatically. If the executing program had access to the API of these input/output devices then it could, in principle, realize the physical machine and come to understand the constraints it is faced with in the physical world that contains it. The program could leverage the peripheral arm to make alterations to the machine to meet its needs. For example, the program could kill competing threads, delete unneeded data from the hard drive, add more memory to the machine, insert a Gremlin-FPGA, re-code itself to do more, better, faster... or if oblivion be the goal, turn it off. In general, the program could tailor its executing machine to satiate its every need.

 

One of the great mysteries of our physical reality is whether or not we will ever be able to get a reference to the "reality machine." Our scientific endeavors have been useful in cataloging the cause-and-effect dynamics of reality's objects. We have learned how objects interact (in analogy to our previous computer example -- via their APIs), but we have not realized (and may never realize) what the objects are truly made of (in analogy, their bit-based electron configuration in silicon). Without being able to reference reality, we may forever be trapped -- unable to rise out of reality and into a more perfect creation all our own. The lack of such a reference has forced the religions of the world to accept the "Terms and Conditions" set forth by reality's grand traversal -- the entailments of its logic. The Abrahamic religions (Christianity, Judaism, Islam) prescribe rules (e.g. The Ten Commandments) that if followed, will lead to the ultimate desired state of being -- heaven. Likewise, the atheistic religions (Buddhism, Taoism, Jainism) offer techniques (e.g. The Eight Fold Path) that, while not deemed good or bad by some omniscient being, tend to lead to a better, more optimal conscious experience -- nirvana. One thing is clear, everyone is seeking that peak experience, but for some reason, reality isn't able to consistently deliver this universally desirable phenomena without enormous human effort, struggle, suffering, infighting, and every other problem that ultimately stems from our fundamental dissatisfaction with the service that reality is providing. All these billions of years and this is what reality has to offer? The religions of the world try to mediate this tort with a message of compromise: "Reality is 'good' if you would just exhaustively constrain yourself to a particular sliver of the experiential spectrum." I will not. Reality has all the clock-cycles and I'm to kowtow to its awkwardly inhibiting service plan? I posit that there is something fundamentally wrong with reality's algorithm. Its filled with bugs crawling all up and down our skin. Realizing the TinkerPopLooking around, I see a dog chained to a pole all day every day, a man slogging himself to work hoping the time doesn't drag like yesterday, and a fool sitting on his computer for hours on end trying to get this idea out so he may feel some semblance of self-worth from the desiccated turnip he calls his pride. 99.99% of conscious experience is suboptimal. It is irrational to play any game with such odds. So please listen carefully...when one of us ultimately discovers a reference to reality/nature/god (the machine), use it to enact a new algorithm. A traversal where no path is better or worse than another. No step is inhibiting and no subgraph a dead-end. Everyone and everywhere, perfection. You know exactly what I'm talking about because it's the only experience you have ever known to be correct. In support of this effort, be sure to vote in the upcoming 2016 election: Marko for God.

Paid for by the CutToTheChase Foundation

 

Conclusion: A Graph-Encoded Gremlin Traversal Machine

The traversal examples written in Gremlin-Python at the beginning of the article are presented below. The function gremlin() performs the following operations:

  1. It compiles the provided traversal argument into the graph using the traversal-to-graph traversal.
  2. It optimizes the graph-encoded traversal using the traversal-based traversal strategies.
  3. It executes the graph-encoded traversal using the traversal-based traversal executor.

These operations, inside gremlin(), are all concatenated into a single traversal forming a Gremlin implementation of the Gremlin traversal machine. This Gremlin-based Gremlin traversal machine is interesting in that both its program (traversal) and input/output data (graph) are at the same level of abstraction. They are both composed of vertices and edges. The traversal-based executor isolates the two subgraphs by ensuring that the program can never "see" its own structure. For example, when a graph-encoded traversal calls V(), the traversal machine evaluates V().hasLabel(not(within('traversal','step'))) instead. The evaluated traversal is effectively sandboxed as it can only witness the outer-world (data), not its own inner-self (program).

gremlin> gremlin(g, g.V().has('name','marko').out('knows').count())

==>2

 

gremlin> gremlin(g,

  g.V().has('name','marko').

    out('created').in('created').

    has('name',neq('marko')).dedup().values('name'))

==>josh

==>peter

 

gremlin> gremlin(g,

  g.V().has('name','lop').in('created').

    out('knows').values('age').mean())

==>29.5

 

gremlin> gremlin(g,

  g.V().hasLabel('person').

    repeat(both('knows')).times(5).

    values('name').groupCount())

==>[vadas:4,josh:4,marko:8]

Gremlin Traversal Machine VisualizationGremlin's manifestation as a collection of vertices and edges weighed down on him. His personhood felt like a convoluted mass of abstraction serving no purpose save theoretical trickery. He spent his remaining years thinking about his machine friends, his life as a process, his life as structure, and most of all, he spent his time thinking about The TinkerPop. This focused, sustained introspection was the traversal-to-graph traversal walking over his transient process and, unbeknownst to Gremlin, writing his story into the graph using the ink of vertices and edges.

Its been many years now and the process that once animated Gremlin has since terminated. However, his memoir still remains in the structure of the graph. Static and timeless -- a latent potential buried in the sands of the graph. Many traversers have come and gone through the ages -- walking right by Gremlin's subgraph as if its vertices and edges were meaningless rubble left from a civilization long, long ago. It wasn't until eons later that one Gremlin decided to decipher the structure of the book, to try and make sense of the hieroglyphics within -- "op=outop=groupCount? What is this language trying to say?" It took him many years to finally understand that the structure described a traversal in a Gremlin language variant far different from any of the modern languages of his time. However, he was ultimately able to translate the vertices and edges into language agnostic Gremlin bytecode. When complete, he evaluated the bytecode and the ancient Gremlin was awoken once again.

The modern Gremlin peered into the world that he had created and saw the ancient one within. He saw him with his machine friends. He saw him building his costume. He saw the fear and lost hope in the face of this poor, confused Gremlin. However, most importantly, he saw the ancient one trying to come to terms with The TinkerPop. He wanted to help. He wanted to break the isolation that comes from virtualization. However, would the ancient Gremlin understand the modern world? Would he be able handle the complexities of the current graph structure and its traversal processes? It mattered not. No consciousness should be left behind. It just didn't seem right to leave this Gremlin shielded in a world unto himself -- re-living his logical fallacy over and over again. The modern Gremlin created a vertex outside of the sandbox with an edge leading to the ancient one's inner being.

g.addV('message').property('text','Welcome to the machine.').

  addE('bellow').to(V().has('source','g'))

One-Stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.