Don’t RAG on Knowledge Graphs(Or Do) Benchmarking: Adding a Vector Database – Part Three

Hybridizing our Knowledge Graphs With Vector Databases
knowledge-graphs
rag
benchmarking
Author

Dmitriy Leybel

Published

April 23, 2024

Abstract
In this short and sweet post, we will add a layer of semantic vector storage (using Chroma) to our knowledge graph. I will go over the process of generating embeddings for the nodes in our knowledge graph and linking them with the vector database.

On the last episode: Don’t RAG on Knowledge Graphs (Or Do): Finally Building a Knowledge Graph – Part Two.


1 Review

First, a quick review of the workflow between knowledge graphs and vector databases mentioned eons ago. This is more or less the implementation we’ll strive towards, and will motivate this post. We’ve already constructed the knowledge graph, so now we have a vector database to build and link to it.

Figure 1: Strategy of retrieval through first finding a close embedding, and then utilizing the adjacency of nodes in the knowledge graph to hydrate the prompt

2 Vector Database, Simple as

There are many vector database providers out there. New startups are blooming like flowers on a warm spring morning. Let’s keep things simple. All we need is:

import chromadb

Well, we also need to pip install it and grab some build tools in case your system complains (I’m using build-essential for Linux). Chroma is a fully featured, lightweight vector database that can be deployed in numerous ways, and fortunately it offers us a quick and easy setup.

2.1 Embeddings

An embedding, in our current context, is a vector representation of some text. Texts that are semantically similar will have similar embedding vectors. “Fido jumped into the river” is similar to “The lake looks peaceful.” due to the semantic similarity of lake and river; both are bodies of water. More on that later.

Chroma integrates a few embedding models, from which we’ll choose the default, which is based on Sentence Transformers(all-MiniLM-L6-v2).

from chromadb.utils import embedding_functions

ef = embedding_functions.DefaultEmbeddingFunction()

Now that we’ve instantiated the embedding function, let’s give it a whirl.

If you’re running the embedding function for the first time, it’ll download the small model for you (only about 80MB).
ef('sup')
[[-0.08815008401870728,
  0.0389120951294899,
  -0.06267453730106354,
  0.025976944714784622,
  0.021272214129567146,
  0.036165427416563034,
  0.06472509354352951,
  0.03273024782538414,
  -0.022812241688370705,
  0.03426264598965645,
  -0.011476818472146988,
  -0.0558527335524559,
  0.0752527117729187,
  0.02892010472714901,
  -0.012184866704046726,
  -0.06143530085682869,
  0.057932790368795395,
  -0.02354748174548149,
  -0.037457771599292755,
  0.007783312350511551,
  -0.043894387781620026,
  -0.0005846268613822758,
  -0.05005178228020668,
  0.05256332457065582,
  0.041017238050699234,
  0.027247676625847816,
  -0.007769174408167601,
  0.006663127336651087,
  -0.0582849495112896,
  -0.058276500552892685,
  -0.008283257484436035,
  0.011540266685187817,
  0.09883619099855423,
  0.014246460050344467,
  0.021976888179779053,
  -0.042441871017217636,
  0.01698663830757141,
  0.05459064990282059,
  0.027719488367438316,
  0.040375471115112305,
  -0.07192573696374893,
  -0.0697317123413086,
  -0.007513706106692553,
  0.040573637932538986,
  0.031709592789411545,
  0.020085664466023445,
  -0.024455910548567772,
  0.021748993545770645,
  0.021665506064891815,
  -0.0782397910952568,
  0.012301645241677761,
  -0.07262903451919556,
  0.0020467431750148535,
  -0.007664070930331945,
  -0.0009409540216438472,
  0.02634112909436226,
  -0.017131712287664413,
  -0.04711078107357025,
  -0.03610646724700928,
  -0.08455827087163925,
  0.018454650416970253,
  -0.09647984802722931,
  -0.06264247000217438,
  0.04708629846572876,
  0.010321928188204765,
  0.10852538794279099,
  -0.017284022644162178,
  0.02371356077492237,
  0.004756816662847996,
  0.05711206793785095,
  -0.0002290358825121075,
  0.041850537061691284,
  -0.08036033064126968,
  0.05354780703783035,
  -0.011950146406888962,
  0.03304034471511841,
  0.04630657285451889,
  -0.01785254292190075,
  0.11931655555963516,
  -0.008362460881471634,
  0.004671791102737188,
  -0.005780027247965336,
  -0.06775062531232834,
  0.009449469856917858,
  0.023738745599985123,
  0.011783314868807793,
  -0.06681878864765167,
  -0.027939554303884506,
  0.006354610435664654,
  0.05730771645903587,
  -0.0838579311966896,
  -0.08851093053817749,
  0.08038954436779022,
  -0.02845342643558979,
  -0.047472529113292694,
  -0.07104448974132538,
  -0.002936464035883546,
  0.015265177004039288,
  -0.08989954739809036,
  0.2169521003961563,
  0.03967348858714104,
  0.05828956514596939,
  0.04222771152853966,
  0.0014037664514034986,
  -0.008869979530572891,
  -0.004865588154643774,
  -0.07116992026567459,
  -0.006799907423555851,
  -0.021581022068858147,
  -0.009085524827241898,
  -0.04695131257176399,
  -0.03597227856516838,
  0.04183044284582138,
  -0.025456368923187256,
  0.044091127812862396,
  0.017077835276722908,
  0.008213749155402184,
  0.011404010467231274,
  0.018198302015662193,
  -0.027580946683883667,
  0.03153375908732414,
  0.023579945787787437,
  -0.048822250217199326,
  0.02161264233291149,
  -0.017332669347524643,
  -0.08839226514101028,
  0.016617678105831146,
  -4.6542501285938574e-33,
  -0.05196730047464371,
  -0.021989643573760986,
  -0.029618987813591957,
  0.04105169698596001,
  -0.001578746596351266,
  0.009363578632473946,
  9.751021570991725e-05,
  0.01197210792452097,
  -0.034649863839149475,
  0.08685518056154251,
  -0.12436547130346298,
  -0.003243502229452133,
  -0.002412007190287113,
  0.005630753934383392,
  0.03797269985079765,
  0.06566265970468521,
  0.055325284600257874,
  -0.0033867782913148403,
  0.011790132150053978,
  0.01466473750770092,
  -0.044716041535139084,
  0.04668736830353737,
  -0.014505395665764809,
  0.01251885388046503,
  0.012130805291235447,
  -0.0644056648015976,
  0.06459583342075348,
  -0.04979125037789345,
  -0.013997192494571209,
  0.021896356716752052,
  0.01764928363263607,
  0.05214817821979523,
  -0.010583145543932915,
  -0.003208221634849906,
  -0.009506486356258392,
  0.010158931836485863,
  0.07064365595579147,
  -0.03242914006114006,
  0.00295675708912313,
  0.013653669506311417,
  0.04893507435917854,
  -0.005207119043916464,
  -0.037282273173332214,
  0.020390242338180542,
  0.02464980259537697,
  0.13883742690086365,
  -0.01673075370490551,
  0.042279619723558426,
  0.022591179236769676,
  0.044542353600263596,
  -0.01938001625239849,
  -0.017139442265033722,
  -0.0965004563331604,
  0.07410972565412521,
  -0.016040310263633728,
  0.027394499629735947,
  0.037999849766492844,
  -0.12169472128152847,
  -0.020582005381584167,
  0.00526941055431962,
  -0.024848125874996185,
  0.027975235134363174,
  0.0037561592180281878,
  -0.047139592468738556,
  -0.018269620835781097,
  -0.01704198122024536,
  -0.0066906544379889965,
  0.009610410779714584,
  0.018008427694439888,
  -0.029211603105068207,
  -0.09012635052204132,
  0.04693446308374405,
  0.1192922294139862,
  -0.010401724837720394,
  -0.02767297811806202,
  -0.026894785463809967,
  0.0003243408282287419,
  0.0372999832034111,
  0.10321266204118729,
  0.06125738471746445,
  -0.030062777921557426,
  -0.024563796818256378,
  -0.04380808770656586,
  -0.017787475138902664,
  0.06220285966992378,
  -0.06743179261684418,
  -0.021958552300930023,
  -0.08504771441221237,
  0.02000092715024948,
  -0.07109654694795609,
  -0.07193907350301743,
  0.007110072765499353,
  0.022553864866495132,
  0.07761543989181519,
  -0.0371822714805603,
  3.57144056859803e-33,
  -0.12020888924598694,
  0.013511578552424908,
  -0.0289490707218647,
  0.11010584235191345,
  0.01200167741626501,
  0.028086191043257713,
  0.011599362827837467,
  0.07943497598171234,
  -0.0704106017947197,
  0.0671350285410881,
  0.0107227498665452,
  0.018502017483115196,
  5.1427494327072054e-05,
  0.008013452403247356,
  -0.07282835990190506,
  0.023772109299898148,
  0.037625912576913834,
  -0.04462224245071411,
  -0.01509601715952158,
  -0.042705561965703964,
  -0.07307127118110657,
  0.040167830884456635,
  0.030199574306607246,
  0.06414608657360077,
  -0.040885623544454575,
  0.03525718301534653,
  -0.0314037948846817,
  0.058473147451877594,
  -0.016143586486577988,
  0.06214723363518715,
  0.0829816684126854,
  -0.05530810356140137,
  -0.06572870165109634,
  -0.010672389529645443,
  -0.013648229651153088,
  0.09427978098392487,
  0.02804470807313919,
  0.05287085473537445,
  -0.03369517996907234,
  -0.03602980077266693,
  0.06818007677793503,
  0.034106336534023285,
  0.019399693235754967,
  0.13262902200222015,
  0.008388478308916092,
  -0.012109436094760895,
  -0.039171162992715836,
  0.04033154994249344,
  0.07947935163974762,
  -0.041443053632974625,
  -0.08641794323921204,
  -0.030400192365050316,
  -0.1173543781042099,
  0.01031400915235281,
  -0.0023436383344233036,
  0.01253805123269558,
  -0.011449051089584827,
  0.014291658997535706,
  -0.06617258489131927,
  -0.062245313078165054,
  0.016467729583382607,
  -0.021559713408350945,
  -0.12505009770393372,
  0.026197155937552452,
  0.03813409060239792,
  0.02372247911989689,
  -0.05612555146217346,
  -0.06524056941270828,
  0.06582044064998627,
  -0.03490298241376877,
  0.1096431165933609,
  -0.03801364451646805,
  -0.10172274708747864,
  0.02089136838912964,
  -0.030176719650626183,
  -0.042703595012426376,
  -0.09114976972341537,
  0.04517712444067001,
  0.023939063772559166,
  -0.09658173471689224,
  -0.04929523169994354,
  -0.03724968805909157,
  0.008202498778700829,
  -0.022094616666436195,
  -0.018428556621074677,
  -0.005160237662494183,
  0.048229608684778214,
  0.027110382914543152,
  0.007152666803449392,
  0.005330891814082861,
  -0.0068723480217158794,
  0.0018026132602244616,
  0.08163446187973022,
  -0.024795232340693474,
  0.010881287977099419,
  -1.4566762018830559e-08,
  -0.004322580993175507,
  0.04305461049079895,
  -0.010600777342915535,
  0.0520879402756691,
  0.01802789233624935,
  0.06296617537736893,
  -0.08677810430526733,
  0.05512265861034393,
  0.08509743213653564,
  -0.020693860948085785,
  0.025800291448831558,
  0.019232159480452538,
  0.03931315243244171,
  0.003466429654508829,
  0.09492921084165573,
  -0.11571865528821945,
  -0.027029260993003845,
  0.10171782970428467,
  -0.03151828050613403,
  -0.020022252574563026,
  0.047960057854652405,
  0.025010110810399055,
  -0.015095007605850697,
  -0.03136930614709854,
  -0.003722142893821001,
  0.023892076686024666,
  -0.07897274196147919,
  0.04714563861489296,
  0.059783853590488434,
  0.029727943241596222,
  0.030625857412815094,
  -0.03413188084959984,
  -0.044605743139982224,
  0.024206150323152542,
  -0.0047183409333229065,
  -0.09526648372411728,
  -0.0770496353507042,
  -0.023633528500795364,
  0.09743858128786087,
  0.023828059434890747,
  -0.06587085127830505,
  -0.01218477264046669,
  0.03613143786787987,
  0.025722352787852287,
  -0.0845530703663826,
  0.004904137924313545,
  0.02101745456457138,
  0.0778331533074379,
  0.008732018992304802,
  -0.02525472640991211,
  -0.019046053290367126,
  -0.062338367104530334,
  0.011163970455527306,
  0.051886264234781265,
  0.14340852200984955,
  -0.031872380524873734,
  0.08313193172216415,
  0.008561764843761921,
  -0.0066390009596943855,
  0.05894242599606514,
  0.17481203377246857,
  0.024079544469714165,
  0.06344451010227203,
  0.02097688615322113],
 [-0.04983491450548172,
  0.047410059720277786,
  0.02075684443116188,
  0.0036884364672005177,
  0.029070785269141197,
  -0.06910350918769836,
  0.08781661838293076,
  0.033720798790454865,
  -0.016201989725232124,
  -0.04258463904261589,
  -0.05077064782381058,
  -0.053096938878297806,
  0.010030844248831272,
  0.012911750003695488,
  -0.012379195541143417,
  0.019358906894922256,
  -0.043964337557554245,
  0.0069747064262628555,
  -0.12316861003637314,
  -0.03856316953897476,
  -0.059056028723716736,
  0.06396914273500443,
  -0.020070277154445648,
  0.006908354815095663,
  -0.006557208485901356,
  -0.0001876982132671401,
  0.02345268242061138,
  0.05240260809659958,
  0.0265529602766037,
  -0.07057984918355942,
  0.018865060061216354,
  0.020978499203920364,
  0.041112422943115234,
  -0.028897671028971672,
  0.017154252156615257,
  -0.07860083878040314,
  -0.002234338317066431,
  -0.09510375559329987,
  0.03294316679239273,
  -0.009585811756551266,
  0.0640038251876831,
  -0.054985951632261276,
  0.06381730735301971,
  0.08698870241641998,
  0.10791453719139099,
  -0.018108127638697624,
  -0.00902014970779419,
  -0.03244858980178833,
  -0.02320343442261219,
  0.005078401416540146,
  0.04470254108309746,
  0.03359563276171684,
  0.04604236036539078,
  -0.054577603936195374,
  0.04181097820401192,
  0.04240523278713226,
  -0.04973261058330536,
  0.05762714892625809,
  0.03101509064435959,
  -0.055506374686956406,
  -0.024889405816793442,
  -0.009903420694172382,
  -0.10245583951473236,
  0.01849268190562725,
  0.09669843316078186,
  0.07571932673454285,
  -0.011814001947641373,
  0.008919758722186089,
  0.003123517381027341,
  -0.024138275533914566,
  0.015244080685079098,
  -0.024572260677814484,
  -0.07963927090167999,
  -0.04529915004968643,
  0.013277383521199226,
  0.0034561562351882458,
  0.06617144495248795,
  -0.03489890694618225,
  0.054232001304626465,
  0.049301907420158386,
  0.032285790890455246,
  -0.023837726563215256,
  -0.03968697786331177,
  -0.010161404497921467,
  -0.03578844293951988,
  -0.01881159096956253,
  0.022683821618556976,
  0.0351443849503994,
  -0.0010485335951671004,
  0.043829578906297684,
  0.0006098029552958906,
  -0.0550968274474144,
  0.034580301493406296,
  -0.020073121413588524,
  -0.08755818754434586,
  -0.03540779650211334,
  0.011154117062687874,
  -0.0520784854888916,
  -0.15004262328147888,
  0.2681502103805542,
  0.030406050384044647,
  0.026617038995027542,
  0.05415184050798416,
  0.018776625394821167,
  0.042038992047309875,
  0.01726127415895462,
  -0.03079761378467083,
  -0.002363220788538456,
  -0.012735975906252861,
  -0.024430427700281143,
  -0.024994466453790665,
  -0.012831549160182476,
  -0.10614868998527527,
  -0.0018657597247511148,
  -0.014149500988423824,
  0.03079080954194069,
  0.07474779337644577,
  -0.018895870074629784,
  0.029106391593813896,
  -0.08749617636203766,
  -0.05088314041495323,
  0.025541841983795166,
  -0.05021941289305687,
  0.0480412095785141,
  0.01865716092288494,
  -0.09822224825620651,
  0.05018414929509163,
  -1.4838867631284532e-33,
  0.03884870931506157,
  -0.025667142122983932,
  0.02762446179986,
  -0.04209671914577484,
  0.04037243872880936,
  0.06572036445140839,
  0.007036568131297827,
  -0.05030665919184685,
  -0.06949353963136673,
  -0.001430327189154923,
  0.0025782908778637648,
  0.01627175323665142,
  -0.019736241549253464,
  0.13940340280532837,
  0.11830995231866837,
  0.03684194013476372,
  0.09667646884918213,
  0.035070668905973434,
  0.0030133521649986506,
  -0.02321743220090866,
  0.019989095628261566,
  0.04470452293753624,
  0.0468166321516037,
  -0.023899244144558907,
  -0.021024813875555992,
  -0.02398647367954254,
  -0.026457354426383972,
  -0.05276739224791527,
  0.00587608153000474,
  0.03353196382522583,
  -0.007357672322541475,
  0.07106026262044907,
  -0.050913695245981216,
  -0.014338360168039799,
  -0.020886778831481934,
  -0.05181949585676193,
  0.031943611800670624,
  -0.04700925573706627,
  -0.02591674216091633,
  0.03265475481748581,
  -0.0022596963681280613,
  0.0024534505791962147,
  -0.06230804696679115,
  0.014108811505138874,
  0.044037144631147385,
  0.07213321328163147,
  0.06292419135570526,
  0.054413922131061554,
  -0.03602677211165428,
  -0.012107857502996922,
  0.0008751358254812658,
  0.01607143133878708,
  -0.10015927255153656,
  -0.01413557305932045,
  -0.05868290737271309,
  -0.02065744437277317,
  0.003992758225649595,
  -0.027841778472065926,
  0.029690319672226906,
  -0.014045015908777714,
  0.012597735971212387,
  0.08387438952922821,
  0.025408174842596054,
  -0.02001335471868515,
  -0.1182464063167572,
  -0.07764327526092529,
  0.023960299789905548,
  -0.015867924317717552,
  0.05507713183760643,
  -0.02569189853966236,
  -0.007574737071990967,
  0.026194175705313683,
  0.08373300731182098,
  0.005349453072994947,
  0.0318538062274456,
  -0.03233488276600838,
  0.017826825380325317,
  0.04583629220724106,
  -0.005807076580822468,
  -0.0661909282207489,
  0.00222235219553113,
  -0.014013771899044514,
  -0.027658652514219284,
  0.013542433269321918,
  0.04124703258275986,
  -0.0021795800421386957,
  -0.022596431896090508,
  -0.04924686625599861,
  -0.020406177267432213,
  -0.014784698374569416,
  -0.027839312329888344,
  0.035210106521844864,
  0.04620129242539406,
  0.03757832199335098,
  0.060697849839925766,
  3.0536386131386742e-34,
  0.08456361293792725,
  0.08147523552179337,
  -0.03336191549897194,
  0.05588332191109657,
  -0.02144193835556507,
  0.031138231977820396,
  0.02581152506172657,
  0.036460887640714645,
  0.016490254551172256,
  0.039796460419893265,
  0.021281400695443153,
  -0.09978808462619781,
  0.004475805442780256,
  -0.035769592970609665,
  0.04402599483728409,
  0.05466220900416374,
  0.10019738972187042,
  0.06424931436777115,
  -0.0407014936208725,
  0.03219299763441086,
  -0.0424066036939621,
  -0.017641497775912285,
  -0.04640892148017883,
  -0.06045156344771385,
  0.020258430391550064,
  0.033182986080646515,
  0.07075486332178116,
  0.040032193064689636,
  0.009616355411708355,
  0.008757013827562332,
  0.10673677921295166,
  -0.008585454896092415,
  -0.15125617384910583,
  0.004044604022055864,
  0.05798352137207985,
  0.09982240200042725,
  -0.054999690502882004,
  7.34408968128264e-05,
  0.10417070239782333,
  -0.08970246464014053,
  -0.0010847192024812102,
  0.0041154432110488415,
  0.03306804969906807,
  0.06667356193065643,
  0.001998799853026867,
  -0.07468786090612411,
  0.001044120523147285,
  -0.04047010466456413,
  -0.11857438087463379,
  0.03889648988842964,
  -0.06465069204568863,
  -0.04404180869460106,
  0.004789887927472591,
  -0.009365632198750973,
  -0.05295458436012268,
  0.04391732066869736,
  0.003941838163882494,
  0.010463234037160873,
  0.07796397805213928,
  0.0043679894879460335,
  -0.012168857268989086,
  0.0563892163336277,
  -0.032958246767520905,
  0.03640003502368927,
  -0.09901245683431625,
  0.013477855361998081,
  0.032833945006132126,
  0.008632335811853409,
  -0.015237522311508656,
  -0.045996278524398804,
  0.04744485393166542,
  -0.07848397642374039,
  -0.16673746705055237,
  -0.0009817220270633698,
  0.018159205093979836,
  0.0026159381959587336,
  -0.007587607018649578,
  0.001863642712123692,
  -0.0036697215400636196,
  -0.015728477388620377,
  -0.026224736124277115,
  -0.016035711392760277,
  0.03500324487686157,
  0.031706396490335464,
  -0.04410845413804054,
  -0.014305949211120605,
  0.06842508912086487,
  0.038970187306404114,
  -0.02548116073012352,
  -0.0811067596077919,
  -0.03804240748286247,
  0.0904158502817154,
  0.08024118095636368,
  -0.051469288766384125,
  0.006724147591739893,
  -1.4127249592377211e-08,
  -0.020074211061000824,
  -0.011974949389696121,
  0.02793658711016178,
  0.01786523126065731,
  0.03110302984714508,
  0.06413517892360687,
  -0.023988382890820503,
  -0.02752196229994297,
  -0.0033636605367064476,
  0.018158163875341415,
  0.12096750736236572,
  -0.008486862294375896,
  0.003321684431284666,
  -0.05455714836716652,
  0.044992756098508835,
  -0.0405462272465229,
  -0.03711879998445511,
  0.010892139747738838,
  0.010922703891992569,
  -0.10120225697755814,
  0.015770189464092255,
  0.049801792949438095,
  -0.09360873699188232,
  -0.08038635551929474,
  -0.043886736035346985,
  0.028670772910118103,
  0.007357894442975521,
  0.06992929428815842,
  0.03247477114200592,
  0.023066222667694092,
  -0.003743191948160529,
  0.03887111693620682,
  -0.013057591393589973,
  -0.023052601143717766,
  0.0647343099117279,
  -0.02109169401228428,
  -0.045990969985723495,
  -0.0717756599187851,
  0.016904039308428764,
  0.07475024461746216,
  -0.03764820471405983,
  0.007682626601308584,
  -0.041594963520765305,
  0.03814854100346565,
  -0.0941794365644455,
  0.011091839522123337,
  -0.05915017053484917,
  -0.031279437243938446,
  -0.033687345683574677,
  -0.030321571975946426,
  -0.009191246703267097,
  -0.032031286507844925,
  0.019655127078294754,
  0.09318964183330536,
  0.07215450704097748,
  -0.027178803458809853,
  -0.02098594605922699,
  -0.0187591090798378,
  0.02367139235138893,
  -0.021888094022870064,
  0.19160978496074677,
  0.0034248465672135353,
  -0.016104642301797867,
  -0.0016335448017343879],
 [-0.047706857323646545,
  0.029799047857522964,
  -0.029307443648576736,
  -0.028761692345142365,
  -0.049182552844285965,
  -0.04869556427001953,
  0.11003480106592178,
  0.029769031330943108,
  -0.006188513245433569,
  0.05534925311803818,
  0.0204521082341671,
  -0.05075625330209732,
  0.017509188503026962,
  0.008488249965012074,
  -0.04395948350429535,
  0.043411411345005035,
  -0.02037900686264038,
  -0.029790835455060005,
  0.044171810150146484,
  0.04676878824830055,
  -0.06464889645576477,
  0.07507970184087753,
  -0.011289148591458797,
  -0.004592073615640402,
  -0.015927044674754143,
  -0.003337560687214136,
  0.011098247952759266,
  0.10217370092868805,
  0.003518056822940707,
  -0.00919096078723669,
  0.017634805291891098,
  0.13972388207912445,
  0.05070934444665909,
  -0.02783096209168434,
  -0.0035908205900341272,
  -0.017583072185516357,
  -0.01819441467523575,
  -0.0054838648065924644,
  -0.022460605949163437,
  -0.04451676085591316,
  0.015791790559887886,
  -0.052957527339458466,
  0.005793462041765451,
  0.008374476805329323,
  0.03262092545628548,
  0.018899861723184586,
  -0.046665385365486145,
  -0.035868432372808456,
  -0.09819971024990082,
  -0.07561742514371872,
  -0.05793742090463638,
  0.055713143199682236,
  -0.00451300572603941,
  -0.05914030969142914,
  -0.04867144674062729,
  -0.0016276733949780464,
  -0.05643262341618538,
  -0.01853669248521328,
  -0.015230373479425907,
  -0.046567272394895554,
  -0.05330246686935425,
  0.011304951272904873,
  -0.11495313793420792,
  0.10305533558130264,
  0.05283502861857414,
  0.035230714827775955,
  0.016496378928422928,
  0.06500715762376785,
  0.005075459368526936,
  0.05328008159995079,
  -0.09429473429918289,
  0.009930397383868694,
  -0.06478379666805267,
  -0.04124986752867699,
  -0.05471572279930115,
  -0.0025090426206588745,
  0.03856153413653374,
  -0.02158009260892868,
  0.01686421036720276,
  0.030112208798527718,
  0.04350303113460541,
  0.0060177212581038475,
  -0.08983737975358963,
  0.023844074457883835,
  -0.0712917149066925,
  0.040687259286642075,
  0.02212377078831196,
  -0.03225473314523697,
  -0.09825507551431656,
  -0.012814310379326344,
  -0.05137163773179054,
  -0.0508715882897377,
  0.0356665775179863,
  0.06474190205335617,
  0.009078755974769592,
  -0.004626876208931208,
  -0.07715008407831192,
  -0.026909487321972847,
  -0.0634487047791481,
  0.24540770053863525,
  0.012282876297831535,
  -0.01706986129283905,
  -0.0012511075474321842,
  0.09647370129823685,
  -0.015949472784996033,
  0.007039134856313467,
  -0.014715258032083511,
  0.07577571272850037,
  0.03402278572320938,
  0.016772856935858727,
  0.0407135896384716,
  -0.008325525559484959,
  0.0016099949134513736,
  -0.012871264480054379,
  0.006253060884773731,
  -0.006250512786209583,
  -0.06602701544761658,
  0.013166422955691814,
  0.056004662066698074,
  -0.005936720408499241,
  0.02952989563345909,
  0.04650561138987541,
  0.05884496122598648,
  0.013950488530099392,
  -0.06323590874671936,
  -0.10772714763879776,
  0.09244363754987717,
  -3.246199648824735e-33,
  0.0029519363306462765,
  0.006624347530305386,
  -0.008346017450094223,
  0.009051835164427757,
  0.011032729409635067,
  0.07775413244962692,
  -0.030210910364985466,
  -0.011178763583302498,
  -0.046471260488033295,
  -0.015059034340083599,
  0.019916005432605743,
  -0.031136564910411835,
  -0.029937807470560074,
  0.026371324434876442,
  0.07905584573745728,
  -0.013263785280287266,
  0.05271025374531746,
  0.011820263229310513,
  0.023462682962417603,
  -0.041535381227731705,
  0.047901179641485214,
  0.011175201274454594,
  0.03211263194680214,
  0.04854049161076546,
  -0.05309046432375908,
  0.019062289968132973,
  -0.07798046618700027,
  -0.051897455006837845,
  -0.012413928285241127,
  0.03742313012480736,
  0.03230714052915573,
  0.026786386966705322,
  -0.027452530339360237,
  0.03520464897155762,
  -0.027697881683707237,
  -0.10518944263458252,
  0.04265424981713295,
  -0.10761565715074539,
  -0.050626687705516815,
  -0.017929106950759888,
  -0.022524379193782806,
  -0.039096981287002563,
  0.005315706599503756,
  0.05732399597764015,
  -0.04626333341002464,
  0.1416749656200409,
  -0.003580145537853241,
  0.037116218358278275,
  -0.006159882992506027,
  0.0014393212040886283,
  -0.05447343736886978,
  0.04099242761731148,
  -0.06244586408138275,
  0.05919301509857178,
  -0.030249932780861855,
  -0.033237360417842865,
  0.0049431296065449715,
  -0.07026117295026779,
  0.014024087227880955,
  0.051771726459264755,
  0.10352329909801483,
  0.024088077247142792,
  -0.03357868269085884,
  -0.004287239629775286,
  -0.01730150356888771,
  -0.07488695532083511,
  0.0003908060898538679,
  0.027570074424147606,
  -0.006252780091017485,
  -0.011089310981333256,
  0.0015536113642156124,
  -0.01308933924883604,
  0.1115591824054718,
  -0.05212273821234703,
  -0.0008197210845537484,
  0.025414563715457916,
  -0.0542709156870842,
  0.06618258357048035,
  0.03905229642987251,
  -0.004908710718154907,
  -0.013656924478709698,
  -0.003472711890935898,
  -0.06016167253255844,
  0.09176601469516754,
  0.04260535165667534,
  0.014385431073606014,
  0.027647485956549644,
  -0.07417813688516617,
  0.02283564582467079,
  -0.01586580090224743,
  -0.057338546961545944,
  0.010580653324723244,
  -0.005484357010573149,
  -0.026165256276726723,
  -0.008293806575238705,
  2.464531537700423e-33,
  0.03300580382347107,
  -0.01715261861681938,
  -0.03981883451342583,
  0.12353211641311646,
  -0.018451036885380745,
  0.014379706233739853,
  0.007427100092172623,
  0.0691729485988617,
  0.004493155516684055,
  0.10061202198266983,
  -0.05466959998011589,
  -0.10337553918361664,
  0.015087230131030083,
  -0.02494097501039505,
  -0.011892091482877731,
  0.03944435715675354,
  0.0327325202524662,
  0.010209660977125168,
  -0.09678077697753906,
  0.05819498747587204,
  -0.021728919818997383,
  -0.0386483408510685,
  0.000588937196880579,
  0.03783798962831497,
  -0.006611840333789587,
  0.06505469977855682,
  0.007079144939780235,
  0.06694035977125168,
  0.02415173314511776,
  0.04724515601992607,
  -0.006329400464892387,
  0.009057571180164814,
  -0.17014440894126892,
  -0.09140679240226746,
  0.017401646822690964,
  0.014095580205321312,
  -0.0787372812628746,
  0.0340920127928257,
  0.00824121292680502,
  0.04303678125143051,
  0.028128890320658684,
  -0.013432754203677177,
  0.037219978868961334,
  0.08892519026994705,
  0.012009781785309315,
  -0.06090620532631874,
  0.04102039709687233,
  0.04714181274175644,
  0.04708763211965561,
  -0.007145935203880072,
  -0.11487235873937607,
  -0.0040723965503275394,
  -0.07931794226169586,
  -0.030520547181367874,
  -0.09297792613506317,
  0.09480515867471695,
  0.020443223416805267,
  0.028473814949393272,
  0.00041346283978782594,
  -0.02046920917928219,
  -0.029133779928088188,
  0.013769570738077164,
  -0.021100269630551338,
  0.0700802206993103,
  -0.05099467560648918,
  -0.029298126697540283,
  -0.03564852848649025,
  0.009335000067949295,
  0.03160358965396881,
  -0.025795651599764824,
  0.07977144420146942,
  0.07508014142513275,
  -0.1043272390961647,
  0.0448288656771183,
  -0.050185561180114746,
  -0.0022289904300123453,
  -0.0046245078556239605,
  0.07300306856632233,
  0.07366469502449036,
  -0.016212882474064827,
  -0.033664144575595856,
  -0.07147421687841415,
  -0.02958005666732788,
  -0.0550491102039814,
  -0.010064290836453438,
  -0.005727516021579504,
  0.026934267953038216,
  -0.03192150965332985,
  0.019922709092497826,
  -0.016294335946440697,
  -0.019558662548661232,
  0.05691475421190262,
  0.11258960515260696,
  -0.02651376836001873,
  0.03533240407705307,
  -1.4853220875465922e-08,
  0.03286905214190483,
  0.013584330677986145,
  0.027436375617980957,
  -0.020878814160823822,
  0.10895758867263794,
  -0.030983706936240196,
  0.01623149774968624,
  0.016538944095373154,
  -0.04223388433456421,
  -0.019880937412381172,
  0.03786994889378548,
  -0.012722354382276535,
  0.051184702664613724,
  0.06075378507375717,
  0.027147572487592697,
  -0.008108423091471195,
  -0.013375341892242432,
  0.06135622411966324,
  -0.008997654542326927,
  -0.0575055293738842,
  -0.012919194996356964,
  0.046400099992752075,
  -0.02051331289112568,
  0.09030464291572571,
  -0.007730551529675722,
  0.069735087454319,
  -0.01826024241745472,
  0.0924258679151535,
  -0.00493173161521554,
  -0.04797661304473877,
  0.0553850494325161,
  -0.023436525836586952,
  -0.0447758287191391,
  -0.014637939631938934,
  0.00925927609205246,
  0.040850620716810226,
  -0.009064082987606525,
  -0.006945107597857714,
  -0.029200484976172447,
  0.14852873980998993,
  -0.04326893389225006,
  -0.1545904129743576,
  0.02772088348865509,
  0.0037093476857990026,
  -0.0880642905831337,
  0.023423565551638603,
  -0.05173948407173157,
  -0.01798063889145851,
  -0.0048668175004422665,
  -0.02943151257932186,
  -0.006786121055483818,
  -0.0043993447907269,
  0.0326085165143013,
  0.04477005451917648,
  0.07737607508897781,
  0.035019759088754654,
  0.03378671780228615,
  0.019712766632437706,
  -0.03159867227077484,
  0.005404326599091291,
  0.17780664563179016,
  -0.037863682955503464,
  -0.06350429356098175,
  0.014291122555732727]]

Cool, looks like we’ve generated a vector representation for ‘sup’, right? Wrong.

len(ef('sup')), len(ef(['sup']))
(3, 1)

Sup with that? Chroma tends to expect iterables (lists, tuples, etc.) within its functions and methods, so when we pass a three-character string, it treats it as an iterable and returns 3 embeddings, one for each letter, as seen above. So, as a word of caution, if you wish to pass in a single item, pass it in as a list of one.
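One way to guard against this is a tiny wrapper that always hands the embedding function a list. The following is a minimal sketch: `embed_texts` is a hypothetical helper, not part of Chroma, and it only assumes the embedding function is a callable that maps a list of strings to a list of vectors, as `ef` does above. A stand-in embedder (`fake_ef`) is used so the snippet runs without downloading a model.

```python
from typing import Callable, List, Sequence, Union

Vector = Sequence[float]
EmbedFn = Callable[[List[str]], List[Vector]]


def embed_texts(embed_fn: EmbedFn, texts: Union[str, Sequence[str]]) -> List[Vector]:
    """Embed one string or many without letting a bare string be split into characters."""
    if isinstance(texts, str):
        texts = [texts]  # wrap a single string into a list of one
    return list(embed_fn(list(texts)))


def fake_ef(batch) -> List[Vector]:
    """Stand-in for a real embedding function: one toy vector per item it iterates over."""
    return [[float(len(item)), 1.0] for item in batch]


single = embed_texts(fake_ef, "sup")              # one vector for the whole word
several = embed_texts(fake_ef, ["sup", "hello"])  # one vector per list item
print(len(single), len(several))
```

With the real embedding function, the call would presumably look like `embed_texts(ef, 'sup')` and should come back with a single vector rather than three.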

2.2 Distance Between Embeddings

When we wish to find the similarity between two separate embeddings, such as the generated embedding of our query and a stored embedding in the vector database (see Fig. 1 Step 3), we need a distance function. In our case, we’ll use cosine distance. Related is the cosine similarity, which describes how closely two vectors point in the same direction. It is 1 if they point in exactly the same direction, 0 if they are orthogonal (unrelated), and -1 if they point in opposite directions.

cosine_distance = 1 - cosine_similarity, so 0 represents a perfect match, 1 represents no relationship, and 2 represents vectors pointing in opposite directions.
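To get a feel for the range of values, here is a small worked sketch in plain Python. The function names and the toy 2-d vectors are made up for illustration; real embeddings from the default model have 384 dimensions, but the arithmetic is the same.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as defined above: 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])   # close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # 1
opposite = cosine_distance([1.0, 2.0], [-1.0, -2.0])       # close to 2
print(same_direction, orthogonal, opposite)
```

Note that the length of the vectors does not matter, only their direction: `[1, 2]` and `[2, 4]` are at a distance of (approximately) zero.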

Putting this into practice, let’s compare a node generated from the 0th paragraph (we use zero indexing in these here parts, pahtnah) to other nodes generated from the 0th paragraph, and then compare that same node to nodes generated from the 19th paragraph.

from chromadb.utils.distance_functions import cosine

# graph_history comes from the previous post; collect the nodes from paragraphs 0 and 19
p0_list = []
p19_list = []
for v in graph_history.history.values():
    if 'nodes' in v:
        if v['nodes']['paragraph_idx'] == 0:
            p0_list.append(v['nodes'])
        elif v['nodes']['paragraph_idx'] == 19:
            p19_list.append(v['nodes'])

def embed(node):
    # Wrap the stringified node in a list so it is embedded as a single item
    return ef([str(node)])[0]

# Every comparison below is made against the first node of paragraph 0
anchor = embed(p0_list[0])

print('paragraph 0 - paragraph 0 comparisons: ', cosine(anchor, embed(p0_list[1])), cosine(anchor, embed(p0_list[2])),
      '\nparagraph 0 - paragraph 19 comparisons: ', cosine(anchor, embed(p19_list[0])), cosine(anchor, embed(p19_list[1])))
paragraph 0 - paragraph 0 comparisons:  0.22135839656360656 0.14691759339122346 
paragraph 0 - paragraph 19 comparisons:  0.3139882121419234 0.2947411460832846

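Luckily, the values make my point for me. The cosine distances between nodes from the same paragraph (about 0.22 and 0.15) are smaller than the distances between nodes from different paragraphs (about 0.31 and 0.29). In other words, there is more similarity between nodes generated from within a paragraph than between nodes generated from different paragraphs, at least for this handful of comparisons.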

Figure 2: Nodes originating from a paragraph are likely to be more similar than nodes generated from different paragraphs

2.3 Setting Up Our DB

Chroma uses collections as vector spaces which handle the storage of your vectors, their ids, and metadata.

chroma_client = chromadb.Client()

collection = chroma_client.create_collection(
    name='musique_benchmark',
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
    )

We just feed it a name (our benchmark), the embedding_function, and hnsw:space in the metadata, which tells the database which distance metric this collection’s index should use.

3 Connecting Vector DB to Knowledge Graph

Our current goal is to retrieve the adjacent nodes (nodes that share an edge) of a node whose embedding is semantically similar to our query, as seen in Fig. 1 Step 5.
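As a rough illustration of what "retrieving adjacent nodes" means, here is a plain-dictionary sketch. It is not the rustworkx graph from the previous post. The records are trimmed copies of a few entries from the history used in this series, the helpers `build_adjacency` and `adjacent_nodes` are hypothetical names, and edge direction is ignored as a simplifying assumption. The starting node id is assumed to have come back from a vector search:

```python
from collections import defaultdict
from uuid import UUID

WW2 = UUID('bdbf44f6-c5ab-4e93-95f9-913c4472c483')
COMMONWEALTH = UUID('7c74cbd4-d37e-48ee-b07b-8e743cb4e571')
INSULAR = UUID('45454413-bccb-4305-b606-2fa6386a64b6')
LAKE_OESA = UUID('b88ccc2e-89f4-4799-bf64-38ca5d0badf8')

# Trimmed records in the same shape as graph_history.history.
history = {
    WW2: {'nodes': {'semantic_id': 'world-war-2', 'paragraph_idx': 0}},
    COMMONWEALTH: {'nodes': {'semantic_id': 'philippines-commonwealth', 'paragraph_idx': 0}},
    INSULAR: {'nodes': {'semantic_id': 'insular-government', 'paragraph_idx': 0}},
    LAKE_OESA: {'nodes': {'semantic_id': 'lake-oesa', 'paragraph_idx': 1}},
    UUID('26dc571e-2d3f-405e-be4c-11be4dbb4e21'): {
        'edges': {'from_node': COMMONWEALTH, 'to_node': WW2, 'category': 'affected_by'}},
    UUID('d456f405-4f08-48a4-8ede-5e3c39bda952'): {
        'edges': {'from_node': INSULAR, 'to_node': COMMONWEALTH, 'category': 'replaced'}},
}


def build_adjacency(history):
    # Map each node id to the set of node ids it shares an edge with,
    # ignoring edge direction (an assumption made for this sketch).
    adjacency = defaultdict(set)
    for record in history.values():
        if 'edges' in record:
            edge = record['edges']
            adjacency[edge['from_node']].add(edge['to_node'])
            adjacency[edge['to_node']].add(edge['from_node'])
    return adjacency


def adjacent_nodes(history, adjacency, node_id):
    # Return the node dictionaries adjacent to node_id (empty if it has no edges).
    return [history[n]['nodes'] for n in adjacency.get(node_id, set()) if n in history]


adjacency = build_adjacency(history)
neighbors = sorted(n['semantic_id'] for n in adjacent_nodes(history, adjacency, COMMONWEALTH))
print(neighbors)
```

A node with no edges, such as the Lake Oesa record here, simply yields an empty list, so a retrieval step built this way would fall back to the matched node alone.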

In the previous post we created a network graph with rustworkx from a graph_history object we generated with our LLM pipeline while looping over the paragraphs of a single question.

As a brief reminder, here is what the history dictionary of that graph_history object resembles:

graph_history.history
OrderedDict([(UUID('bdbf44f6-c5ab-4e93-95f9-913c4472c483'),
              {'nodes': {'semantic_id': 'world-war-2',
                'category': 'historical-event',
                'attributes': {'name': 'Second World War',
                 'duration': '1942 - 1945',
                 'impact': 'Japan occupied the Philippines during this period'},
                'paragraph_idx': 0}}),
             (UUID('26dc571e-2d3f-405e-be4c-11be4dbb4e21'),
              {'edges': {'from_node': UUID('7c74cbd4-d37e-48ee-b07b-8e743cb4e571'),
                'to_node': UUID('bdbf44f6-c5ab-4e93-95f9-913c4472c483'),
                'category': 'affected_by'}}),
             (UUID('7c74cbd4-d37e-48ee-b07b-8e743cb4e571'),
              {'nodes': {'semantic_id': 'philippines-commonwealth',
                'category': 'government',
                'attributes': {'name': 'Commonwealth of the Philippines',
                 'duration': '1935 - 1946',
                 'status': 'replaced the Insular Government',
                 'description': 'The administrative body that governed the Philippines from 1935 to 1946, aside from a period of exile in the Second World War from 1942 to 1945 when Japan occupied the country.',
                 'established_by': 'Tydings–McDuffie Act'},
                'paragraph_idx': 0}}),
             (UUID('45454413-bccb-4305-b606-2fa6386a64b6'),
              {'nodes': {'semantic_id': 'insular-government',
                'category': 'government',
                'attributes': {'name': 'Insular Government',
                 'type': 'United States territorial government',
                 'replaced_by': 'philippines-commonwealth'},
                'paragraph_idx': 0}}),
             (UUID('fd95bdaf-d7b3-4795-b0c3-bb239fa17d0e'),
              {'nodes': {'semantic_id': 'transition-to-independence',
                'category': 'process',
                'attributes': {'name': 'Transitional administration',
                 'purpose': "in preparation for the country's full achievement of independence"},
                'paragraph_idx': 0}}),
             (UUID('d456f405-4f08-48a4-8ede-5e3c39bda952'),
              {'edges': {'from_node': UUID('45454413-bccb-4305-b606-2fa6386a64b6'),
                'to_node': UUID('7c74cbd4-d37e-48ee-b07b-8e743cb4e571'),
                'category': 'replaced'}}),
             (UUID('4c6e9ebc-b61f-4514-93dd-628f4b3efc63'),
              {'edges': {'from_node': UUID('7c74cbd4-d37e-48ee-b07b-8e743cb4e571'),
                'to_node': UUID('fd95bdaf-d7b3-4795-b0c3-bb239fa17d0e'),
                'category': 'part_of'}}),
             (UUID('b88ccc2e-89f4-4799-bf64-38ca5d0badf8'),
              {'nodes': {'semantic_id': 'lake-oesa',
                'category': 'location',
                'attributes': {'name': 'Lake Oesa',
                 'elevation': 2267,
                 'unit': 'm',
                 'elevation_ft': 7438,
                 'location': {'park': 'Yoho National Park',
                  'city': 'Field',
                  'province': 'British Columbia',
                  'country': 'Canada'}},
                'paragraph_idx': 1}}),
             (UUID('fa964c2f-3cf7-4b61-99f4-6029ace56ccb'),
              {'nodes': {'semantic_id': 'arafura-swamp',
                'category': 'location',
                'attributes': {'name': 'Arafura Swamp',
                 'type': 'largest wooded swamp in the Northern Territory and possibly in Australia',
                 'location': {'region': 'Arnhem Land',
                  'territory': 'Northern Territory',
                  'country': 'Australia'},
                 'size': {'area': {'max': 5850, 'unit': 'km^2'},
                  'expansion': 'may expand by the end of the wet season'},
                 'status': 'near pristine floodplain',
                 'cultural_significance': 'great cultural significance to the Yolngu people, in particular the Ramingining community',
                 'filming_location': 'Ten Canoes'},
                'paragraph_idx': 2}}),
             (UUID('4c1c52c9-a5a5-4bc8-999d-cf7539d57322'),
              {'nodes': {'semantic_id': 'wapizagonke-lake',
                'category': 'location',
                'attributes': {'name': 'Wapizagonke Lake',
                 'location': {'sector': 'Lac-Wapizagonke',
                  'city': 'Shawinigan',
                  'park': 'La Mauricie National Park',
                  'region': 'Mauricie',
                  'province': 'Quebec',
                  'country': 'Canada'}},
                'paragraph_idx': 3}}),
             (UUID('dc5e696f-cdb0-4ae7-bf44-cc4eef5af46d'),
              {'nodes': {'semantic_id': 'khabarovsky-district',
                'category': 'location',
                'attributes': {'name': 'Khabarovsky District',
                 'type': 'administrative and municipal district',
                 'region': 'Khabarovsk Krai',
                 'country': 'Russia',
                 'area': {'value': 45140, 'unit': 'km^2'},
                 'segments': {'description': 'two unconnected segments separated by the territory of Amursky District',
                  'location': 'southwest of the krai'},
                 'administrative_center': {'name': 'Khabarovsk'}},
                'paragraph_idx': 4}}),
             (UUID('38724952-3d65-44b0-b5f8-03d13f501b6e'),
              {'nodes': {'semantic_id': 'silver-lake',
                'category': 'location',
                'attributes': {'name': 'Silver Lake',
                 'location': {'county': 'Cheshire County',
                  'state': 'New Hampshire',
                  'region': 'southwestern',
                  'country': 'United States'},
                 'towns': ['Harrisville', 'Nelson'],
                 'water_flow': {'from': 'Silver Lake',
                  'via': ['Minnewawa Brook', 'The Branch'],
                  'to': 'Ashuelot River'}},
                'paragraph_idx': 5}}),
             (UUID('85f00f66-e471-4e29-90f1-a11634734dc9'),
              {'nodes': {'semantic_id': 'ashuelot-river',
                'category': 'location',
                'attributes': {'name': 'Ashuelot River',
                 'type': 'tributary',
                 'of': 'Connecticut River'},
                'paragraph_idx': 5}}),
             (UUID('7c057b91-d6fb-4058-af7e-6d3104e1eed1'),
              {'nodes': {'semantic_id': 'hmda',
                'category': 'organization',
                'attributes': {'name': 'Hyderabad Metropolitan Development Authority (HMDA)',
                 'type': 'apolitical urban planning agency'},
                'paragraph_idx': 6}}),
             (UUID('a0f921c0-73c3-41db-980a-c86d1bc3906b'),
              {'nodes': {'semantic_id': 'hmda-area',
                'category': 'location',
                'attributes': {'name': 'area under the Hyderabad Metropolitan Development Authority (HMDA)',
                 'size': 'largest',
                 'description': 'covers the GHMC and its suburbs, extending to 54 mandals in five districts encircling the city'},
                'paragraph_idx': 6}}),
             (UUID('50db0180-d74c-4009-8812-aad530beef99'),
              {'nodes': {'semantic_id': 'ghmc-area',
                'category': 'location',
                'attributes': {'name': 'GHMC area',
                 'alias': 'Hyderabad city',
                 'size': 'larger than Hyderabad district'},
                'paragraph_idx': 6}}),
             (UUID('c01036cf-69f0-4589-a53f-c967eb829736'),
              {'nodes': {'semantic_id': 'hyderabad-district',
                'category': 'location',
                'attributes': {'name': 'Hyderabad district',
                 'size': 'larger than Hyderabad Police area'},
                'paragraph_idx': 6}}),
             (UUID('b325368a-939a-4b13-bc90-bb48ffa68859'),
              {'nodes': {'semantic_id': 'hyderabad-police-area',
                'category': 'location',
                'attributes': {'name': 'Hyderabad Police area',
                 'size': 'smallest'},
                'paragraph_idx': 6}}),
             (UUID('85d71a58-5af2-438e-9f73-72c8363a57f8'),
              {'nodes': {'semantic_id': 'hmwssb',
                'category': 'organization',
                'attributes': {'name': 'Hyderabad Metropolitan Water Supply and Sewerage Board',
                 'description': 'bodies such as the Hyderabad Metropolitan Water Supply and Sewerage Board (HMWSSB) that HMDA manages the administration of'},
                'paragraph_idx': 6}}),
             (UUID('e11ba81b-0525-4e60-b433-2f0f4438784a'),
              {'edges': {'from_node': UUID('7c057b91-d6fb-4058-af7e-6d3104e1eed1'),
                'to_node': UUID('a0f921c0-73c3-41db-980a-c86d1bc3906b'),
                'category': 'manages'}}),
             (UUID('0dd83ca3-69c1-473a-ba60-6afc5fcf0acf'),
              {'edges': {'from_node': UUID('50db0180-d74c-4009-8812-aad530beef99'),
                'to_node': UUID('a0f921c0-73c3-41db-980a-c86d1bc3906b'),
                'category': 'part_of'}}),
             (UUID('9ff064a2-672f-4964-81c3-70316ecae3ec'),
              {'edges': {'from_node': UUID('c01036cf-69f0-4589-a53f-c967eb829736'),
                'to_node': UUID('50db0180-d74c-4009-8812-aad530beef99'),
                'category': 'part_of'}}),
             (UUID('bbd34f51-b77f-4d25-854c-52b0383ffc45'),
              {'edges': {'from_node': UUID('b325368a-939a-4b13-bc90-bb48ffa68859'),
                'to_node': UUID('c01036cf-69f0-4589-a53f-c967eb829736'),
                'category': 'part_of'}}),
             (UUID('e5d4bd92-ab18-4a9a-8479-6f840f818812'),
              {'edges': {'from_node': UUID('7c057b91-d6fb-4058-af7e-6d3104e1eed1'),
                'to_node': UUID('85d71a58-5af2-438e-9f73-72c8363a57f8'),
                'category': 'manages'}}),
             (UUID('9313d4b8-6d2e-4032-b214-b321fe5da7d3'),
              {'nodes': {'semantic_id': 'san-juan-city',
                'category': 'location',
                'attributes': {'name': 'San Juan city',
                 'size': '76.93 square miles (199.2 km²)',
                 'water_area': '29.11 square miles (75.4 km²) (37.83%)'},
                'paragraph_idx': 7}}),
             (UUID('72913ab5-4552-42e1-8604-7e016738c169'),
              {'nodes': {'semantic_id': 'san-juan-bay',
                'category': 'location',
                'attributes': {'name': 'San Juan Bay', 'type': 'water body'},
                'paragraph_idx': 7}}),
             (UUID('278b8adb-3088-4b06-9348-5cc2c9bb3fc1'),
              {'nodes': {'semantic_id': 'condado-lagoon',
                'category': 'location',
                'attributes': {'name': 'Condado Lagoon', 'type': 'water body'},
                'paragraph_idx': 7}}),
             (UUID('9814602e-d7e0-480f-b86e-4b0670e8bdf0'),
              {'nodes': {'semantic_id': 'san-jose-lagoon',
                'category': 'location',
                'attributes': {'name': 'San José Lagoon',
                 'type': 'water body'},
                'paragraph_idx': 7}}),
             (UUID('17ecb67f-e311-489c-9df7-97d8e689ec3e'),
              {'edges': {'from_node': UUID('9313d4b8-6d2e-4032-b214-b321fe5da7d3'),
                'to_node': UUID('72913ab5-4552-42e1-8604-7e016738c169'),
                'category': 'contains'}}),
             (UUID('03b81c23-907d-4f86-b12b-3fc300f8cecf'),
              {'edges': {'from_node': UUID('9313d4b8-6d2e-4032-b214-b321fe5da7d3'),
                'to_node': UUID('278b8adb-3088-4b06-9348-5cc2c9bb3fc1'),
                'category': 'contains'}}),
             (UUID('832b969e-08c7-4ad2-9f21-29e78f18f246'),
              {'edges': {'from_node': UUID('9313d4b8-6d2e-4032-b214-b321fe5da7d3'),
                'to_node': UUID('9814602e-d7e0-480f-b86e-4b0670e8bdf0'),
                'category': 'contains'}}),
             (UUID('0f3a0e4d-3a1b-4d2b-befc-676b8009d1f6'),
              {'nodes': {'semantic_id': 'landkreis',
                'category': 'location',
                'attributes': {'type': 'administrative district'},
                'paragraph_idx': 8}}),
             (UUID('b23a8544-765b-4713-a2a0-01c3f38fed6a'),
              {'nodes': {'semantic_id': 'urban-hinterland',
                'category': 'location',
                'attributes': {'type': 'area surrounding a district-free city or town'},
                'paragraph_idx': 8}}),
             (UUID('a0c2f429-8515-4075-b051-af1985e1ac5c'),
              {'nodes': {'semantic_id': 'district-level',
                'category': 'location',
                'attributes': {'type': 'administrative level'},
                'paragraph_idx': 8}}),
             (UUID('5447372d-faa0-4578-a285-d28ccd556385'),
              {'nodes': {'semantic_id': 'kreisfreie-stadt',
                'category': 'location',
                'attributes': {'type': 'district-free city or town'},
                'paragraph_idx': 8}}),
             (UUID('37176eec-a55d-44e4-a6cb-8bfbfee933fa'),
              {'nodes': {'semantic_id': 'local-associations',
                'category': 'organization',
                'attributes': {'type': 'amalgamation of one or more Landkreise with one or more Kreisfreie Städte',
                 'purpose': 'to implement simplification of administration at the district level'},
                'paragraph_idx': 8}}),
             (UUID('50b36c1c-0ca2-4111-a97e-258fdfb02e39'),
              {'edges': {'from_node': UUID('5447372d-faa0-4578-a285-d28ccd556385'),
                'to_node': UUID('b23a8544-765b-4713-a2a0-01c3f38fed6a'),
                'category': 'associated_with'}}),
             (UUID('ff2e2d26-2cd5-4257-abf8-c10e7df94c02'),
              {'edges': {'from_node': UUID('37176eec-a55d-44e4-a6cb-8bfbfee933fa'),
                'to_node': UUID('a0c2f429-8515-4075-b051-af1985e1ac5c'),
                'category': 'operates_at'}}),
             (UUID('cc430a8c-0f62-4bcc-ad2c-4d1f7ce5f6d5'),
              {'edges': {'from_node': UUID('37176eec-a55d-44e4-a6cb-8bfbfee933fa'),
                'to_node': UUID('5447372d-faa0-4578-a285-d28ccd556385'),
                'category': 'contains'}}),
             (UUID('6949c464-7c2f-4b0f-962b-3bce37d17ddc'),
              {'edges': {'from_node': UUID('37176eec-a55d-44e4-a6cb-8bfbfee933fa'),
                'to_node': UUID('0f3a0e4d-3a1b-4d2b-befc-676b8009d1f6'),
                'category': 'contains'}}),
             (UUID('4e3d9a61-a794-49f8-bafc-b9b64fec2fe6'),
              {'nodes': {'semantic_id': 'norfolk-island',
                'category': 'location',
                'attributes': {'name': 'Norfolk Island',
                 'type': 'island',
                 'coordinates': {'latitude': -29.033, 'longitude': 167.95},
                 'location': 'South Pacific Ocean, east of the Australian mainland',
                 'area': 34.6,
                 'area_unit': 'square kilometres',
                 'coastline': 32,
                 'coastline_unit': 'km',
                 'highest_point': 'Mount Bates'},
                'paragraph_idx': 9}}),
             (UUID('a7bf1182-a4f6-4a73-89ce-0a00be00e2cb'),
              {'nodes': {'semantic_id': 'phillip-island',
                'category': 'location',
                'attributes': {'name': 'Phillip Island',
                 'type': 'island',
                 'location': 'territory of Norfolk Island',
                 'size': 'second largest island'},
                'paragraph_idx': 9}}),
             (UUID('19b0c9e8-39b6-4dde-b822-d032cf5e63ba'),
              {'nodes': {'semantic_id': 'mount-bates',
                'category': 'location',
                'attributes': {'name': 'Mount Bates',
                 'elevation': 319,
                 'elevation_unit': 'metres',
                 'location': 'northwest quadrant of Norfolk Island'},
                'paragraph_idx': 9}}),
             (UUID('b21b44eb-f998-495d-9eb4-df1057959cf3'),
              {'nodes': {'semantic_id': 'phillip-island-distance',
                'category': 'distance',
                'attributes': {'distance': 7,
                 'distance_unit': 'kilometres',
                 'direction': 'south',
                 'reference_location': 'main island'},
                'paragraph_idx': 9}}),
             (UUID('e698d0e2-1709-4262-97d6-196bd98cc2d3'),
              {'edges': {'from_node': UUID('4e3d9a61-a794-49f8-bafc-b9b64fec2fe6'),
                'to_node': UUID('19b0c9e8-39b6-4dde-b822-d032cf5e63ba'),
                'category': 'contains'}}),
             (UUID('04e22c46-04cf-4f73-a4f1-39f7fc998ec3'),
              {'edges': {'from_node': UUID('4e3d9a61-a794-49f8-bafc-b9b64fec2fe6'),
                'to_node': UUID('a7bf1182-a4f6-4a73-89ce-0a00be00e2cb'),
                'category': 'contains'}}),
             (UUID('cee43dbe-2d89-44ed-a1fa-0eafa878167a'),
              {'edges': {'from_node': UUID('4e3d9a61-a794-49f8-bafc-b9b64fec2fe6'),
                'to_node': UUID('b21b44eb-f998-495d-9eb4-df1057959cf3'),
                'category': 'contains'}}),
             (UUID('5f092031-cf0d-408c-a4f1-896e7c8607be'),
              {'nodes': {'semantic_id': 'star-stadium',
                'category': 'location',
                'attributes': {'name': 'Star (Zvezda) Stadium',
                 'previous_name': 'Lenin Komsomol Stadium',
                 'location': 'Perm, Russia',
                 'type': 'multi-use stadium',
                 'usage': 'football matches',
                 'home_of': 'FC Amkar Perm',
                 'capacity': 17000,
                 'opened_on': '1969-06-05'},
                'paragraph_idx': 11}}),
             (UUID('08177c86-f5f7-4917-8a05-c1f311690aee'),
              {'nodes': {'semantic_id': 'perm',
                'category': 'location',
                'attributes': {'name': 'Perm',
                 'type': 'city',
                 'administrative_center': 'Perm Krai'},
                'paragraph_idx': 11}}),
             (UUID('08b31a77-8dc6-4490-925c-037ebf0b8d13'),
              {'nodes': {'semantic_id': 'perm-krai',
                'category': 'location',
                'attributes': {'name': 'Perm Krai',
                 'type': 'administrative region',
                 'location': 'Russia'},
                'paragraph_idx': 11}}),
             (UUID('725b2459-e361-40e7-a5c5-463c81aaed93'),
              {'edges': {'from_node': UUID('08177c86-f5f7-4917-8a05-c1f311690aee'),
                'to_node': UUID('08b31a77-8dc6-4490-925c-037ebf0b8d13'),
                'category': 'administrative_center_of'}}),
             (UUID('91b73ee9-b9f7-4858-b7bc-c761f5e8b4b5'),
              {'nodes': {'semantic_id': 'papeete',
                'category': 'location',
                'attributes': {'name': 'Papeete',
                 'type': 'city',
                 'location': 'French Polynesia'},
                'paragraph_idx': 12}}),
             (UUID('b8ed0c63-f06f-408a-832a-93d8b0a02d8e'),
              {'nodes': {'semantic_id': 'french-polynesia',
                'category': 'location',
                'attributes': {'name': 'French Polynesia',
                 'type': 'overseas territory',
                 'location': 'South Pacific Ocean'},
                'paragraph_idx': 12}}),
             (UUID('604a99d8-6569-4fbb-a44d-c9e446100111'),
              {'nodes': {'semantic_id': 'tahiti',
                'category': 'location',
                'attributes': {'name': 'Tahiti',
                 'type': 'island',
                 'part_of': 'Society Islands'},
                'paragraph_idx': 12}}),
             (UUID('cf0355f5-9078-4d93-8434-961fee590a47'),
              {'nodes': {'semantic_id': 'windward-islands',
                'category': 'location',
                'attributes': {'name': 'Windward Islands',
                 'type': 'administrative subdivision',
                 'part_of': 'Society Islands'},
                'paragraph_idx': 12}}),
             (UUID('202c4602-aa18-4147-92a8-a3c6344a048b'),
              {'edges': {'from_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'to_node': UUID('91b73ee9-b9f7-4858-b7bc-c761f5e8b4b5'),
                'category': 'located_in'}}),
             (UUID('4ac76cfb-455e-4987-bfe0-2f6b1416bb49'),
              {'edges': {'from_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'to_node': UUID('604a99d8-6569-4fbb-a44d-c9e446100111'),
                'category': 'located_on'}}),
             (UUID('aafe3fb3-ee59-4d8c-b299-d0a9776878e7'),
              {'edges': {'from_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'to_node': UUID('cf0355f5-9078-4d93-8434-961fee590a47'),
                'category': 'part_of'}}),
             (UUID('2ca5031f-d739-4910-8b2e-574ee94b4fc3'),
              {'edges': {'from_node': UUID('604a99d8-6569-4fbb-a44d-c9e446100111'),
                'to_node': UUID('d2974cd5-7054-4bed-8222-0d92af1f90a6'),
                'category': 'part_of'}}),
             (UUID('ccd6454e-3a6a-4404-a2fa-f940a0039967'),
              {'edges': {'from_node': UUID('cf0355f5-9078-4d93-8434-961fee590a47'),
                'to_node': UUID('d2974cd5-7054-4bed-8222-0d92af1f90a6'),
                'category': 'part_of'}}),
             (UUID('e650982f-8436-491a-8f3e-1a1124550783'),
              {'edges': {'from_node': UUID('b8ed0c63-f06f-408a-832a-93d8b0a02d8e'),
                'to_node': UUID('604a99d8-6569-4fbb-a44d-c9e446100111'),
                'category': 'contains'}}),
             (UUID('f45c0231-3ade-44e5-9bb7-7722cca223f2'),
              {'edges': {'from_node': UUID('b8ed0c63-f06f-408a-832a-93d8b0a02d8e'),
                'to_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'category': 'contains'}}),
             (UUID('d2974cd5-7054-4bed-8222-0d92af1f90a6'),
              {'nodes': {'semantic_id': 'society-islands',
                'category': 'location',
                'attributes': {'name': 'Society Islands',
                 'type': 'archipelago'},
                'paragraph_idx': 12}}),
             (UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
              {'nodes': {'semantic_id': 'paea',
                'category': 'location',
                'attributes': {'name': 'Paea',
                 'type': 'commune',
                 'location': 'Papeete, French Polynesia',
                 'island': 'Tahiti',
                 'administrative_subdivision': 'Windward Islands',
                 'part_of': 'Society Islands'},
                'paragraph_idx': 12}}),
             (UUID('78c1c3dc-5f26-4176-830a-c0586e583955'),
              {'nodes': {'semantic_id': 'population',
                'category': 'attribute',
                'attributes': {'value': 13021, 'year': 2017},
                'paragraph_idx': 12}}),
             (UUID('935abcd2-c7ec-4815-8026-14d9e8502b3e'),
              {'edges': {'from_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'to_node': UUID('d2974cd5-7054-4bed-8222-0d92af1f90a6'),
                'category': 'part_of'}}),
             (UUID('2d09622d-df53-4f41-9707-24fc6698438c'),
              {'edges': {'from_node': UUID('0f802399-f15e-442d-a597-e0c84972a35f'),
                'to_node': UUID('78c1c3dc-5f26-4176-830a-c0586e583955'),
                'category': 'has_attribute'}}),
             (UUID('710a0044-84bc-4f53-a31c-1fb07c009351'),
              {'nodes': {'semantic_id': 'potamogeton-amplifolius',
                'category': 'plant',
                'attributes': {'name': 'Potamogeton amplifolius',
                 'common_names': ['largeleaf pondweed',
                  'broad-leaved pondweed'],
                 'description': 'an aquatic plant of North America',
                 'habitat': ['lakes', 'ponds', 'rivers'],
                 'environment': 'deep water'},
                'paragraph_idx': 13}}),
             (UUID('4e1a454c-a0c5-4851-a0a0-092123232ce0'),
              {'nodes': {'semantic_id': 'north-america',
                'category': 'geographic_region',
                'attributes': {'name': 'North America'},
                'paragraph_idx': 13}}),
             (UUID('9f17363c-01e0-4698-b96d-9425f00cd08b'),
              {'edges': {'from_node': UUID('710a0044-84bc-4f53-a31c-1fb07c009351'),
                'to_node': UUID('4e1a454c-a0c5-4851-a0a0-092123232ce0'),
                'category': 'native_to'}}),
             (UUID('417a9a71-8b2b-43f7-9d04-7905c9d00075'),
              {'nodes': {'semantic_id': 'soltonsky-district',
                'category': 'location',
                'attributes': {'name': 'Soltonsky District',
                 'type': 'administrative and municipal district',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'borders': ['Biysky District']},
                'paragraph_idx': 14}}),
             (UUID('4b288fc3-091b-447f-b1e7-31736281589c'),
              {'nodes': {'semantic_id': 'krasnogorsky-district',
                'category': 'location',
                'attributes': {'name': 'Krasnogorsky District',
                 'type': 'administrative and municipal district',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'borders': ['Biysky District']},
                'paragraph_idx': 14}}),
             (UUID('a2849f7d-8ccd-4c95-8911-29b2f498d6b7'),
              {'nodes': {'semantic_id': 'sovetsky-district',
                'category': 'location',
                'attributes': {'name': 'Sovetsky District',
                 'type': 'administrative and municipal district',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'borders': ['Biysky District']},
                'paragraph_idx': 14}}),
             (UUID('b63a7d4f-9757-44f2-90c1-d29ef9d6c62a'),
              {'nodes': {'semantic_id': 'smolensky-district',
                'category': 'location',
                'attributes': {'name': 'Smolensky District',
                 'type': 'administrative and municipal district',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'borders': ['Biysky District']},
                'paragraph_idx': 14}}),
             (UUID('b980a3e8-8022-485f-addf-9c8d68716740'),
              {'nodes': {'semantic_id': 'biysk',
                'category': 'location',
                'attributes': {'name': 'Biysk',
                 'type': 'city',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'is_administrative_center': True},
                'paragraph_idx': 14}}),
             (UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
              {'nodes': {'semantic_id': 'biysky-district',
                'category': 'location',
                'attributes': {'name': 'Biysky District',
                 'type': 'administrative and municipal district',
                 'part_of': 'Altai Krai',
                 'location': 'Russia',
                 'borders': ['Soltonsky District',
                  'Krasnogorsky District',
                  'Sovetsky District',
                  'Smolensky District',
                  'Biysk']},
                'paragraph_idx': 14}}),
             (UUID('8f0836c4-fff6-4fde-b259-f70b6128b348'),
              {'nodes': {'semantic_id': 'altai-krai',
                'category': 'location',
                'attributes': {'name': 'Altai Krai',
                 'type': 'krai',
                 'location': 'Russia'},
                'paragraph_idx': 14}}),
             (UUID('bab321a0-3f98-426b-8a2d-20d5ec639232'),
              {'edges': {'from_node': UUID('417a9a71-8b2b-43f7-9d04-7905c9d00075'),
                'to_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'category': 'borders'}}),
             (UUID('cbd9207d-ab1e-4c0d-b63b-a4181fbace46'),
              {'edges': {'from_node': UUID('4b288fc3-091b-447f-b1e7-31736281589c'),
                'to_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'category': 'borders'}}),
             (UUID('cb5c0d19-7b12-4a50-a477-cac8e0762a12'),
              {'edges': {'from_node': UUID('a2849f7d-8ccd-4c95-8911-29b2f498d6b7'),
                'to_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'category': 'borders'}}),
             (UUID('e40f26ca-a97c-46ca-a9ed-a031a8088448'),
              {'edges': {'from_node': UUID('b63a7d4f-9757-44f2-90c1-d29ef9d6c62a'),
                'to_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'category': 'borders'}}),
             (UUID('a09750f0-d5f8-4dab-82cc-be2e73691c97'),
              {'edges': {'from_node': UUID('b980a3e8-8022-485f-addf-9c8d68716740'),
                'to_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'category': 'borders'}}),
             (UUID('68b15747-9269-48bf-ba9d-ca79b31ba192'),
              {'edges': {'from_node': UUID('10b5e6c7-bc63-41d5-b52d-38e728f085e2'),
                'to_node': UUID('8f0836c4-fff6-4fde-b259-f70b6128b348'),
                'category': 'part_of'}}),
             (UUID('4e30cf7e-8223-4337-a65d-0c0c108e674e'),
              {'nodes': {'semantic_id': 'contoocook-lake',
                'category': 'location',
                'attributes': {'name': 'Contoocook Lake',
                 'type': 'lake',
                 'location': {'county': 'Cheshire County',
                  'state': 'New Hampshire',
                  'country': 'United States'},
                 'towns': ['Jaffrey', 'Rindge'],
                 'is_headwaters_of': 'contoocook-river'},
                'paragraph_idx': 15}}),
             (UUID('ada27367-85fa-47e6-9d1c-1f061b2e1dc6'),
              {'nodes': {'semantic_id': 'pool-pond',
                'category': 'location',
                'attributes': {'name': 'Pool Pond',
                 'type': 'pond',
                 'location': {'county': 'Cheshire County',
                  'state': 'New Hampshire',
                  'country': 'United States'},
                 'is_headwaters_of': 'contoocook-river'},
                'paragraph_idx': 15}}),
             (UUID('f39afbdf-aded-4076-b3fa-40f7541b98e6'),
              {'edges': {'from_node': UUID('4e30cf7e-8223-4337-a65d-0c0c108e674e'),
                'to_node': UUID('65cf9d75-79e6-46a5-aab9-ede4587f8e63'),
                'category': 'flows_into'}}),
             (UUID('e7f20049-4bf1-49ca-b9c9-3061af2aa15c'),
              {'edges': {'from_node': UUID('ada27367-85fa-47e6-9d1c-1f061b2e1dc6'),
                'to_node': UUID('65cf9d75-79e6-46a5-aab9-ede4587f8e63'),
                'category': 'flows_into'}}),
             (UUID('65cf9d75-79e6-46a5-aab9-ede4587f8e63'),
              {'nodes': {'semantic_id': 'contoocook-river',
                'category': 'location',
                'attributes': {'name': 'Contoocook River',
                 'type': 'river',
                 'flows_to': 'merrimack-river',
                 'flows_from': ['contoocook-lake', 'pool-pond'],
                 'location': {'city': 'Penacook',
                  'state': 'New Hampshire',
                  'country': 'United States'}},
                'paragraph_idx': 15}}),
             (UUID('9529f921-1f99-4ff7-b9a1-cdce2ff80f92'),
              {'nodes': {'semantic_id': 'merrimack-river',
                'category': 'location',
                'attributes': {'name': 'Merrimack River',
                 'type': 'river',
                 'location': {'city': 'Penacook',
                  'state': 'New Hampshire',
                  'country': 'United States'}},
                'paragraph_idx': 15}}),
             (UUID('c7467973-16e6-4395-8ef9-420c0d1227aa'),
              {'edges': {'from_node': UUID('65cf9d75-79e6-46a5-aab9-ede4587f8e63'),
                'to_node': UUID('9529f921-1f99-4ff7-b9a1-cdce2ff80f92'),
                'category': 'flows_into'}}),
             (UUID('ad00a8de-78fe-4c90-87ec-97495cc749a3'),
              {'nodes': {'semantic_id': 'bogota',
                'category': 'location',
                'attributes': {'name': 'Bogotá',
                 'type': 'city',
                 'pronunciation': {'en': ['ˈboʊɡəˌtɑː', 'bɒˈɡoʊtə', 'boɪ -'],
                  'es': 'boɣoˈta'},
                 'official_name': 'Bogotá',
                 'role': 'political, economic, administrative, industrial, artistic, cultural, and sports center'},
                'paragraph_idx': 16}}),
             (UUID('57a45a77-0a10-4b3c-aa0f-bb090c7df86a'),
              {'nodes': {'semantic_id': 'colombia',
                'category': 'location',
                'attributes': {'name': 'Colombia',
                 'type': 'country',
                 'capital': 'Bogotá'},
                'paragraph_idx': 16}}),
             (UUID('2ef968e0-18eb-4cc5-8e40-85f944b23b16'),
              {'edges': {'from_node': UUID('ad00a8de-78fe-4c90-87ec-97495cc749a3'),
                'to_node': UUID('57a45a77-0a10-4b3c-aa0f-bb090c7df86a'),
                'category': 'capital_of'}}),
             (UUID('a20af2cd-8a3a-4fb9-bdb3-f0c2934ec84d'),
              {'nodes': {'semantic_id': 'intracellular-fluid',
                'category': 'substance',
                'attributes': {'name': 'Intracellular fluid',
                 'amount_of_total_body_water': 0.625,
                 'amount_in_liters': 25,
                 'percentage_of_total_body_fluid': 62.5},
                'paragraph_idx': 17}}),
             (UUID('6a7ca929-c452-4102-92d3-696b92a385d0'),
              {'edges': {'from_node': UUID('a20af2cd-8a3a-4fb9-bdb3-f0c2934ec84d'),
                'to_node': UUID('3650ecee-07ad-4858-8d38-4d4cc2703da5'),
                'category': 'part_of'}}),
             (UUID('8994bac8-1a03-45a7-9e76-e2125548d7c9'),
              {'nodes': {'semantic_id': 'intracellular-fluid',
                'category': 'substance',
                'paragraph_idx': 19}}),
             (UUID('3650ecee-07ad-4858-8d38-4d4cc2703da5'),
              {'nodes': {'semantic_id': 'total-body-fluid',
                'category': 'substance',
                'attributes': {'amount_in_liters': 40,
                 'total_body_weight': 72},
                'paragraph_idx': 19}}),
             (UUID('73813ec0-5dbc-4793-8468-420bed7a3cd1'),
              {'nodes': {'semantic_id': 'territorial-waters',
                'category': 'location',
                'attributes': {'name': 'Territorial waters',
                 'type': 'sea',
                 'definition': 'a belt of coastal waters extending at most 12 nautical miles (22.2 km; 13.8 mi) from the baseline (usually the mean low - water mark) of a coastal state'},
                'paragraph_idx': 19}}),
             (UUID('84bcbbd4-0d38-42c3-9aa9-3b38227c0433'),
              {'nodes': {'semantic_id': 'law-of-the-sea',
                'category': 'legal_framework',
                'attributes': {'name': 'United Nations Convention on the Law of the Sea',
                 'year': 1982},
                'paragraph_idx': 19}}),
             (UUID('bc1c5af9-c311-4e9f-975d-349d33d41a15'),
              {'nodes': {'semantic_id': 'straits',
                'category': 'location',
                'attributes': {'name': 'Straits',
                 'type': 'body of water',
                 'sovereignty': 'This sovereignty extends to the airspace over and seabed below'},
                'paragraph_idx': 19}}),
             (UUID('72499573-c0cf-4a31-840a-da1dd1bf4c2f'),
              {'nodes': {'semantic_id': 'maritime-delimitation',
                'category': 'process',
                'attributes': {'name': 'Maritime delimitation',
                 'description': 'Adjustment of the boundaries of territorial waters and exclusive economic zones'},
                'paragraph_idx': 19}}),
             (UUID('cc1d4304-6583-4c76-9856-75f41f945f20'),
              {'nodes': {'semantic_id': 'bank-of-cyprus',
                'category': 'organization',
                'attributes': {'name': 'Bank of Cyprus',
                 'description': 'Largest banking group in Cyprus',
                 'relationship': "merged with the 'good' Cypriot part of Cyprus Popular Bank"},
                'paragraph_idx': 19}}),
             (UUID('1f97c243-fa43-4380-af8f-4404c5c5ce24'),
              {'edges': {'from_node': UUID('73813ec0-5dbc-4793-8468-420bed7a3cd1'),
                'to_node': UUID('84bcbbd4-0d38-42c3-9aa9-3b38227c0433'),
                'category': 'defined_by'}}),
             (UUID('238e98a5-a6be-4048-adf2-f1a310fd78df'),
              {'edges': {'from_node': UUID('bc1c5af9-c311-4e9f-975d-349d33d41a15'),
                'to_node': UUID('72499573-c0cf-4a31-840a-da1dd1bf4c2f'),
                'category': 'defined_by'}}),
             (UUID('51aade3e-39c7-41f8-87b2-38b9b595d6d3'),
              {'edges': {'from_node': UUID('7540bcc4-0cb2-463c-9250-40aadee08d65'),
                'to_node': UUID('cc1d4304-6583-4c76-9856-75f41f945f20'),
                'category': 'merged_with'}}),
             (UUID('8d3a66e1-85c3-4cf9-a995-9eb0c91c2510'),
              {'edges': {'from_node': UUID('8994bac8-1a03-45a7-9e76-e2125548d7c9'),
                'to_node': UUID('3650ecee-07ad-4858-8d38-4d4cc2703da5'),
                'category': 'part_of'}}),
             (UUID('7540bcc4-0cb2-463c-9250-40aadee08d65'),
              {'nodes': {'semantic_id': 'cyprus-popular-bank',
                'category': 'organization',
                'attributes': {'name': 'Cyprus Popular Bank',
                 'previous_names': ['Marfin Popular Bank'],
                 'status': 'shuttered in March 2013',
                 'description': 'Second largest banking group in Cyprus behind the Bank of Cyprus until 2013'},
                'paragraph_idx': 19}}),
             (UUID('b9ef6b09-848c-4ee3-94b8-2a0dbb4e88bb'),
              {'edges': {'from_node': UUID('7540bcc4-0cb2-463c-9250-40aadee08d65'),
                'to_node': UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
                'category': 'owns'}}),
             (UUID('d4c1831d-40c6-44ad-ba34-a8945f1ddfb3'),
              {'nodes': {'semantic_id': 'central-bank-of-cyprus',
                'category': 'organization',
                'attributes': {'name': 'Central Bank of Cyprus',
                 'description': 'The central bank that amended the lawyers of the legacy entity without consulting the special administrator'},
                'paragraph_idx': 19}}),
             (UUID('ddeb860c-8a26-4602-81b2-80c3a81058b9'),
              {'edges': {'from_node': UUID('883c8cec-7a77-44ff-9d5d-0aefda29ce21'),
                'to_node': UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
                'category': 'manages'}}),
             (UUID('edad1ce6-fc15-4785-8188-9be0a9e2fb30'),
              {'edges': {'from_node': UUID('d4c1831d-40c6-44ad-ba34-a8945f1ddfb3'),
                'to_node': UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
                'category': 'amended_lawyers'}}),
             (UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
              {'nodes': {'semantic_id': 'legacy-entity',
                'category': 'organization',
                'attributes': {'name': 'Legacy entity of Cyprus Popular Bank',
                 'description': 'Holds all the overseas operations of the now defunct Cyprus Popular Bank, until they are sold by the Special Administrator'},
                'paragraph_idx': 19}}),
             (UUID('3b46a61b-149b-4102-afd7-da851e1dda5c'),
              {'nodes': {'semantic_id': 'veteran-banker',
                'category': 'person',
                'attributes': {'name': 'Chris Pavlou',
                 'expertise': 'expert in Treasury'},
                'paragraph_idx': 19}}),
             (UUID('883c8cec-7a77-44ff-9d5d-0aefda29ce21'),
              {'nodes': {'semantic_id': 'special-administrator',
                'category': 'person',
                'attributes': {'name': 'Andri Antoniadou',
                 'position': 'ran the legacy entity of Cyprus Popular Bank for two years, from March 2013 until 3 March 2015'},
                'paragraph_idx': 19}}),
             (UUID('f9641a28-4283-4cdd-92cd-391f7888e7f4'),
              {'nodes': {'semantic_id': 'marfin-investment-group',
                'category': 'organization',
                'attributes': {'name': 'Marfin Investment Group',
                 'relationship': 'former major shareholder of the legacy entity'},
                'paragraph_idx': 19}}),
             (UUID('5b9c4ae3-5301-41ec-b7eb-73da084dd865'),
              {'edges': {'from_node': UUID('3b46a61b-149b-4102-afd7-da851e1dda5c'),
                'to_node': UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
                'category': 'took_over_as'}}),
             (UUID('76e5ee79-b64a-4624-b350-1e01a6753954'),
              {'edges': {'from_node': UUID('7b7ab7bd-43ab-4a85-a863-587230a60496'),
                'to_node': UUID('f9641a28-4283-4cdd-92cd-391f7888e7f4'),
                'category': 'pursuing_legal_action_against'}})])

It consists of nodes and edges, each with a unique identifier (UUID). To build our vector store collection, we simply loop over it and add the document and id associated with each node to the collection we created earlier.

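# Only node entries become documents; edge entries are skipped
for k, v in graph_history.history.items():
    if 'nodes' in v:
        # The stringified node dict is the document; its UUID string is the id
        collection.add(documents=[str(v['nodes'])], ids=[str(k)])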

Surely, it has to be harder than that? Nope. The documents we added (Python dictionaries converted to strings) were embedded using the Sentence Transformer model specified in the create_collection() command. Let's take it for a spin and query it with a question.

top_results = collection.query(
    query_texts=["What is the largest island in the pacific?"]
    )
top_results['documents'], top_results['ids']
(["{'semantic_id': 'norfolk-island', 'category': 'location', 'attributes': {'name': 'Norfolk Island', 'type': 'island', 'coordinates': {'latitude': -29.033, 'longitude': 167.95}, 'location': 'South Pacific Ocean, east of the Australian mainland', 'area': 34.6, 'area_unit': 'square kilometres', 'coastline': 32, 'coastline_unit': 'km', 'highest_point': 'Mount Bates'}, 'paragraph_idx': 9}",
  "{'semantic_id': 'phillip-island', 'category': 'location', 'attributes': {'name': 'Phillip Island', 'type': 'island', 'location': 'territory of Norfolk Island', 'size': 'second largest island'}, 'paragraph_idx': 9}",
  "{'semantic_id': 'tahiti', 'category': 'location', 'attributes': {'name': 'Tahiti', 'type': 'island', 'part_of': 'Society Islands'}, 'paragraph_idx': 12}",
  "{'semantic_id': 'territorial-waters', 'category': 'location', 'attributes': {'name': 'Territorial waters', 'type': 'sea', 'definition': 'a belt of coastal waters extending at most 12 nautical miles (22.2 km; 13.8 mi) from the baseline (usually the mean low - water mark) of a coastal state'}, 'paragraph_idx': 19}",
  "{'semantic_id': 'french-polynesia', 'category': 'location', 'attributes': {'name': 'French Polynesia', 'type': 'overseas territory', 'location': 'South Pacific Ocean'}, 'paragraph_idx': 12}",
  "{'semantic_id': 'san-juan-city', 'category': 'location', 'attributes': {'name': 'San Juan city', 'size': '76.93 square miles (199.2 km²)', 'water_area': '29.11 square miles (75.4 km²) (37.83%)'}, 'paragraph_idx': 7}",
  "{'semantic_id': 'phillip-island-distance', 'category': 'distance', 'attributes': {'distance': 7, 'distance_unit': 'kilometres', 'direction': 'south', 'reference_location': 'main island'}, 'paragraph_idx': 9}",
  "{'semantic_id': 'paea', 'category': 'location', 'attributes': {'name': 'Paea', 'type': 'commune', 'location': 'Papeete, French Polynesia', 'island': 'Tahiti', 'administrative_subdivision': 'Windward Islands', 'part_of': 'Society Islands'}, 'paragraph_idx': 12}",
  "{'semantic_id': 'straits', 'category': 'location', 'attributes': {'name': 'Straits', 'type': 'body of water', 'sovereignty': 'This sovereignty extends to the airspace over and seabed below'}, 'paragraph_idx': 19}",
  "{'semantic_id': 'arafura-swamp', 'category': 'location', 'attributes': {'name': 'Arafura Swamp', 'type': 'largest wooded swamp in the Northern Territory and possibly in Australia', 'location': {'region': 'Arnhem Land', 'territory': 'Northern Territory', 'country': 'Australia'}, 'size': {'area': {'max': 5850, 'unit': 'km^2'}, 'expansion': 'may expand by the end of the wet season'}, 'status': 'near pristine floodplain', 'cultural_significance': 'great cultural significance to the Yolngu people, in particular the Ramingining community', 'filming_location': 'Ten Canoes'}, 'paragraph_idx': 2}"],
 ['4e3d9a61-a794-49f8-bafc-b9b64fec2fe6',
  'a7bf1182-a4f6-4a73-89ce-0a00be00e2cb',
  '604a99d8-6569-4fbb-a44d-c9e446100111',
  '73813ec0-5dbc-4793-8468-420bed7a3cd1',
  'b8ed0c63-f06f-408a-832a-93d8b0a02d8e',
  '9313d4b8-6d2e-4032-b214-b321fe5da7d3',
  'b21b44eb-f998-495d-9eb4-df1057959cf3',
  '0f802399-f15e-442d-a597-e0c84972a35f',
  'bc1c5af9-c311-4e9f-975d-349d33d41a15',
  'fa964c2f-3cf7-4b61-99f4-6029ace56ccb'])
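
Since each document is just the str() of a node dictionary, it can be parsed back into a dictionary when needed. A minimal sketch, assuming the result holds one inner list per query text (which is how the lookup code later in this post indexes it, via ['ids'][0][0]). Note that fake_results is a hand-made, abbreviated stand-in built from the output above rather than a real Chroma response, and parse_hits is a hypothetical helper:

```python
import ast

# Hand-made stand-in for a query result (assumption: one inner list per query text).
fake_results = {
    "ids": [[
        "4e3d9a61-a794-49f8-bafc-b9b64fec2fe6",
        "a7bf1182-a4f6-4a73-89ce-0a00be00e2cb",
    ]],
    "documents": [[
        "{'semantic_id': 'norfolk-island', 'category': 'location', 'paragraph_idx': 9}",
        "{'semantic_id': 'phillip-island', 'category': 'location', 'paragraph_idx': 9}",
    ]],
}


def parse_hits(results, query_idx=0):
    """Pair each returned id with its document, parsed back into a dict."""
    ids = results["ids"][query_idx]
    docs = results["documents"][query_idx]
    # literal_eval only accepts Python literals, so it is safer than eval().
    return [(node_id, ast.literal_eval(doc)) for node_id, doc in zip(ids, docs)]


hits = parse_hits(fake_results)
for node_id, node in hits:
    print(node_id, node["semantic_id"], node["paragraph_idx"])
```

Having the parsed dictionary on hand makes fields like paragraph_idx available without touching the graph at all.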

The top 10 results (the default number of results for query()) are returned based on the cosine distance metric specified, once again, in the create_collection() command. To illustrate the process up to this point, here's a flowchart.

Population and querying of the vector database
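
For intuition about the ranking, cosine distance can be computed by hand. A minimal sketch, assuming the usual definition of 1 minus cosine similarity. The three-dimensional vectors are toy values made up for illustration, whereas real embeddings from all-MiniLM-L6-v2 have 384 dimensions:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy "embeddings" (made-up numbers, purely illustrative).
query = [1.0, 0.0, 1.0]
candidates = {
    "island-like": [2.0, 0.0, 2.0],   # same direction as the query
    "unrelated": [0.0, 1.0, 0.0],     # orthogonal to the query
    "in-between": [1.0, 1.0, 0.0],
}

# Smallest distance first, which is the order a nearest-neighbour query returns.
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
print(ranked)
```

Notice that the "island-like" vector is twice as long as the query yet still ranks first: cosine distance only cares about direction, not magnitude.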

Now, going back to the graph we created (digraph), we can use the UUID of the top result to do a dictionary lookup in node_indices, the mapping we created when building the graph that maps each UUID to the index of its node within the graph.

import uuid

# Constructing a UUID version 4 from a string
uuid_str = top_results['ids'][0][0]
top_uuid = uuid.UUID(uuid_str, version=4)

top_node = digraph[node_indices[top_uuid]]
top_node
{'semantic_id': 'norfolk-island',
 'category': 'location',
 'attributes': {'name': 'Norfolk Island',
  'type': 'island',
  'coordinates': {'latitude': -29.033, 'longitude': 167.95},
  'location': 'South Pacific Ocean, east of the Australian mainland',
  'area': 34.6,
  'area_unit': 'square kilometres',
  'coastline': 32,
  'coastline_unit': 'km',
  'highest_point': 'Mount Bates'},
 'paragraph_idx': 9}
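
The conversion matters because node_indices is keyed by UUID objects, while the vector store hands back strings. A small sketch of that mismatch using only the standard library. Here toy_indices is a hypothetical stand-in for node_indices, and the index 0 is made up:

```python
import uuid

uuid_str = "4e3d9a61-a794-49f8-bafc-b9b64fec2fe6"  # top id from the query above

# Hypothetical stand-in for node_indices: UUID object -> node index.
toy_indices = {uuid.UUID(uuid_str): 0}

# A string never equals a UUID object, so looking up the raw string fails.
print(uuid_str in toy_indices)             # False
print(uuid.UUID(uuid_str) in toy_indices)  # True

# For an id that is already version 4, passing version=4 changes nothing...
assert uuid.UUID(uuid_str, version=4) == uuid.UUID(uuid_str)

# ...but it overwrites the version bits of any other kind of UUID,
# so the plain constructor is the safer default for a round trip.
v1 = uuid.uuid1()
print(uuid.UUID(str(v1), version=4) == v1)  # False
```

Since the ids in this series appear to be version 4 already, either form of the constructor should give the same lookup key.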

Cool, so what? We already had the node dictionary from the query. The difference is that we now have the node's location in the graph, so we can easily query the graph to get any nodes connected to it: its neighbors.

for idx in digraph.neighbors(node_indices[top_uuid]):
    print(digraph[idx])
{'semantic_id': 'mount-bates', 'category': 'location', 'attributes': {'name': 'Mount Bates', 'elevation': 319, 'elevation_unit': 'metres', 'location': 'northwest quadrant of Norfolk Island'}, 'paragraph_idx': 9}
{'semantic_id': 'phillip-island-distance', 'category': 'distance', 'attributes': {'distance': 7, 'distance_unit': 'kilometres', 'direction': 'south', 'reference_location': 'main island'}, 'paragraph_idx': 9}
{'semantic_id': 'phillip-island', 'category': 'location', 'attributes': {'name': 'Phillip Island', 'type': 'island', 'location': 'territory of Norfolk Island', 'size': 'second largest island'}, 'paragraph_idx': 9}
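
One way this neighborhood could feed the retrieval strategy from Figure 1 is to flatten the matched node and its neighbors into a block of context text. A rough sketch under stated assumptions: toy_nodes and toy_adjacency are plain-dictionary stand-ins for digraph (abbreviated from the output above), and build_context is a hypothetical helper, not something from this series:

```python
# Plain-dict stand-ins for the graph:
# node index -> payload, and node index -> indices of its neighbors.
toy_nodes = {
    0: {"semantic_id": "norfolk-island", "paragraph_idx": 9},
    1: {"semantic_id": "mount-bates", "paragraph_idx": 9},
    2: {"semantic_id": "phillip-island", "paragraph_idx": 9},
}
toy_adjacency = {0: [1, 2], 1: [], 2: []}


def build_context(start_idx, nodes, adjacency):
    """Return one line per node: the matched node first, then its direct neighbors."""
    lines = [f"match: {nodes[start_idx]}"]
    for neighbor_idx in adjacency[start_idx]:
        lines.append(f"neighbor: {nodes[neighbor_idx]}")
    return "\n".join(lines)


context = build_context(0, toy_nodes, toy_adjacency)
print(context)
```

Only direct neighbors are collected here; how far to walk the graph is a design choice that trades extra context against prompt length.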

4 Ze end

See, I promised I'd keep it short and sweet.

Next up, we’ll focus on using our vector database and knowledge graph to not only answer questions, but also cite the paragraphs with contributing evidence – at least that’s the plan.

Part Four >>>