Simple RAG as a service
This is a sample work-in-progress RAG project.
The RAG code interfaces with an Azure Cosmos DB instance, which is used to store the embedding vectors.
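As a rough sketch, each stored item might look like the following; the field names here are illustrative assumptions, not the project's actual schema:

```typescript
// Hypothetical shape of one stored chunk; field names are assumptions.
interface ChunkItem {
  id: string;          // unique chunk id
  userId: string;      // present in the DB but currently unused (see enhancements)
  fileName: string;    // source PDF
  text: string;        // the raw chunk text
  embedding: number[]; // embedding vector returned for this chunk
}
```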
The project has a simple Vue.js front end. The front end allows the user to upload a PDF, whose embeddings are then stored in Cosmos DB, and to ask questions about the PDF against an LLM.
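For illustration, the front end's question flow could reduce to a single fetch call; the `/api/ask` route and payload shape are assumptions:

```typescript
// Hypothetical front-end call; the /api/ask route and payload are assumptions.
async function askQuestion(question: string): Promise<string> {
  const res = await fetch("/api/ask", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const { answer } = await res.json();
  return answer;
}
```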
Processing is performed by a REST service implemented as an Azure Function. No LLM framework such as LangChain is used.
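A minimal sketch of such a handler using the Azure Functions v4 Node.js programming model; the route name, payload shape, and the `answerQuestion` helper (sketched later under the chat flow) are assumptions:

```typescript
import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";

// Hypothetical HTTP trigger; route and payload shape are assumptions.
app.http("ask", {
  methods: ["POST"],
  authLevel: "anonymous",
  handler: async (req: HttpRequest, ctx: InvocationContext): Promise<HttpResponseInit> => {
    const { question } = (await req.json()) as { question: string };
    // Embed the question, query Cosmos DB, call the LLM (see the chat-flow sketch).
    const answer = await answerQuestion(question); // hypothetical helper
    return { jsonBody: { answer } };
  },
});
```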

The project works as follows.
Upload of File
- The user uploads a file.
- The file is split into chunks, and each chunk is submitted to the LLM provider's embedding model to retrieve an embedding.
- The embeddings are stored in a Cosmos DB database in Azure; a sketch of this ingestion pipeline follows.
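A minimal sketch of the ingestion pipeline, assuming the `openai` and `@azure/cosmos` Node.js SDKs and illustrative database, container, and model names:

```typescript
import { CosmosClient } from "@azure/cosmos";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const container = new CosmosClient(process.env.COSMOS_CONN!)
  .database("rag")       // database/container names are assumptions
  .container("chunks");

// Naive fixed-size chunking; the project's actual chunking strategy is not specified.
function chunkText(text: string, size = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

// Embed each chunk and upsert it, with its vector, into Cosmos DB.
export async function ingest(fileName: string, text: string): Promise<void> {
  for (const [i, chunk] of chunkText(text).entries()) {
    const res = await openai.embeddings.create({
      model: "text-embedding-3-small", // model choice is an assumption
      input: chunk,
    });
    await container.items.upsert({
      id: `${fileName}-${i}`,
      userId: "unused", // the field exists in the DB but is unused today
      fileName,
      text: chunk,
      embedding: res.data[0].embedding,
    });
  }
}
```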
Chat against the PDF
- The system prompt is set up so that the retrieved context passed to the model is used to determine the answer.
- The user asks a question.
- The question is embedded and used to retrieve the most relevant chunks via a vector similarity lookup.
- The retrieved chunks are passed along with the question to the LLM (in this case, ChatGPT).
- The LLM then answers using the retrieved context; a sketch of this retrieval-and-answer flow follows.
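A minimal sketch of the retrieval-and-answer flow under the same assumptions as the ingestion sketch, additionally assuming the container has a vector index so that Cosmos DB's `VectorDistance` function can be used for the similarity lookup:

```typescript
import { CosmosClient } from "@azure/cosmos";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const container = new CosmosClient(process.env.COSMOS_CONN!)
  .database("rag")       // database/container names are assumptions
  .container("chunks");

// Hypothetical retrieval-and-answer flow; query shape and model names are assumptions.
export async function answerQuestion(question: string): Promise<string> {
  // Embed the question with the same model used at ingestion time.
  const emb = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });

  // Nearest-neighbour lookup via Cosmos DB's VectorDistance system function;
  // the distance metric comes from the container's vector embedding policy.
  const { resources } = await container.items
    .query<{ text: string }>({
      query: "SELECT TOP 5 c.text FROM c ORDER BY VectorDistance(c.embedding, @q)",
      parameters: [{ name: "@q", value: emb.data[0].embedding }],
    })
    .fetchAll();

  const context = resources.map((r) => r.text).join("\n---\n");

  // The system prompt constrains the model to answer from the retrieved chunks.
  const chat = await openai.chat.completions.create({
    model: "gpt-4o-mini", // model choice is an assumption
    messages: [
      { role: "system", content: `Answer only from this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return chat.choices[0].message.content ?? "";
}
```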
Enhancements in progress
- Google auth to allow multiple users to upload. The database currently has a user field, but it is unused.
- Allow the user to select which PDF files are searched, and save a description for each PDF file.
- Hybrid search and a multi-stage pipeline combining graph and text search. The current design has the limitations of plain vector RAG.
GitHub of current source: