The main use case to me is to let my loved ones access my password manager if something happens to me. When the time comes, I will not be there to walk them through it.
With Secret Splitter, you do not need to be a rocket scientist to make it work. The decoding logic is especially straightforward.
This page explains how Secret Splitter works:
First comes a very high level view.
Next comes the specification of what a piece of that puzzle looks like. Follow it to make encoding and decoding cross-compatible across implementations.
I will then deep dive into the specifics of my implementation.
The last part is a mathematical proof of how secure Secret Splitter is.
You do not need to read and understand that last part to use or code Secret Splitter yourself.
If you want to skip directly to using it, here are links to:
If you know Shamir’s Secret Sharing algorithm, the main difference is that Secret Splitter works on fixed-size blocks of the secret. So Secret Splitter can skip the search for a prime number bigger than the secret, making splitting secrets much faster.
Hey there, I’m optional content! Click me to hear what I have to say.
Tu fully understand how Secret-Splitter works, you will need basic knowledge about some mathematical tools.
Boxes like me will appear from time to time to explain those tools to you. Feel free to skip us if you already know them, or are just looking for a high-level overview of Secret-Splitter.
A Polynomial is a curve that can be more or less curvy. Every polynomial has a degree, D, that tells you how curvy it is.
TODO canvas with default image, polynomials of degree 0, 1, 3 + legend. with JS, a button to increase/decrease the degree, starting at 0, + dynamic redraw of the polynomial
See how the shape of the polynomial changes when you increase or decrease its order.
As you can see, a polynomial of degree 0 is just a flat line. Increasing the degree to 1 adds a slope. Increasing it to 2 adds a turn. Every further increase will add a new turn.
Another way to look at it is that the degree tells you how many times a given polynomial can cross a flat line.
TODO smame canvas + flat horizontal line that can be dragged up and down to see them cross. If possible, highlight intersections.
Move the horizontal line up and down and see how many times you can get it to cross the polynomial.
TODO special sentense for 0?
What makes polynomials useful for secret splitter is that:
👉There is only one polynomial of degree D going through a given list of D+1 points.
TODO canvas where reader chooses number of points + can move the points around, and we draw the polynomial
Pick a number of points and move them up and down to see how the corresponding polynomial changes. Can you always make it look like a straight line?
👉 You can always make a polynomial of a higher degree look like one of a smaller degree.
TODO canvas where you choose a degree + “target” degree that is lower. It draws a polynomial of degree target, and the user can move the chosen+1 points and see the reulting polynomial.
Move the points of your polynomial up and down. Can you make it match the target polynomial?
So to share a polynomial of degree D with someone else, you just need to give them examples of points on that polynomial.
TODO user can choose the degree of a polynomial that is automatically generated & drawn. By clicking on the polynomial, they can select points on it. The canvas then draws the polynomial that goes through those selected points.
Click on the blue polynomial above to select the points to send. The yellow line is the polynomial rebuilt using those points.
How many points to you need to give them so they can retrieve the polynomial? What happens if you give them more? What happens if you give them less points?
And that’s it, you now know everything you need about polynomials!
Decide the total amount of puzzle pieces you want to split your secret into: T (Total).
Decide how many pieces will be required to decode the secret: D (Decode).
Get the binary representation of your secret.
Generate a random mask for your secret:
Each byte of your secret is represented using Two’s complement as a value between -128 and 127.
For each byte of the secret, generate a new random byte between -128 and 127.
Those new random bytes are the mask.
This mask is the symmetric key used to encode and decode your secret.
Encode your secret with the mask.
For each byte of the mask:
Generate a random polynomial of degree D-1. It needs to be equal to the value of that byte when taken at 0.
Sample T points 🔴🟡…🔵 randomly on this polynomial.
I will use shapes and colors to differentiate points. Points with the same shape belong to the same polynomial.
You now have T random points for each byte of the mask:
Mask byte 1: 🔴🟡…🔵
Mask byte 2: 🟥🟨…🟦
…
Mask byte X: ❤💛…💙
Points are initially grouped by shape: each byte has its own polynomial.
Group the first point of each byte together, the second point of each byte together, etc… for all T points. This gives you your T puzzle pieces for the mask:
Piece 1: 🔴🟥…❤
Piece 2: 🟡🟨…💛
…
Piece T: 🔵🟦…💙
Each piece has one point of each byte: one piece has all the points of a given color.
Send the encoded secret along with one of the puzzle pieces to each of the T participants.
Notice that some colors are missing: you don’t necessarily need all the pieces.
Group the first point of each piece together, the second ones together, etc… This gives you N points for each byte of the mask:
Mask byte 1:🔴…🟡
Mask byte 2:🟥…🟨
…
Mask byte X:❤…💛
Points are grouped by shape again: for each byte of the mask, you have some of the points of its polynomial.
For each of those bytes, find the polynomial of degree N-1 going through all the corresponding points 🔴…🟡.
Look at the values taken at 0 by each of those polynomials, and treat them as bytes of a mask.
Use that mask to decode the encoded secret attached to the pieces.
If N is smaller than D, all you are looking at is random data. You will need more pieces to decode the mask.
If N is at least as big as D, this is the original secret!
This works because you need exactly D points to uniquely identify a polynomial of degree D-1.
If N is at least as big as D, you have D points or more for each byte of the mask. This is enough to identify the original polynomials and retrieve their value at 0: the original mask.
But if N is smaller than D, you don’t have enough information to retrieve the mask and decode the secret.
An attacker wants to know your secret. They know that you have used Secret Splitter to make pieces out of it.
The question is: does it make their job easier? What if they got hold of some of the pieces?
I will measure how much this knowledge makes brute-force attacks easier. This is agnostic of the kind of secrets, and quantifies the usability of that knowledge. The same reasoning applies to more sophisticated, use-case specific attacks.
The short answer is:
👉 Naive use of Secret Splitter reveals the length of your secret and whether it is some text or a file.
As we will see, you can easily avoid that by adding context to the secret you store (jump to section).
👉 The main trade-off is that you are introducing social vulnerabilities.
If enough people betray your trust, they will be able to decode the secret without you knowing about it.
Attackers can try to steal enough pieces from the group instead of stealing the secret from you.
Sort the person of the group by increasing ability to store their key safely.
Your secret becomes as week as the D’th person on that list.
If you trust that person to store their piece as securely as you would, then it’s actually harder to steal the D pieces than it is to steal the secret directly from you: an attacker would have to steal that piece number D, plus some extra work to get the previous D-1 pieces.
👉 The other trade-off is space.
Each participant will have to store a piece about 10 times as big as the original secret (before compression).
For large secrets, I recommend encoding them with another cryptographic algorithm first. Then use Secret Splitter on the decoding key. Don’t forget to send a copy of the encoded secret with each piece.
The attacker goes through every possible values of your secret.
For each one, they try it out and see if it is the correct one.
For a password manager master key for example, this means trying to unlock the database.
If it works, they found your secret.
If not, they move on to the next possible value.
An attacker will need to try on average half of all the possible values before finding your secret. Intuitively there is a 50% chance to have already seen your secret once they have been through half of them.
The benefits of that kind of attacks are:
The attacker only needs to have a way to check if a given guess is actually your secret.
Iterate all possible sequences of bytes and you will find the secret eventually.
The drawbacks are:
The number of possibilities is exponential.
For example, you needd 8 bytes (256 bits) to store the word “password”.
There are 18 billion billion unique ways to fill those 8 bytes. Yes, billion twice: 18, with 18 zeroes behind.
An average smartphone could brute-force a password in that list almost instantly.
There are 26 lower-case Latin characters. That makes 26^8 combinations of 8 characters, so about 208 billion possibilities.
An average smartphone could brute-force any 8 character password in a few minutes.
Adding upper-case, digits, and special characters gives around 70 possible values per character. So around 600 thousand billion possibilities for 8 characters.
An average smartphone would need about a month or two to test them all.
And here I only consider 8 bytes secrets. Doubling the size of your secret squares the number of possibilities. There are 160 million billion ways to take 4 words out of the 20,000 common English words.
This is why longer secrets are usually better than short but complex ones. My master key is 7 words generated with the original diceware method. That’s one of about 2 billion billion billion possibilities. Yet, easy to remember.
The main drawback is that any wrong assumption dooms the attacker. Your secret is just not among the possibilities they are considering. But they will only find that out after doing all the work.
Both encoded secret and puzzle piece reveal the length of the secret. Without it, attackers have to first go through all values of size 1, then all of size 2, etc… By starting at the right size, they skip a lot of work.
On raw binary data, this saves about 66% of the effective average work: Without the size information, attackers have to first waste the 2^n-1 possibilities of sizes lower than n. They would then on average go through half of the 2^n possibilities of size n. With it, it’s just on average half of the 2^n possibilities of size n.
It diminishes very quickly for non-binary data however. For a password in lower-case Latin characters, an attacker would only save about 7.4% of the work. Add upper-case and its down to around 3.8%.
Where do those numbers come from?
For a secret of length n with d possible values for each character, an attacker needs on average \frac{b^n}{2} attempts to brute-force the secret knowing its lenght, and \frac{b^n}{2} + \sum_{i=1}^{n-1}{b^i} attempts without knowing it.
The amount of work saved is 1-\frac{amount~of~work~knowing~the~length}{amount~of~work~without~knowing~the~length}.
Using \sum_{i=1}^{n-1}{b^i}=\frac{b^n-b}{b-1}, we get:
So in practice, knowing the length of your secret is not very valuable.
If it still troubles you, there is a very easy way to make that knowledge completely useless:
👉 Add context to what you encode in Secret Splitter #
The idea is to add useful information for whoever will need to use your secret in what you split using Secret Splitter.
For example if you are splitting the password MySuperPassword, use Secret Splitter on:
Here is the master key to my keepass database: "MySuperPassword"
You will make the size of encoded secret larger than the actual secret. So attackers deciding to use that information are doomed: they believe the secret to be larger than it is!
As a bonus, by including instructions with the secret, you are making sure that they will never be lost.
You can apply the same trick to binary data by encoding them in base64 first:
Here is the base64 encoded gzip archive storing my secret: H4sIAAAAAAAAAw3K0Q2AIAwFwFXeEMYZdIwKL9BEC2kJRqfX77uNTmhAMOkPTvFCBJNzYFQZ2FFlEqld3RnBjFtHRXm1Qyz/iQZaavknNRwSXJcPmrlj8lcAAAA=
encoding: utf-8 indicates that the secret consists of readable characters.
But remember that to try and guess your secret, attackers need a way to test if their guesses are correct. So they would already get that information from the context.
How Secret Splitter works: the nitty-gritty details #
Secret Splitter is about finding points on polynomial curves:
To encode a byte of the secret in T pieces, you need to find the value taken at T - D points from the value it takes at 0, your secret, and D-1 random points.
To decode a byte of the secret with N pieces, you need to find the value taken at 0 by the polynomial from the value it takes at the N points you know.
For that, Secret Splitter relies on Lagrange Interpolation. I will start by showing you how it works when decoding a puzzle, as it is simpler. We will then look at creating the Secret Splitter puzzle.
After step 2, you have a list of at least N points, (X_1, Y_1), (X_2, Y_2), ..., (X_N, Y_N) for each byte of the mask. You know that those points are values taken by polynomials of degree D-1.
You need to find the value taken by those polynomials at 0.
Using Lagrange Interpolation, the value at 0 is the sum of each value Y_i multiplied by a corresponding interpolation factor L_i:
Repeat for each byte of the mask and XOR the result with the encoded secret provided in each puzzle piece. If N is bigger or equal to D, you have decoded the secret. If not, you have random data.
The 0 in X_j - 0 controls which point you are getting. By changing it to any other value X, the formula above gives you Value(X), the value taken by the polynomial at that point.
To create a puzzle requiring D pieces to solve, you need to generate random polynomials of degree D-1. They should each take at 0 the value of a byte of the mask given by step 4.
You then sample T points on each of those polynomials using Lagrange Interpolation, exactly as when decoding a puzzle. You just change 0 in the formula above by the X-coordinates of the points you want.
So to create a puzzle from your secret, Secret Splitter will, for each byte of the mask:
Pick the point (0, mask\_byte\_value) as the first point of the polynom.
Generate D-1 random values Y_1, Y_2, ..., Y_{D-1} giving us the values taken by the polynomial at points (1, Y_1), (2, Y_2), ..., (D-1, Y_{D-1}).
Use Lagrange interpolation to sample the points (D, Y_D), (D+1, Y_{D+1}), ..., (N, Y_T).
This gives you a total of T pieces for each byte of the sub-puzzle. You can then proceed with step 6.
By design:
Those polynomials are of degree D-1
They take the values of the bytes of the mask at 0.
Getting at least D points for each of them gives you back the mask.
But decoding with less than D pieces gives you a random value.
The tricky part is to generate the random polynomials correctly.
Do it right and attackers won’t learn anything about your secret until they hold at least D pieces of the puzzle. But do it wrong and just a couple of pieces can give enough information to make brute-force attacks on the secret trivial.
My implementation does it right. But to show you how, and prove you why, we need two more tools: modular arythmetics and uniform probability distributions.
Arythmetics is the part of Maths about adding, substracting, multiplying and dividing numbers together.
If we put all numbers on a rod, adding and substracting means going right or left.
Adding 3 is moving 3 ticks to the right. Substracting 3 moving 3 ticks to the left.
TODO image with the line of real numbers, some points and example of moving them by additions/substractions
Multiplying and dividing is inflating or deflating the rod.
Multiplying by 3 is inflating the rod until every tick fills the space previously held by 3 ticks. Dividing by 3 deflating the rod until it takes 3 ticks to fill the space previously held by one tick.
TODO image with the line of real numbers, some points and example of scaling them by multiplications/divisions
👉 Modular arythmetics is regular arythmetics with teleportation portals on top.
First, you put a teleportation portal on 0. Then you pick the modulus number, say 4, and put a teleportation portal on it.
Now everything that goes into one of the portals continues its way, but getting out from the other portal.
If you were to stand on that number rod, you would see something like this:
image from the portal game
Which translates to this when looked at from the side:
infinite rod with repeating portals, tick numbers that reset every time
For additions, substractions and multiplications you can keep on doing it just like regular arythmetics:
TODO image with 2+3%4
It’s just that to read the result, you start counting the number of ticks since the last portal.
A more analytical way to do modular arythmetics is to ignore the modulus to start with and do your computations as usual. Once you get your result, you check if it’s bigger than your modulus. If yes, you substract the modulus over and over until your result is smaller than it. If your result is negative, you do the opposite and add your modulus over and over until the result gets positive.
For example 3 \times 3 = 9, which is bigger than 4. So 3 \times 3 modulus 4 = 9 modulus 4 = 5 modulus 4 = 1
TODO two pictures of the numer rod, one without portals on which we do 3x3, and a second one were we add the portals and read the result
Dividing is a bit trickier.
In regular arythmetics, for any number X you can find one number Y such that Y \times X = 1. This number is called the inverse. Dividing by X is the same as multiplying by its inverse.
But in modular arythmetics, there can be several different numbers that qualify as the inverse.
If we look for inverses of 3 modulus 4, the following numbers work.
\frac{1}{3}: the regular arythmetic inverse.
3: 3 is it’s own inverse because 3 \times 3 modulus 4 = 1 as seen above.
So say you want to divide 2 by 3. You will get different results depending on which number you pick as the inverse of 3. This is not very practical, we need a way to decide which one is the real inverse of 3 modulus 4.
The inverse of an integer in modular arythmetics is, when it exists, the integer that gives 1 when multiplied by your number. If it exists, the good news is it’s unique. But the bad news is it doesn’t always exist.
So the inverse of 3 modulus 4 is 3. But 2 does not have an inverse modulus 4: you cannot divide by 2 modulus 4.
A last thing about modular division:
There are some integers that stand out in arythmetics because they only have two integer divisors, one and themselves. Those numbers are called prime numbers.
👉 If your modulus is a prime number, then every integer except 0 has a modular inverse. This will be very useful for Secret Splitter.