# `cite.pub`
`cite.pub` is a short URL service that keeps a permanent
cache of the target site so that referenced material
always remains available. The high availability and strong
data persistence guarantees of this project are intended to
make `cite.pub` suitable as a source of permanent URLs in
academic papers. Think doi.org, but for any digital
document.
## API
- `SaveResource(string publicUrl, uint depth = 1, string proposedAlias) -> ResultOr<string cacheUrl>`
- `ResolveResource(string alias) -> ResultOr<string cacheUrl>`
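
The two calls above could be sketched in C++ roughly as follows. This is only an in-memory illustration: `ResultOr` is named in the API but not defined in this document, so the struct below is an assumed shape, and `CiteService`, `CacheUrl`, and the `https://cite.pub/<alias>` cache URL form are hypothetical. Because C++ requires defaulted parameters to come last, `depth` is moved after `proposedAlias` in this sketch.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical result wrapper; the document names ResultOr<T> but does not
// define it.
template <typename T>
struct ResultOr {
  bool ok;
  T value;            // Meaningful only when ok is true.
  std::string error;  // Meaningful only when ok is false.
};

// In-memory stand-in for the service, for illustration only.
class CiteService {
 public:
  // Registers publicUrl under proposedAlias and returns the cache URL.
  ResultOr<std::string> SaveResource(const std::string& publicUrl,
                                     const std::string& proposedAlias,
                                     unsigned depth = 1) {
    (void)depth;  // Crawling linked pages is out of scope for this sketch.
    if (publicUrl.empty()) return {false, "", "empty URL"};
    if (aliases_.count(proposedAlias) > 0) {
      return {false, "", "alias already taken"};
    }
    aliases_[proposedAlias] = publicUrl;
    return {true, CacheUrl(proposedAlias), ""};
  }

  // Returns the cache URL for a previously saved alias.
  ResultOr<std::string> ResolveResource(const std::string& alias) const {
    if (aliases_.count(alias) == 0) return {false, "", "unknown alias"};
    return {true, CacheUrl(alias), ""};
  }

 private:
  // Assumed cache URL shape: https://cite.pub/<alias>.
  static std::string CacheUrl(const std::string& alias) {
    return "https://cite.pub/" + alias;
  }

  std::unordered_map<std::string, std::string> aliases_;  // alias -> public URL
};
```

Returning a result object rather than throwing keeps the "alias already taken" case an ordinary outcome that the creation form can report back to the user.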
## UI
The web UI will have a few web endpoints:
- `https://cite.pub`
  - `/` - GET: splash page with a creation form and a
    resolving form.
  - `/<string alias>` - GET: what users use to load their
    cached resource, `302 Found` redirect to the view page.
    POST: where the form on the front page will POST
    its content to create a new alias.
## Constraints
### Short URL alias format
Considering short URL standards, valid characters will be:
```txt
23458abcdefghijknrstxyz
```
- `0` is not allowed because it looks like `O` or `o`.
- `1` is not allowed because it looks like `l` or `L`.
- `6` and `9` are not allowed because they are mirrors.
- `l` is not allowed because it looks like `1`.
- `m` is not allowed because it is a visually dense
character.
- `o` is not allowed because it looks like `0`.
- `p` and `q` are not allowed because they are mirrors.
- `u` and `v` are not allowed because they look similar.
- `w` is not allowed because it is a visually dense character.
- Capital letters and all other characters are not allowed.
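
As a minimal sketch, the character rules above can be checked with a single scan against the allowed alphabet. The names `kAliasAlphabet` and `UsesOnlyAllowedChars` are hypothetical and chosen for this example only.

```cpp
#include <string>

// Allowed alias characters, copied from the list above.
const std::string kAliasAlphabet = "23458abcdefghijknrstxyz";

// Returns true if the alias is non-empty and every character is in the
// allowed alphabet. Capital letters and punctuation fail the check.
bool UsesOnlyAllowedChars(const std::string& alias) {
  return !alias.empty() &&
         alias.find_first_not_of(kAliasAlphabet) == std::string::npos;
}
```

For example, `UsesOnlyAllowedChars("fnc38")` is true, while `"f0c38"` and `"Fnc38"` are rejected.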
Given 23 allowed characters and alias strings of length 5
(the maximum), without repeated characters, the number of
ordered aliases is a permutation count:
$$ P(23, 5) = \dfrac{23!}{(23-5)!} = 23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 = 4037880 $$
Or around four million available unique IDs; that's plenty
for now!
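
The count above can be reproduced in code, and the same "no repeated characters" rule gives a simple way to draw a random alias: shuffle the alphabet and keep a prefix. This is a sketch under assumptions; `CountPermutations` and `RandomAlias` are hypothetical names, and a real service would still need to check a drawn alias against those already taken.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <string>

// Number of ordered selections of k distinct items from n: n! / (n - k)!.
std::uint64_t CountPermutations(unsigned n, unsigned k) {
  std::uint64_t total = 1;
  for (unsigned i = 0; i < k; ++i) total *= (n - i);
  return total;
}

// Draws a random alias with no repeated characters by shuffling the
// allowed alphabet and keeping the first `length` characters.
std::string RandomAlias(std::mt19937& rng, std::size_t length = 5) {
  std::string pool = "23458abcdefghijknrstxyz";
  std::shuffle(pool.begin(), pool.end(), rng);
  return pool.substr(0, std::min(length, pool.size()));
}
```

`CountPermutations(23, 5)` evaluates to 4037880, matching the figure above.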
### File formats
The supported file formats are listed and discussed
below:
- PDF: `*.pdf`
- HTML: `*.html` - presents a unique challenge: CSS,
JavaScript, images, and hrefs in the source HTML file may
need to be fetched and inlined, or be rewritten to refer
to other cache links provided by `cite.pub`.
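
To illustrate the link-rewriting option for HTML, the sketch below replaces `href` and `src` attribute values that have a known cached copy. It is deliberately simplistic: it only handles double-quoted attributes, and a production version would more likely use a real HTML parser than a regular expression. `RewriteLinks` and the `cacheUrls` map are hypothetical names for this example.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Rewrites href="..." and src="..." values that appear in cacheUrls
// (original URL -> cache URL). Unknown URLs are left untouched.
std::string RewriteLinks(const std::string& html,
                         const std::map<std::string, std::string>& cacheUrls) {
  static const std::regex attr("(href|src)=\"([^\"]*)\"");
  std::string out;
  std::size_t last = 0;  // End of the previously copied span.
  const std::sregex_iterator end;
  for (std::sregex_iterator it(html.begin(), html.end(), attr); it != end; ++it) {
    const std::smatch& m = *it;
    const std::size_t pos = static_cast<std::size_t>(m.position(0));
    out.append(html, last, pos - last);  // Copy text before the attribute.
    const std::string original = m[2].str();
    const auto found = cacheUrls.find(original);
    const std::string url = (found != cacheUrls.end()) ? found->second : original;
    out += m[1].str() + "=\"" + url + "\"";
    last = pos + static_cast<std::size_t>(m.length(0));
  }
  out.append(html, last, std::string::npos);  // Copy the remaining tail.
  return out;
}
```

Leaving unknown URLs untouched means a partially cached page still renders, at the cost of depending on the original host for those assets.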
## Data Models
The data will be stored in the following format:
```cc
#include <string>

// Collection: resources/<uuid>.json
struct Resource {
  // Auto-generated.
  std::string uuid;
  // Auto-generated or user-selected short ID (globally unique).
  // E.g., `fnc38`, a random 5-character unique ID.
  std::string alias;
  // Creation timestamp in ISO 8601 format.
  std::string created_iso8601;
  // Reference to the contents/<contents_uuid>.json file
  // containing all the large contents of the file.
  std::string contents_uuid;
};
```
```cc
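#include <string>

// Collection: contents/<uuid>.json
struct Content {
  std::string uuid;
  // Parent reference: the `Resource::uuid` of the owning record.
  std::string resource_uuid;
  // Length of the cached content (presumably in bytes).
  int content_length;
  std::string content_type;
};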
```
By default, all files will be publicly available and
cannot be deleted.
The database used is a UUID-based document database, so we
need a structure that allows fast lookups from aliases to
UUIDs. An index "table" can be:
```cc
#include <string>

// Collection: lookups/<uuid>.json
// Filename: asdf.json
// Use O(1) lookup of aliases via filesystem checks.
struct Lookup {
  std::string uuid;
  // Takes on `asdf` in this example.
  std::string alias;
  // The `Resource::uuid` field, i.e. the filename of the actual record.
  std::string resource_uuid;
};
```
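
The two-step resolution this index enables (alias to `Lookup`, then `Lookup::resource_uuid` to `Resource`) can be sketched with hash maps standing in for the document collections. `Store` and `FindByAlias` are hypothetical names, and the in-memory maps are an assumption made to keep the example self-contained; the real design reads files from the database.

```cpp
#include <string>
#include <unordered_map>

struct Resource {
  std::string uuid;
  std::string alias;
  std::string created_iso8601;
  std::string contents_uuid;
};

struct Lookup {
  std::string uuid;
  std::string alias;
  std::string resource_uuid;
};

// In-memory stand-ins for the lookups/ and resources/ collections.
struct Store {
  std::unordered_map<std::string, Lookup> lookups;      // keyed by alias
  std::unordered_map<std::string, Resource> resources;  // keyed by Resource::uuid
};

// Resolves an alias with two keyed reads; returns nullptr if either the
// lookup entry or the resource record is missing.
const Resource* FindByAlias(const Store& store, const std::string& alias) {
  const auto lookup = store.lookups.find(alias);
  if (lookup == store.lookups.end()) return nullptr;
  const auto resource = store.resources.find(lookup->second.resource_uuid);
  return resource == store.resources.end() ? nullptr : &resource->second;
}
```

Both reads are keyed, so resolution cost does not grow with the number of stored resources (average case for a hash map, and analogous to checking for a file by name).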