Differences between revisions of "Server/horror/Technical documentation"
m (→Disaster recovery: fix) |
(add random info) |
||
(14 intermediate revisions by the same user are not shown) | |||
Line 1: | Line 1: | ||
{{Server|horror}} | {{Server|horror}} | ||
This page is the public technical documentation for the server {{Server link|horror}}, dedicated to ''off-site'' backups, useful for a [[#Disaster recovery]]. | This page is the public technical documentation for the server {{Server link|horror}}, dedicated to ''off-site'' backups, useful for a [[#Disaster recovery]]. | ||
+ | |||
+ | == In short == | ||
+ | |||
+ | In short the server {{Server link|horror}} can ''receive'' an additional off-site backup from other servers. These off-site copies are kept and maintained for multiple days. | ||
+ | |||
+ | Every single server ''pushes'' on server {{Server link|horror}} what should be saved off-site. So, the server {{Server link|horror}} does ''not'' decide what should be saved. | ||
+ | |||
+ | Example of night transfer activity: | ||
+ | |||
+ | <pre> | ||
+ | ┌─────────┐ | ||
+ | │intreccio│ (push) | ||
+ | └────┬────┘ | ||
+ | ↓ | ||
+ | ┌─────────┐ | ||
+ | │ horror │ (receiver) | ||
+ | └─────────┘ | ||
+ | ↑ | ||
+ | ┌────┴────┐ | ||
+ | │lessema │ (push) | ||
+ | └─────────┘ | ||
+ | </pre> | ||
== Authorization == | == Authorization == | ||
Line 18: | Line 40: | ||
* https://wiki.wikimedia.it/wiki/Infrastruttura | * https://wiki.wikimedia.it/wiki/Infrastruttura | ||
+ | |||
+ | == Authorized Users == | ||
+ | |||
+ | Authorized server operators in {{Server link|horror}}: | ||
+ | |||
+ | * [[:Categoria:Accessi/Server horror/sistemisti]] | ||
+ | |||
+ | List of SSH usernames and users: | ||
+ | |||
+ | * <code>valerio-bozzolan</code> - [[User:Valerio Bozzolan]] | ||
+ | * <s><code>anylink-...</code> - [[m:User:DavideCuteri-WMIT]]</s> | ||
+ | |||
+ | In case the above persons are gone, contact a superadmin of the service provider, and contact another trusted server administrator, to recover access (in case, asking for help from support): | ||
+ | |||
+ | * [[:Categoria:Accessi/Fornitore ctb/superadmin]] | ||
== Server login == | == Server login == | ||
Line 36: | Line 73: | ||
If it doesn't work, stop <u>immediately</u> and repeat [[#Authorization]]. | If it doesn't work, stop <u>immediately</u> and repeat [[#Authorization]]. | ||
− | Do not try random attempts or you can be blocked, notified, fired or even sued. | + | Do not try random attempts or you can be blocked, notified, fired or even sued. Your life can be terminated by an AI. |
+ | |||
+ | == Change user == | ||
+ | |||
+ | If you have a personal [[#Server login]] with enough privileges and you need to change user, use sudo: | ||
+ | |||
+ | sudo --login --user=ANOTHER_USER | ||
+ | |||
+ | Then you can do anything like that user, for example: | ||
+ | |||
+ | crontab -l | ||
+ | |||
+ | This is useful if you want to test a specific ''pull'' backup user. | ||
+ | |||
+ | == Root user == | ||
+ | |||
+ | The root user should not be used in normal conditions. | ||
+ | |||
+ | During an emergency, you can use sudo to add your SSH keys inside the usual position: | ||
+ | |||
+ | /root/.ssh/authorized_keys | ||
== Filesystem overview == | == Filesystem overview == | ||
Line 43: | Line 100: | ||
/var/backups/wmi | /var/backups/wmi | ||
+ | /var/backups/wmi/intreccio.wikimedia.it | ||
+ | /var/backups/wmi/lessema.wikimedia.it | ||
+ | /var/backups/wmi/... | ||
Older copies can be obtained adding a numeric suffix. For example the 2-days-old backups are here: | Older copies can be obtained adding a numeric suffix. For example the 2-days-old backups are here: | ||
/var/backups/wmi.2 | /var/backups/wmi.2 | ||
+ | /var/backups/wmi.2/intreccio.wikimedia.it | ||
+ | /var/backups/wmi.2/lessema.wikimedia.it | ||
+ | /var/backups/wmi.2/... | ||
Note that all sub-directories can be accessed only if you are its dedicated user. | Note that all sub-directories can be accessed only if you are its dedicated user. | ||
− | For example | + | For example the user <code>intreccio</code> MUST be the only one able to write in this position: |
− | /var/backups/wmi/ | + | /var/backups/wmi/intreccio.wikimedia.it |
− | |||
− | |||
− | |||
− | + | For example the user <code>intreccio</code> MUST NOT be able to read/write old copies: | |
− | + | /var/backups/wmi.1/intreccio.wikimedia.it | |
+ | /var/backups/wmi.2/intreccio.wikimedia.it | ||
+ | /var/backups/wmi.3/intreccio.wikimedia.it | ||
− | + | == Filesystem policy == | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | == Filesystem | ||
The filesystem rule is the standard one in Unix-like systems: give as <u>few</u> privileges as possible. | The filesystem rule is the standard one in Unix-like systems: give as <u>few</u> privileges as possible. | ||
Line 110: | Line 166: | ||
<pre> | <pre> | ||
USERNAME=foo | USERNAME=foo | ||
− | PROJECT= | + | PROJECT=fooproject |
sudo adduser --disabled-password $USERNAME | sudo adduser --disabled-password $USERNAME | ||
Line 120: | Line 176: | ||
The final purpose is to execute this command daily <u>from you server ''foo''</u> to <u>push</u> your backups on server horror: | The final purpose is to execute this command daily <u>from you server ''foo''</u> to <u>push</u> your backups on server horror: | ||
− | rsync /my/ | + | rsync /my/source/path foo@horror.wikimedia.it:/var/backups/wmi/fooproject |
− | You can also execute | + | You can also execute this command daily <u>from server ''horror''</u> to <u>pull</u> data from server ''foo'': |
− | + | rsync mysource@myserver:/my/source/path /var/backups/wmi/fooproject | |
+ | |||
+ | ; Schedule time policy | ||
+ | |||
+ | Your backup logic can write in the backup location in this period: | ||
+ | |||
+ | * 12:00-23:59 Europe/Rome | ||
+ | * 00:00-04:59 Europe/Rome | ||
+ | |||
+ | You <u>must not</u> write there in this period instead, otherwise you may have collisions with the rotation logic: | ||
+ | |||
+ | * 05:00-12:00 | ||
+ | |||
+ | ; Available backup tools | ||
* rsync | * rsync | ||
Line 131: | Line 200: | ||
* https://gitpull.it/source/micro-backup-script/ (just a stupid script that encapsulates those above) | * https://gitpull.it/source/micro-backup-script/ (just a stupid script that encapsulates those above) | ||
* ... | * ... | ||
+ | |||
+ | ; Success checklist | ||
+ | |||
+ | # your data is saved (by you, or by your new crontab rule) at midnight in <code>/var/backups/wmi/''fooproject''</code> | ||
+ | # your data is automatically rotated in <code>/var/backups/wmi.1/''fooproject''</code> in the next day | ||
== Disaster recovery == | == Disaster recovery == | ||
Line 136: | Line 210: | ||
; You need | ; You need | ||
+ | # understanding whether the Unix user pushing backups has been compromised - in that case - DISABLE IT IMMEDIATELY - DISABLE ALL SSH KEYS of that user | ||
# a good understanding of what data is to be recovered and from what date | # a good understanding of what data is to be recovered and from what date | ||
# check if the provider has native backup/snapshots (if yes, try to use them - they may be more simple to be recovered) | # check if the provider has native backup/snapshots (if yes, try to use them - they may be more simple to be recovered) | ||
Line 141: | Line 216: | ||
# [[#Server login]] | # [[#Server login]] | ||
− | ; Instructions | + | ; Recovery Instructions |
# please create a public Task in [[phabricator:tag/wmit-infrastructure/]] to describe the incident shortly, and notify [[Infrastruttura]] | # please create a public Task in [[phabricator:tag/wmit-infrastructure/]] to describe the incident shortly, and notify [[Infrastruttura]] | ||
− | # | + | # do a [[#Server login]] |
− | #: Example: | + | # be sure to be able to become [[#Root user]] |
− | # | + | # explore the filesystem to find the most relevant backup |
− | + | #: Example for latest copy: | |
− | #: Example: | + | #: <code>ls -l /var/backups/wmi</code> |
− | #: <code> | + | #: Example for 13 days ago: |
− | + | #: <code>ls -l /var/backups/wmi.13</code> | |
− | + | # just use standard utilities to download the needed data | |
− | |||
− | # | ||
#: Example: | #: Example: | ||
− | #: <code> | + | #: <code>rsync root@horror.wikimedia.it:/var/backups/wmi.13/intreccio.wikimedia.it/daily/databases/matomo.sql.gz ./my-destination/</code> |
[[Categoria:Documentazione tecnica|horror]] | [[Categoria:Documentazione tecnica|horror]] |
Latest revision as of 17:19, 16 May 2024
This page is the public technical documentation for the server ⚙️ horror, dedicated to off-site backups, useful for a #Disaster recovery.
In short
In short, the server ⚙️ horror can receive an additional off-site backup from other servers. These off-site copies are kept and maintained for multiple days.
Every single server pushes to the server ⚙️ horror what should be saved off-site. So, the server ⚙️ horror does not decide what should be saved.
Example of night transfer activity:
┌─────────┐
│intreccio│ (push)
└────┬────┘
     ↓
┌─────────┐
│ horror  │ (receiver)
└─────────┘
     ↑
┌────┴────┐
│lessema  │ (push)
└─────────┘
Authorization
Server administrators must be authorized before they can do a #Server login on the ⚙️ horror backup server. To be authorized:
- You need
- a good reason
- for example #Add a project under the backup umbrella
- for example #Disaster recovery
- Unix-like sysadmin experience
- Instructions
Request access policy:
Authorized Users
Authorized server operators in ⚙️ horror:
List of SSH usernames and users:
- valerio-bozzolan - User:Valerio Bozzolan
- anylink-... - m:User:DavideCuteri-WMIT (struck through)
If the people above are no longer available, contact a superadmin of the service provider and another trusted server administrator to recover access (asking support for help if needed):
Server login
Access to the backup server is exclusively via SSH login. There are no other forms of access, since SSH is the most secure method possible. To do it:
- You need
- #Authorization
- SSH experience
- Instructions
Just log in via SSH using the username we assigned to you during your #Authorization process:
ssh name-surname@horror.wikimedia.it
If it doesn't work, stop immediately and repeat #Authorization.
Do not make random attempts, or you may be blocked, reported, fired or even sued. Your life can be terminated by an AI.
Change user
If you have a personal #Server login with enough privileges and you need to change user, use sudo:
sudo --login --user=ANOTHER_USER
Then you can do anything as that user, for example:
crontab -l
This is useful if you want to test a specific pull backup user.
Root user
The root user should not be used in normal conditions.
During an emergency, you can use sudo to add your SSH keys to the usual location:
/root/.ssh/authorized_keys
Filesystem overview
You can explore the filesystem only after #Server login. All recent backups are here:
/var/backups/wmi
/var/backups/wmi/intreccio.wikimedia.it
/var/backups/wmi/lessema.wikimedia.it
/var/backups/wmi/...
Older copies can be reached by adding a numeric suffix. For example, the 2-days-old backups are here:
/var/backups/wmi.2
/var/backups/wmi.2/intreccio.wikimedia.it
/var/backups/wmi.2/lessema.wikimedia.it
/var/backups/wmi.2/...
Note that each sub-directory can be accessed only by its dedicated user.
For example, the user intreccio MUST be the only one able to write to this location:
/var/backups/wmi/intreccio.wikimedia.it
For example, the user intreccio MUST NOT be able to read/write old copies:
/var/backups/wmi.1/intreccio.wikimedia.it
/var/backups/wmi.2/intreccio.wikimedia.it
/var/backups/wmi.3/intreccio.wikimedia.it
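The naming scheme above can be sketched as a small shell helper. This is only an illustration: the function name backup_dir_for_age is hypothetical, and it assumes that age 0 means the most recent copy and that wmi.N holds the copy from N days ago, as in the 2-days-old example.

```shell
#!/bin/sh
# Hypothetical helper: map a backup age (in days) to its directory.
# Assumption: age 0 is the latest copy, age N lives in /var/backups/wmi.N.
backup_dir_for_age() {
    age=$1
    # Reject empty or non-numeric input.
    case $age in
        ''|*[!0-9]*)
            echo "age must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "$age" -eq 0 ]; then
        echo "/var/backups/wmi"
    else
        echo "/var/backups/wmi.$age"
    fi
}
```

For instance, ls -l "$(backup_dir_for_age 2)" would list the 2-days-old copies, provided that rotation exists on the server.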
Filesystem policy
The filesystem rule is the standard one in Unix-like systems: give as few privileges as possible.
Here is a summary of the main filesystem pathnames:
Path | owner:group | Permissions | Description |
---|---|---|---|
/var/backups/wmi*/ | root:root | 755 | Everyone should be allowed to list its sub-directories, to see the latest available backups. Note: you may be allowed to list sub-directories, but you must not be allowed to access them by default. |
/var/backups/wmi*/project | project:project | 750 | The user project must be the only one allowed to access its sub-directory. |
Note: the location /var/backups/wmi is automatically rotated to /var/backups/wmi.1 etc. and the oldest copy is automatically deleted. Permissions are preserved.
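The permissions in the table can be spot-checked with a short shell sketch. The helper name check_mode is hypothetical, and the sketch assumes GNU stat (coreutils), as found on typical Linux servers; it only compares the octal mode and does not verify owner or group.

```shell
#!/bin/sh
# Hypothetical helper: compare the octal mode of PATH with EXPECTED.
# Assumption: GNU stat is available (the -c option is not portable to BSD).
check_mode() {
    path=$1
    expected=$2
    actual=$(stat -c '%a' "$path") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok $path $actual"
    else
        echo "mismatch $path $actual (expected $expected)"
        return 1
    fi
}
```

For example, check_mode /var/backups/wmi 755 should report ok if the top-level directory follows the policy above.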
Add a project under the backup umbrella
- You need
- a good understanding of what data needs to be saved
- a good understanding of how to transfer that data (e.g. rsync + SSH)
- #Server login
- Instructions
In short, you just need to create a directory on server ⚙️ horror and a dedicated user able to read/write in that directory. Then, you can push backups to that directory.
Some pseudo-instructions to be executed on server ⚙️ horror to create a new project foo to be added under its backup umbrella:
USERNAME=foo
PROJECT=fooproject
sudo adduser --disabled-password $USERNAME
sudo mkdir --parents /var/backups/wmi/"$PROJECT"
sudo chown $USERNAME:$USERNAME /var/backups/wmi/"$PROJECT"
The end goal is to execute this command daily from your server foo to push your backups to server horror:
rsync /my/source/path foo@horror.wikimedia.it:/var/backups/wmi/fooproject
You can also execute this command daily from server horror to pull data from server foo:
rsync mysource@myserver:/my/source/path /var/backups/wmi/fooproject
- Schedule time policy
Your backup logic can write to the backup location during these periods:
- 12:00-23:59 Europe/Rome
- 00:00-04:59 Europe/Rome
You must not write there during this period, otherwise you may collide with the rotation logic:
- 05:00-11:59 Europe/Rome
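A backup script could guard itself against the rotation window with a check like the following. This is a minimal sketch: the function name write_allowed is hypothetical, and it assumes that 12:00 itself is already allowed, since the permitted window is listed as starting at 12:00.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if writing is allowed at HOUR (0-23).
# Assumption: hours 05..11 (Europe/Rome) are the forbidden rotation window.
write_allowed() {
    hour=${1#0}        # drop one leading zero, e.g. "08" -> "8"
    hour=${hour:-0}    # "0" becomes empty above, so restore it
    if [ "$hour" -ge 5 ] && [ "$hour" -lt 12 ]; then
        return 1
    fi
    return 0
}
```

A caller could run write_allowed "$(TZ=Europe/Rome date +%H)" before starting a transfer, and skip the run when it fails.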
- Available backup tools
- rsync
- rclone
- mysqldump
- https://gitpull.it/source/micro-backup-script/ (just a stupid script that encapsulates those above)
- ...
- Success checklist
- your data is saved (by you, or by your new crontab rule) at midnight in /var/backups/wmi/fooproject
- your data is automatically rotated to /var/backups/wmi.1/fooproject the next day
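The first checklist item can be verified with a short freshness check. This is a sketch under assumptions: the function name has_fresh_backup is hypothetical, the 24-hour threshold is an illustrative choice, and find must support -mmin (GNU and BSD find do). It must be run as a user allowed to read the project directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains at least one regular file
# modified within the last 24 hours (1440 minutes).
has_fresh_backup() {
    dir=$1
    [ -d "$dir" ] || return 2
    [ -n "$(find "$dir" -type f -mmin -1440 -print | head -n 1)" ]
}
```

For example, has_fresh_backup /var/backups/wmi/fooproject should succeed the morning after the first nightly run.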
Disaster recovery
- You need
- an understanding of whether the Unix user pushing backups has been compromised; in that case, DISABLE IT IMMEDIATELY and DISABLE ALL SSH KEYS of that user
- a good understanding of what data is to be recovered and from what date
- check if the provider has native backups/snapshots (if yes, try to use them, as they may be simpler to restore)
- check if there are on-site backups (if yes, try to use them - they may be more up to date)
- #Server login
- Recovery Instructions
- please create a public Task in phabricator:tag/wmit-infrastructure/ to briefly describe the incident, and notify Infrastruttura
- do a #Server login
- make sure you are able to become #Root user
- explore the filesystem to find the most relevant backup
- Example for latest copy:
ls -l /var/backups/wmi
- Example for 13 days ago:
ls -l /var/backups/wmi.13
- just use standard utilities to download the needed data
- Example:
rsync root@horror.wikimedia.it:/var/backups/wmi.13/intreccio.wikimedia.it/daily/databases/matomo.sql.gz ./my-destination/
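After downloading a compressed dump, it may be worth checking that the file is intact before restoring from it. The following is a minimal sketch, assuming the file is gzip-compressed as in the example above; the function name verify_dump is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that a downloaded .gz dump exists, is not
# empty, and passes gzip's built-in integrity test.
verify_dump() {
    file=$1
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    gzip -t "$file" 2>/dev/null
}
```

For example, verify_dump ./my-destination/matomo.sql.gz succeeds only if the archive decompresses cleanly; it does not validate the SQL inside.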