Step by step setup of a Raspberry Pi 3 with Fedora linux-OS and Tezos


#81

@glebowski, that’s a bold move, hopefully it should not take too much time, if you need any help you’ll find it here…

The latest “lwt” package (3.2.0) now should compile with Tezos


another reason why the Tezos development team is great!


#82

@maxtez-raspbaker Thanks!


#83

I execute the command
./tezos-client network stat

but the network stat list does not show my IP, although the tezos-node reports:

Jan 13 09:51:36 - p2p.maintenance: Too few connections (20)
Jan 13 09:51:44 - node.validator.net: update current head BLGrd11fzMPydz 00::000000000001e421 2017-12-09T00:47:07Z(same branch)
etc 
etc

Is this normal?

and then, after a while

Jan 13 10:04:53 - p2p.maintenance: Too few connections (20)
Jan 13 10:05:00 - node.validator.peer: Unexpected error in the validation worker for peer idqxHU5wU9N1Mz:
Jan 13 10:05:00 - node.validator.peer:    Error:
Jan 13 10:05:00 - node.validator.peer:      Fetch of operations BKqavW6cBfTdFuRAu3dJHgkFsRBonTzQSVmYpXfL6hbMWHS1V2G:0 timed out
Jan 13 10:05:00 - node.validator.peer:
Jan 13 10:07:15 - p2p.maintenance: Too few connections (19)
Jan 13 10:09:03 - p2p.maintenance: Too few connections (19)
Jan 13 10:10:38 - p2p.maintenance: Too few connections (19)
Jan 13 10:12:33 - p2p.maintenance: Too few connections (19)
Jan 13 10:14:15 - p2p.maintenance: Too few connections (19)
Jan 13 10:16:03 - p2p.maintenance: Too few connections (19)
Jan 13 10:18:11 - p2p.maintenance: Too few connections (19)
Jan 13 10:21:20 - p2p.maintenance: Too few connections (19)
Jan 13 10:23:16 - p2p.maintenance: Too few connections (19)
Jan 13 10:25:18 - p2p.maintenance: Too few connections (19)
Jan 13 10:27:14 - p2p.maintenance: Too few connections (19)
Jan 13 10:30:29 - p2p.maintenance: Too few connections (19)
Jan 13 10:32:18 - p2p.maintenance: Too few connections (19)
Jan 13 10:34:29 - p2p.maintenance: Too few connections (19)
Jan 13 10:36:38 - p2p.maintenance: Too few connections (19)
Jan 13 10:38:47 - p2p.maintenance: Too few connections (19)

What's wrong? What is this unexpected error?
Is everything running as expected on my node?


#84

@demo, I think it can happen that your IP doesn’t show up, try again a bit later… you can also browse the network on these two online explorers:
https://tezos.id/tezos-peers.php
http://www.ostez.com/

About the error: yes, I have seen it before, but I am not sure what it means. It may be that the peer disconnected while your node was waiting for the operations.
Since you have been playing with this for a few days now, I am curious to know how long you ran the tezos-node without interruption. Did you stop it or did it crash?
While the node is running (and for how long?), what is the output of this command: ss -s and of this one: free -m


#85

Yes, I think my node got stuck. But I also don't run it 24/7 without interruption; I have to shut it down for some hours.
Here is the result of the commands you asked me, from my node that is currently running.
Everything goes well until now.

ss -s

Total: 359 (kernel 0)
TCP:   197 (estab 174, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       1         0         1
UDP       6         3         3
TCP       197       4         193
INET      204       7         197
FRAG      0         0         0

free -m

              total        used        free      shared  buff/cache   available
Mem:            974         302          10           0         661         618
Swap:           243           1         242

By the way, I am not fully synchronized with the network yet, as shown below:

Jan 13 13:54:28 - node.validator.net: update current head BLx1DoKKhYbjx9 00::000000000002768a **2017-12-17**T04:08:30Z(same branch)

This is probably because my DSL link is slow and because I shut down my node from time to time.
I will keep it up all this weekend, to see if it finally syncs to the current date.


#86

@demo, ok I see, your output was probably taken just after you started or stopped the node. Some numbers in the memory output are also probably not right, the total memory doesn’t add up, but that is not important at the moment.
Do you usually stop the node after a few hours because you decide to, or because it breaks down?
If it crashes, it is probably because you exceeded the maximum number of open files. While the node is running you can see the number of open files with this:
lsof -u <user> |wc -l

Your soft/hard limits are something like 1000/~4000 files:
ulimit -Sn
ulimit -Hn

If you want to increase these numbers, open the file “limits.conf” in /etc/security as root and add the following two lines at the end of the file:
<user> hard nofile xxxx
<user> soft nofile yyyy

note that xxxx > yyyy. Log off and log in again to make the changes effective.
Next, if the node is up long enough, the RPI3 will probably run out of memory, keep an eye on the mem output. Let me know how it goes…
I am trying to understand if some problems that I have encountered are widespread or it is just me messing around with things…
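
As a side note to the limits.conf approach above, the soft limit can also be raised without root, for the current shell only, as long as it stays at or below the hard limit. A minimal sketch, assuming the node is then started from that same shell (the function name raise_nofile_limit is just for illustration):

```shell
#!/bin/sh
# Raise the soft open-file limit of the current shell up to the hard limit.
# No root is needed, because the soft limit never goes above the hard limit.
raise_nofile_limit() {
    ulimit -Sn "$(ulimit -Hn)"
}

raise_nofile_limit
echo "soft limit: $(ulimit -Sn), hard limit: $(ulimit -Hn)"

# Processes started from this shell inherit the new soft limit, e.g.:
# ./tezos-node run --rpc-addr localhost
```

Unlike the limits.conf change, this does not survive a logout, so it has to be repeated in every new session (or put in the script that starts the node).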


#87

@maxtez-raspbaker here you are:

[tz~]$ ss -s
Total: 531 (kernel 0)
TCP:   373 (estab 339, closed 7, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       1         0         1
UDP       6         3         3
TCP       366       3         363
INET      373       6         367
FRAG      0         0         0

[tz ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:            974         322          17           0         634         598
Swap:           243          11         232
[tz ~]$ lsof -u tz|wc -l
1034
[tz ~]$ ulimit -Sn
1024
[tz ~]$ ulimit -Hn
4096

Do you think I should increase the soft limit?


#88

@demo, ok, lsof gives the total number of files you have open (across all your processes), while the soft limit applies to each process separately. You can run lsof before starting the node to get the background number of open files, then see how much that number grows after starting the node. If the current number minus the background number gets close to 1024 then you are in trouble…
What I see on my RPI3 is that the number of files opened by the tezos-node keeps growing with time and never reaches a steady plateau; I am trying to figure out how to keep it under control. But maybe for you it is going to be different. I hope so, since it would mean that I screwed up somewhere and there is surely a fix. Otherwise we have to keep searching for a solution; worst case scenario, the node has to be restarted from time to time, maybe using a crontab script.
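
The "current minus background" estimate above can also be replaced by a direct per-process count, since Linux exposes each process's descriptors under /proc. A rough sketch, assuming a Linux /proc filesystem; the function names fd_count and fd_soft_limit are made up for this example, and the PID of the tezos-node has to be passed in by hand:

```shell
#!/bin/sh
# Count the file descriptors currently open in one process (Linux /proc).
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Soft "Max open files" limit that applies to that same process.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Hypothetical usage: pass the tezos-node PID as the first argument.
# Without an argument the script inspects its own shell.
pid=${1:-$$}
echo "pid $pid: $(fd_count "$pid") open descriptors, soft limit $(fd_soft_limit "$pid")"
```

This counts only real descriptors (files and sockets), whereas lsof -u also lists things like memory-mapped libraries, so the two numbers will not match exactly.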


#89

Thanks for the info. I’ve had to restart about 20 times until now and I’m almost fully synced:

Jan 13 23:03:39 - node.validator.net: update current head BLQxN2LwFooBNa 00::000000000003cad3 2018-01-10T16:05:10Z(same branch)

After full sync I’ll make an image of the microSD (backup) and start playing around with Tezos baking + smart contracts.

@demo: you’re using SSH as well. In order not to lose your session, you can use ‘tmux’. Very easy:

  1. tmux
  2. run the node within tmux
  3. <ctrl + b> + d to exit tmux but leave the session open: you can now close the ssh session if you like
  4. tmux attach to reopen tmux with the session

FYI,

[mootjes@localhost ~]$ ss -s
Total: 197 (kernel 0)
TCP:   38 (estab 28, closed 2, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total     IP        IPv6
*         0         -         -
RAW       1         0         1
UDP       7         4         3
TCP       36        3         33
INET      44        7         37
FRAG      0         0         0

[mootjes@localhost ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:            974         174          11           0         788         745
Swap:           243           0         243
[mootjes@localhost ~]$ lsof -u mootjes|wc -l
1013
[mootjes@localhost ~]$ ulimit -Sn
1024
[mootjes@localhost ~]$ ulimit -Hn
4096

#90

crontab is of course a better approach if you want to kill a tezos-node that got stuck.

But I use the script below. After starting it, I put it in the background (CTRL+Z and then the command bg).

#!/bin/bash
# Restart tezos-node whenever the user's open-file count has grown too much.
user=tz
limit=1024    # restart once this many extra files are open
while true
do
    # Baseline: files already open for this user before the node starts
    initopenfile=$(lsof -u "$user" | wc -l)
    echo "Initial open files $initopenfile"
    ./tezos-node run --rpc-addr localhost &> /home/tz/tezos/mylogs &
    mypid=$!    # PID of the node just started (more reliable than parsing ps)
    echo "my process $mypid"
    sleep 40
    substr=0
    # Keep watching while the node is alive and the growth is below the limit
    while kill -0 "$mypid" 2>/dev/null && [ "$substr" -lt "$limit" ]
    do
        openfile=$(lsof -u "$user" | wc -l)
        echo "open files after tezos started $openfile"
        substr=$((openfile - initopenfile))
        echo "diff $substr"
        sleep 20
    done
    echo "killing tezos"
    kill -9 "$mypid" 2>/dev/null
    sleep 20
done

#91

By the way, does any of you know how I can get some tezos?
I own no tezos at all, and I would like to earn some.
Can I earn tezos by staking with my node?


#92

You’ll have to buy Tezos for this. Proof of stake requires a stake in Tezos to be able to bake.


#93

A new error occurred:

Jan 14 10:55:52 - p2p.maintenance: Too few connections (8)
Jan 14 10:55:52 - core: maintenance worker failed with Unix.Unix_error(Unix.EMFILE, "socket", "")

#94

I avoid buying cryptocurrencies with fiat money. I prefer to earn them with my work. Although I own fiat money (enough to feed myself), I consider it bloody money and I avoid using it. I also consider cryptomoney a social convention, that's why I want this money to be given to me by the community (as compensation for the work I offer to the cryptocommunity).

I don’t like people who buy cryptocurrencies, because that way they gain power over the community by using the bloody fiat money.

I own 1.5 dash and 1.6 pivx.
Would anyone want to exchange them for one tezos?


#95

This is an lwt error:

[tz src]$ grep -rn "worker failed" *
utils/lwt_utils.ml:251:    log_error "%s worker failed with %s" name (Printexc.to_string e) ;

I reported it as an issue.

By the way my script worked!!!

The lwt error caused the total number of open files to increase (socket files, I think), so my script killed the tezos-node process!

So I don't have to reboot my pi3 or restart my tezos-node manually.


#96

@demo, the script is a great idea, I still hope to find a way to control the number of open files and avoid restarting the node, but if that’s not possible then your script is the way to go, well done!

The unix error “unix.EMFILE, socket…” occurred because you reached the max number of open files for the process. I think the “lwt” error is just the fallout of reaching the file limit. Perhaps you want to set up the script to stop the node before it crashes. You may also want to increase the max limit in the “limits.conf” file to reduce the number of times the node has to restart.
When you increase the max open file limit, check the RAM usage, because at some point it will reach the limit (1GB). One way to essentially double the RAM on the RPI3 is to use the ZRAM service, which is already available in the Fedora distribution but is not enabled by default. To enable it on boot, edit the file “zram.service” located in “/usr/lib/systemd/system” and add the following two lines at the end of the file:

[Install]
WantedBy=multi-user.target

Then type ‘systemctl enable zram’ and ‘systemctl start zram’. If you type ‘free -m’ you’ll see that the swap is now 1243 instead of the original 243. ZRAM acts as swap memory but it never writes to the SD card: data are compressed in RAM on the fly. It takes some CPU resources, but that is way better than hitting the true swap memory.

Thanks for sharing the output of ‘free -m’ and ‘ss -s’, please let me also know the time after you started the node at which the output was created, those numbers should get bigger and bigger with time.
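
To double-check that the ZRAM swap is really active, one option is to read /proc/swaps, where the kernel lists every swap device with its size in KiB. A small sketch, assuming the zram devices show up as /dev/zram0, /dev/zram1, etc.; the function names swap_total_mb and zram_swaps are just for illustration:

```shell
#!/bin/sh
# Sum the Size column (KiB) of a /proc/swaps-style table and print it in MiB.
swap_total_mb() {
    awk 'NR > 1 { kib += $3 } END { printf "%d\n", kib / 1024 }' "${1:-/proc/swaps}"
}

# Print only the zram-backed swap devices (names starting with /dev/zram).
zram_swaps() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zram/ { print $1 }' "${1:-/proc/swaps}"
}

echo "total swap: $(swap_total_mb) MiB"
echo "zram devices: $(zram_swaps | tr '\n' ' ')"
```

If the second line comes out empty, the zram service did not start and the swap total will still be the original SD card swap only.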


#97

Instead of changing the default setup of the operating system, I think the following code from lwt_utils.ml should change.

(* A worker launcher, takes a cancel callback to call upon *)
let worker name ~run ~cancel =
  let stop = LC.create () in
  let fail e =
    log_error "%s worker failed with %s" name (Printexc.to_string e) ;
    cancel ()
  in
  let waiter = LC.wait stop in
  log_info "%s worker started" name ;
  Lwt.async
    (fun () ->
       Lwt.catch run fail >>= fun () ->
       LC.signal stop ();
       Lwt.return ()) ;
  waiter >>= fun () ->
  log_info "%s worker ended" name ;
  Lwt.return ()

In order to fix this, we first have to understand what the keyword “in” means.




#98

@demo, I can pretty much guarantee that the “Unix.Unix_error(Unix.EMFILE…” was the result of the node exceeding the max open file threshold


#99

Yes of course it was,

But these open files are also SOCKETS. Aka connections. Tezos code creates a lot of socket connections. So the code of tezos should deal with it, and prevent this, somehow. Maybe the ocaml program should do some special system calls, somehow, before deciding to open a new socket connection.

Or not?

Do you think the problem is in the files and not in the connections?
I noticed that tezos-node creates also a lot of .ldb files.
So maybe the socket error I encounter was just the drop that overflowed the glass.


#100

@demo, you are absolutely right, that is exactly the point. I don’t know if it is something related to the Tezos code or to the RPI3 hardware/software; it is strange to me that users with big PCs never reported any issue of this kind.
My tiny effort so far has been to fool around with the kernel setup, trying to keep fewer ports open at one time, or at least trying to find out whether the problem is related to the RPI3.
I am sure we’ll figure something out…