Commit 7138df6a authored by project_12982_bot's avatar project_12982_bot

automatic push at 2021-12-08-2017

parent 4bd14fed
Showing
with 171 additions and 25 deletions
......@@ -8,15 +8,15 @@ The scrape.sh script fetches news from online newspapers. Since I have done all
The script starts by writing a date file, so that every script that runs after the scrape script can use the same date. The file is overwritten on each run, so it contains only one line: today's date. This way I avoid fetching the date separately at the start of every script. The problem with fetching the date in every script is that the date includes the minutes, and the minute could change before all the scripts have finished running. A later script would then not get the same date as the first script. I also use the date to name the folder where I store all the txt files with the information about the individual news articles.
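A minimal sketch of the date-file idea described above. The function names `write_run_date` and `read_run_date` are hypothetical and only for illustration; the date format and the `newsscraping/date.txt` location are taken from the gitrepo.sh script in this commit, and the real scrape.sh may be organised differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the actual scrape.sh code.

# Write the current date and time (to the minute) into <dir>/date.txt.
write_run_date() {
    local dir="$1"
    mkdir -p "$dir"
    # '>' truncates the file, so it always holds exactly one line.
    date +"%Y-%m-%d-%H%M" > "$dir/date.txt"
}

# Read the stored date back, so later scripts share the same value.
read_run_date() {
    cat "$1/date.txt"
}
```

The first script would call `write_run_date newsscraping` once, and every later script would call `read_run_date newsscraping` instead of running `date` again.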
The script curls tv2.no/sport only on even-numbered days. I used an if statement to check whether the day of the month is divisible by 2, that is, whether it is an even number. I used 10# to ensure that bash interprets the number in base 10 instead of base 8.
I scrape the URLs of the news articles from each website individually, because the different websites have different HTML structure. The scrape script curls tv2.no/sport only on even-numbered days. I used an if statement to check whether the day of the month is divisible by 2, that is, whether it is an even number. I used 10# to ensure that bash interprets the number in base 10 instead of base 8, because a day with a leading zero, such as 08, would otherwise be read as an octal number.
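The even-day check can be sketched roughly like this. The function name `is_even_day` is an assumption made for this example, and the commented usage only indicates where the curl call would go.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the given day is even.
is_even_day() {
    local day="$1"    # e.g. "08", as printed by: date +%d
    # 10# forces base 10. Without it, bash treats a leading zero as octal
    # and rejects "08" and "09" as invalid numbers.
    (( 10#$day % 2 == 0 ))
}

# Possible usage inside the scrape script:
# if is_even_day "$(date +%d)"; then
#     ... curl tv2.no/sport here ...
# fi
```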
It loops over every URL in the three variables and curls it. Then it fetches the title, the image URL and the summary, which is an optional task, and stores them in three variables. To keep some control when scraping from different websites, I chose to name the files after the websites and start every title with the website name. To do this I made an if statement that checks whether the article is from tv2.no/nyheter, nrk.no/innlandet or tv2.no/sport. Then the loop makes a txt file and fills it with the information from the variables. This information is the URL of the news article, the title, the URL of the image, the date it was scraped and a summary of the news article.
It loops over every URL in the three variables and curls it. Then it fetches the title, the image URL and the summary, which is an optional task, and stores them in three variables. Since I fetch the information from inside the head element, all three websites have nearly the same HTML structure. Where they differ, I have handled it with regex. Therefore I can use the same variables to fetch from all three websites. To keep some control when scraping from different websites, I chose to name the files after the websites and start every title with the website name. To do this I made an if statement that checks whether the article is from tv2.no/nyheter, nrk.no/innlandet or tv2.no/sport. Then the loop makes a txt file and fills it with the information from the variables. This information is the URL of the news article, the title, the URL of the image, the date it was scraped and a summary of the news article.
Finally, the script removes all the temporary files.
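As an illustration of fetching values from inside head, here is a sketch that assumes the pages expose Open Graph tags such as `og:title`, `og:image` and `og:description`. That assumption, and the function name `meta_content`, are not taken from scrape.sh, which may use different tags and regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the content attribute of the first
# <meta property="..."> tag found in the HTML read from standard input.
meta_content() {
    local property="$1"
    grep -o "<meta[^>]*property=\"$property\"[^>]*>" \
        | head -n 1 \
        | sed -E 's/.*content="([^"]*)".*/\1/'
}

# Possible usage (page.html is an already downloaded article):
# title=$(meta_content og:title < page.html)
# image=$(meta_content og:image < page.html)
# summary=$(meta_content og:description < page.html)
```

Because the same function is used for every site, only the site prefix in the title and file name has to differ between tv2.no/nyheter, nrk.no/innlandet and tv2.no/sport.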
<h2>page.sh</h2>
The page.sh script takes the txt files that were made by the scrape.sh script and makes an individual HTML page for each of them. It loops over every file in the directory inside the newsscraping directory whose name matches the date in the date.txt file. It fetches the information from the text files and stores it in variables. Then it makes an HTML file for each txt file and stores it in a directory named after the date, inside the htmlfiles directory.
The page.sh script takes the txt files that were made by the scrape.sh script and makes an individual HTML page for each of them. It loops over every file in the directory inside the newsscraping directory whose name matches the date in the date.txt file. It fetches the information from the text files and stores it in variables. Then it makes an HTML file for each txt file and stores it in a directory named after the date. This folder is inside the htmlfiles directory.
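A rough sketch of the page generation step, based on the txt layout (URL, title, image URL, date, summary) and the HTML pages committed in this repository. The function name `render_page` is hypothetical, and unlike the committed pages this sketch also writes a closing `</html>` tag. Like the committed pages, it leaves the summary line out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an HTML page for one scraped txt file.
render_page() {
    local txt="$1" url title image fetched
    # The first four lines of the txt file are read in the order
    # in which the scrape script wrote them.
    {
        IFS= read -r url
        IFS= read -r title
        IFS= read -r image
        IFS= read -r fetched
    } < "$txt"
    cat <<EOF
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>$title</title>
</head>
<body>
<h1>$title</h1>
<img src="$image">
<p>Fetched on $fetched <a href="$url">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
</html>
EOF
}
```

A loop in page.sh could then call something like `render_page "$file" > "htmlfiles/$date/$(basename "$file" .txt).html"` for each txt file; the exact paths here are assumptions.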
<h2>overview.sh</h2>
......@@ -24,20 +24,20 @@ The overview.sh script makes a html file, index.html, that links to every news a
<h2>main.sh</h2>
This is the main script, which runs the scrape.sh, page.sh, overview.sh and gitrepo.sh scripts. This is also the script that systemd runs every 6th hour.
This is the main script, which runs the scrape.sh, page.sh, overview.sh and gitrepo.sh scripts. This is also the script that systemd runs every 6th hour.
<h2>nginx</h2>
nginx serves the generated HTML files using a configuration file. I changed the default file in /etc/nginx/sites-available. I have made a copy of that file and stored it in the config folder in this project directory.
Inside the default file I changed the root to my project directory:
nginx serves the generated HTML files using a configuration file. I changed the default file in /etc/nginx/sites-available. I have made a copy of that file and stored it in the config folder in this project directory. I called the file default. Inside the default file I changed the root to my project directory:
root /home/pi/git/exam;
This configuration file made it possible to see the overview page with this url: http://idg1100-ingring.dynv6.net/
I have set up dynamic DNS by installing ddclient so I can use idg1100-ingring.dynv6.net as my hostname. This is not required for this assignment, so I have not included the installation in the deployment.sh script. The hostname can also be the IP address of the Raspberry Pi.
I tried to do the optional task and set up fcgi instead of nginx, but I did not quite manage to make it work. The CGI script was supposed to generate the overview page on the fly via fcgiwrap.
I did not figure out how the CGI script should look. However, I got the page up and running via fcgiwrap. It can be reached with this link: http://idg1100-ingring.dynv6.net/cgi-bin/overview.cgi
My thought was that I could reach the HTML files from the /home/pi/git/exam/htmlfiles directory, but that did not seem to be possible. Then I wanted to run the same for loop as I did for the overview page, store the result in a variable and reverse the variable via the sort command. Then echo the reversed variable.
My thought was that I could reach the HTML files from the /home/pi/git/exam/htmlfiles directory, but that did not seem to be possible. Then I wanted to run the same for loop as I did for the overview page, store the result in a variable, reverse the variable via the sort command and then echo the reversed variable.
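One way the idea above could look is sketched below. This is not the overview.cgi from this project, and it has not been tested under fcgiwrap. The function name `overview_page` is hypothetical, and the sketch assumes that the user running the CGI script can read the htmlfiles directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print a CGI response that lists every generated page,
# newest date folder first, using a reverse sort on the file paths.
overview_page() {
    local root="$1" file title
    # A CGI response starts with a header block followed by an empty line.
    printf 'Content-Type: text/html; charset=UTF-8\r\n\r\n'
    echo '<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Overview</title></head><body>'
    echo '<h1>Overview of recent news</h1>'
    echo '<ul>'
    find "$root/htmlfiles" -name '*.html' | sort -r | while IFS= read -r file; do
        # Take the link text from the <title> element of each page.
        title=$(sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$file" | head -n 1)
        echo "<li><a href='/${file#"$root"/}'>$title</a></li>"
    done
    echo '</ul></body></html>'
}
```

To use it as a CGI script, the last line of the file would call the function, for example `overview_page /home/pi/git/exam`.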
Since I tried to do this optional task, I changed the default file so it also includes fcgiwrap.conf inside the server section:
include /etc/nginx/fcgiwrap.conf;
......@@ -47,9 +47,9 @@ In this script I got much help from this two websites:
<h2>Crontab entries (systemd)</h2>
I did the optional task, which means that I created a systemd timer unit instead of a crontab entry. A systemd timer unit consists of two files, schedule-exam.service and schedule-exam.timer, which have to be inside /etc/systemd/user. I did not make the optional deployment script, so you have to copy (cp) these files into the right directory.
I did the optional task, which means that I created a systemd timer unit instead of a crontab entry. A systemd timer unit consists of two files, schedule-exam.service and schedule-exam.timer, which have to be inside /etc/systemd/user. For now they are stored in the config folder. I have made the optional deployment script, which copies (cp) these files from the config folder into the right directory. The schedule-exam.service file describes the job that schedule-exam.timer should trigger. In other words, schedule-exam.service says which script should run and schedule-exam.timer says when it should run.
First I schedule the timer with this command:
First I tried to schedule the timer with this command:
OnUnitActiveSec=6h
It worked well until I disconnected my Raspberry Pi and reconnected to it. The timer would not run until I started it again with the command:
systemctl --user start schedule-exam.timer
......@@ -57,19 +57,19 @@ I discovered this when I ran this command:
systemctl --user status schedule-exam.timer
Then the output showed that the trigger was not applicable:
Trigger: n/a
The solution could be to run the start command before every systemd operation. However, I instead switched to making the timer run at fixed times six hours apart. I chose to run the job at 00:00, 06:00, 12:00 and 18:00, as well as 120 seconds after a reboot. This may be a more static way of running the script six hours apart, but it gives us predictability. Therefore I changed the timer setting to:
The solution could be to run the start command before every systemd operation. However, I instead switched to making the timer run at fixed times six hours apart. I chose to run the job at 00:00, 06:00, 12:00 and 18:00, as well as 120 seconds after a reboot. This may be a more static way of running the script six hours apart, but it gives us predictability. Therefore I changed the timer setting to:
OnCalendar=*-*-* 00,06,12,18:00:00
To manually stop the timer:
To manually stop the timer you have to write this inside the terminal in /etc/systemd/user:
systemctl --user stop schedule-exam.timer
For this part I got a lot of help from this website: https://fedoramagazine.org/systemd-timers-for-scheduling-tasks/
<h2>gitrepo.sh</h2>
Updates the local and the centralized git repository. Since this script is also run automatically by systemd every 6th hour, it requires an SSH key or a token. Otherwise it could not push without asking the user for the username and the password for GitLab.
The gitrepo.sh script is an optional task and updates the local and the centralized git repository. Inside this script I run the git commands that push all the changes made by the other scripts. These are git add, git commit and git push. To add all the changes I wrote a dot, which picks up all the changed and new files. In the commit message I included the date to show when it was pushed automatically. I also included that it is an automatic push, so it is possible to see which commits were pushed manually and which automatically. Since this script is also run automatically by systemd every 6th hour, it requires an SSH key or a token. Otherwise it could not push without asking the user for the username and the password for GitLab.
I made an SSH key in the terminal, copied the key into GitLab and configured the user in the terminal:
I made an SSH key in the terminal, copied the key into GitLab and configured the user in the terminal:
ssh-keygen -t rsa -b 2048 -C "ingring@stud.ntnu.no"
ssh -T git@gitlab.stud.idi.ntnu.no
git config --global user.name "ingring"
......@@ -82,7 +82,6 @@ To let systemd run this git repo script I did not give the ssh key a passphrase.
I saved the token so that the username and the token do not have to be entered every time the project is pushed or pulled, only the first time after using this command:
git config --global credential.helper cache
I am trying again
<h2>Deployment.sh</h2>
The deployment.sh script is optional and configures a blank installation of Raspberry Pi OS with everything that is necessary for the project to work. First, the script installs all the programs that I have used to solve this assignment. Then it makes a git directory inside /home/pi and clones the git repository over https into this directory. This creates an exam directory with all the scripts inside and a config folder with all the configuration files. Then the script copies the configuration files and overview.cgi into the right directories, and enables and starts the services. Finally, it makes all the scripts executable with the chmod +x command.
\ No newline at end of file
[Unit]
Description=A job to run the main.sh script, which scrapes websites and automatically makes individual HTML pages and an overview page
Description=A job to run the main.sh script
[Service]
Type=simple
......
[Unit]
Description=Schedule to run the main script every 6th hour
#Allow manual starts and stops
#Allow the job to be started and stopped manually
RefuseManualStart=no
RefuseManualStop=no
[Timer]
#Execute job if it missed a run due to machine being off
#Execute the job if it misses a run due to the machine being off
Persistent=true
#Run 120 seconds after boot for the first time
#Run the job 120 seconds after rebooting
OnBootSec=120
#Run every 6 hours thereafter
OnUnitActiveSec=6h
#Run the job when the time shows 0, 6, 12 and 18 every day
OnCalendar=*-*-* 00,06,12,18:00:00
#File describing job to execute
#This file describes the job to execute
Unit=schedule-exam.service
[Install]
......
date=$(cat newsscraping/date.txt)
date=$(date +"%Y-%m-%d-%H%M")
git add .
git commit -m "automatic upload at ${date}"
git commit -m "automatic push at ${date}"
git push
\ No newline at end of file
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2nyheter: I juni twitret smitteverneksperten: «Det var den pandemien». Dette sier han nå</title>
</head>
<body>
<h1>tv2nyheter: I juni twitret smitteverneksperten: «Det var den pandemien». Dette sier han nå</h1>
<img src="https://www.cdn.tv2.no/images/14256731.jpg?imageId=14256731&panow=100&panoh=100&panox=0&panoy=0&heightw=100&heighth=100&heightx=0&heighty=0&width=600&height=315">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/nyheter/14413909/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2nyheter: 19-åring omkom i ATV-ulykke</title>
</head>
<body>
<h1>tv2nyheter: 19-åring omkom i ATV-ulykke</h1>
<img src="https://www.tv2.no/view-resources/baseview/public/common/lab_assets/img/logo/tv2-default.jpg">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/nyheter/14415585/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2nyheter: Person alvorlig skadd i påkjørsel i Ålesund</title>
</head>
<body>
<h1>tv2nyheter: Person alvorlig skadd i påkjørsel i Ålesund</h1>
<img src="https://www.tv2.no/view-resources/baseview/public/common/lab_assets/img/logo/tv2-default.jpg">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/nyheter/14415846/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>nrkinnlandet: Fylkestinget i Innlandet stemte for folkeavstemming om Hedmark og Oppland </title>
</head>
<body>
<h1>nrkinnlandet: Fylkestinget i Innlandet stemte for folkeavstemming om Hedmark og Oppland </h1>
<img src="https://gfx.nrk.no/t80cKAXq_UHFL1n_K128rA5H4KtOqj51zpFSQimf9WjQ.jpg">
<p>Fetched on 2021-12-08-2017 <a href="https://www.nrk.no/innlandet/fylkestinget-i-innlandet-stemte-for-folkeavstemming-om-hedmark-og-oppland-1.15762396">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>nrkinnlandet: Ny rapport om brannsikkerheit i leilegheitsbygget Pernup i Trysil </title>
</head>
<body>
<h1>nrkinnlandet: Ny rapport om brannsikkerheit i leilegheitsbygget Pernup i Trysil </h1>
<img src="https://gfx.nrk.no/-f4vDhk3yI_lOhiLLOkvUgCJ5eDQW7UT4orUoThMDRQw.jpg">
<p>Fetched on 2021-12-08-2017 <a href="https://www.nrk.no/innlandet/ny-rapport-om-brannsikkerheit-i-leilegheitsbygget-pernup-i-trysil-1.15741997">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>nrkinnlandet: Hvem vinner Klassequizen i Innlandet? </title>
</head>
<body>
<h1>nrkinnlandet: Hvem vinner Klassequizen i Innlandet? </h1>
<img src="https://gfx.nrk.no/eF_qcLsWg0lWXNt6mHh1CApAf86dvb4-dsgn9CEUPJeQ.jpg">
<p>Fetched on 2021-12-08-2017 <a href="https://www.nrk.no/innlandet/hvem-vinner-klassequizen-i-innlandet_-1.15658127">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2sport: Hangeland: - Erling går utenpå Rooney, Suarez og Drogba</title>
</head>
<body>
<h1>tv2sport: Hangeland: - Erling går utenpå Rooney, Suarez og Drogba</h1>
<img src="https://www.cdn.tv2.no/images/14414656.jpg?imageId=14414656&panow=100&panoh=84.441301272984&panox=0&panoy=0&heightw=100&heighth=84.441301272984&heightx=0&heighty=0&width=600&height=315">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/sport/14414425/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2sport: CL-kometen bodde på gaten: - En del av livet jeg ikke trenger å vite noe om</title>
</head>
<body>
<h1>tv2sport: CL-kometen bodde på gaten: - En del av livet jeg ikke trenger å vite noe om</h1>
<img src="https://www.cdn.tv2.no/images/14409422.jpg?imageId=14409422&panow=100&panoh=50.993377483444&panox=0&panoy=6.6225165562914&heightw=40.928270042194&heighth=100&heightx=55.274261603376&heighty=0&width=600&height=315">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/sport/14409361/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>tv2sport: Dramatisk VM-økning: - Det forteller egentlig alt</title>
</head>
<body>
<h1>tv2sport: Dramatisk VM-økning: - Det forteller egentlig alt</h1>
<img src="https://www.cdn.tv2.no/images/14407913.jpg?imageId=14407913&panow=72.20447284345&panoh=36.666666666667&panox=14.05750798722&panoy=19.52380952381&heightw=40.928270042194&heighth=100&heightx=21.940928270042&heighty=0&width=600&height=315">
<p>Fetched on 2021-12-08-2017 <a href="https://www.tv2.no/sport/14414078/">Original article</a></p>
<a href="/index.html">Go back to overview page</a>
</body>
......@@ -4,6 +4,15 @@
<body>
<h1>Overview of recent news</h1>
<ul>
<li><a href='htmlfiles/2021-12-08-2017/news9-tv2sport.html'>tv2sport: Dramatisk VM-økning: - Det forteller egentlig alt</a></li>
<li><a href='htmlfiles/2021-12-08-2017/news8-tv2sport.html'>tv2sport: CL-kometen bodde på gaten: - En del av livet jeg ikke trenger å vite noe om</a></li>
<li><a href='htmlfiles/2021-12-08-2017/news7-tv2sport.html'>tv2sport: Hangeland: - Erling går utenpå Rooney, Suarez og Drogba</a></li>
<li><a href='htmlfiles/2021-12-08-2017/news6-nrkinnlandet.html'>nrkinnlandet: Hvem vinner Klassequizen i Innlandet? </a></li>
<li><a href='htmlfiles/2021-12-08-2017/news5-nrkinnlandet.html'>nrkinnlandet: Ny rapport om brannsikkerheit i leilegheitsbygget Pernup i Trysil </a></li>
<li><a href='htmlfiles/2021-12-08-2017/news4-nrkinnlandet.html'>nrkinnlandet: Fylkestinget i Innlandet stemte for folkeavstemming om Hedmark og Oppland </a></li>
<li><a href='htmlfiles/2021-12-08-2017/news3-tv2nyheter.html'>tv2nyheter: Person alvorlig skadd i påkjørsel i Ålesund</a></li>
<li><a href='htmlfiles/2021-12-08-2017/news2-tv2nyheter.html'>tv2nyheter: 19-åring omkom i ATV-ulykke</a></li>
<li><a href='htmlfiles/2021-12-08-2017/news1-tv2nyheter.html'>tv2nyheter: I juni twitret smitteverneksperten: «Det var den pandemien». Dette sier han nå</a></li>
<li><a href='htmlfiles/2021-12-08-1507/news9-tv2sport.html'>tv2sport: Hvor er de kvinnelige superstjernene?</a></li>
<li><a href='htmlfiles/2021-12-08-1507/news8-tv2sport.html'>tv2sport: Savner kvinnene i «verdensidrettene»: - Ikke like forhold for kvinner og menn</a></li>
<li><a href='htmlfiles/2021-12-08-1507/news7-tv2sport.html'>tv2sport: Tottenham bekrefter stort smitteutbrudd</a></li>
......
https://www.tv2.no/nyheter/14413909/
tv2nyheter: I juni twitret smitteverneksperten: «Det var den pandemien». Dette sier han nå
https://www.cdn.tv2.no/images/14256731.jpg?imageId=14256731&panow=100&panoh=100&panox=0&panoy=0&heightw=100&heighth=100&heightx=0&heighty=0&width=600&height=315
2021-12-08-2017
FHIs smittevernekspert Preben Aavitsland innrømmer at han tok feil om koronapandemien.
https://www.tv2.no/nyheter/14415585/
tv2nyheter: 19-åring omkom i ATV-ulykke
https://www.tv2.no/view-resources/baseview/public/common/lab_assets/img/logo/tv2-default.jpg
2021-12-08-2017
En 19 år gammel mann omkom onsdag kveld i en ATV-ulykke på Årnes.
https://www.tv2.no/nyheter/14415846/
tv2nyheter: Person alvorlig skadd i påkjørsel i Ålesund
https://www.tv2.no/view-resources/baseview/public/common/lab_assets/img/logo/tv2-default.jpg
2021-12-08-2017
To personer er skadd, en av dem alvorlig, etter å ha blitt påkjørt i Brattvåg i Ålesund onsdag.
https://www.nrk.no/innlandet/fylkestinget-i-innlandet-stemte-for-folkeavstemming-om-hedmark-og-oppland-1.15762396
nrkinnlandet: Fylkestinget i Innlandet stemte for folkeavstemming om Hedmark og Oppland
https://gfx.nrk.no/t80cKAXq_UHFL1n_K128rA5H4KtOqj51zpFSQimf9WjQ.jpg
2021-12-08-2017
Erik S. Winther gjekk imot eiget parti på Fylkestinget, og blei dermed avgjerande for at innbyggarane nå får stemme om Hedmark og Oppland skal oppstå på ny.
https://www.nrk.no/innlandet/ny-rapport-om-brannsikkerheit-i-leilegheitsbygget-pernup-i-trysil-1.15741997
nrkinnlandet: Ny rapport om brannsikkerheit i leilegheitsbygget Pernup i Trysil
https://gfx.nrk.no/-f4vDhk3yI_lOhiLLOkvUgCJ5eDQW7UT4orUoThMDRQw.jpg
2021-12-08-2017
Ikkje nødvendig å stenge bygget for bruk, seier rapporten som utbyggjar Trybo AS baserer klaga si på.
https://www.nrk.no/innlandet/hvem-vinner-klassequizen-i-innlandet_-1.15658127
nrkinnlandet: Hvem vinner Klassequizen i Innlandet?
https://gfx.nrk.no/eF_qcLsWg0lWXNt6mHh1CApAf86dvb4-dsgn9CEUPJeQ.jpg
2021-12-08-2017
69 skoler i Innlandet skal kjempe om å vinne årets Klassequiz, det er ny rekord! Her kan du se resultater og sendetider.