Commit 496596ad authored by project_12982_bot
automatic push at 2021-12-10-0032

I started the script by making a date file, so I can use the same date in every script.
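A minimal sketch of such a date file (the filename date.txt is an assumption, not necessarily what the real script uses):

```shell
#!/bin/bash
# store the timestamp once, so every other script can read the same value
date +"%Y-%m-%d-%H%M" > date.txt
cat date.txt
```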
I scrape the url of each news article individually, because the different websites have different html structures. The scrape script curls tv2.no/sport only on even-numbered days. I use an if statement that checks whether the day is divisible by 2 to see if it is an even number. I use 10# to ensure that bash interprets the number in base 10 instead of base 8.
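The even-day check can be sketched like this; the 10# prefix matters because days such as "08" and "09" would otherwise be rejected as invalid octal numbers:

```shell
#!/bin/bash
day=$(date +"%d")            # zero-padded day of month, e.g. "08"
# 10# forces base-10 interpretation; without it bash treats a
# leading zero as octal and "08"/"09" cause an arithmetic error
if (( 10#$day % 2 == 0 )); then
  echo "even day: scraping tv2.no/sport"
else
  echo "odd day: skipping tv2.no/sport"
fi
```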
It loops over every url in the three different variables and curls it. Then it fetches the title, the image url and the summary (an optional task) and stores them in separate variables. Since I fetch the information from inside the html head tag, all three websites have nearly the same html structure; where they differ I have handled it with regex. Therefore I can use the same variables to fetch from all three websites. To keep some control when scraping from different websites, I chose to name the files after the websites and to start every title with the website name. For this I made an if statement that checks whether the article is from tv2.no/nyheter, nrk.no/innlandet or tv2.no/sport. The loop then makes a txt file and writes all the information from the different variables: the url of the news article, the title, the url of the image, the date it was scraped and a summary of the news article.
Finally the script removes all the temporary files.
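A hedged sketch of the head-tag extraction: the helper name and the og: meta-tag layout below are assumptions about what the pages expose, not the exact script, but they show how title, image url and summary can be pulled out of a fetched page with grep and sed:

```shell
#!/bin/bash
# extract_meta PROPERTY reads html on stdin and prints the content
# attribute of the matching <meta property="og:PROPERTY" ...> tag
extract_meta() {
  grep -o "<meta property=\"og:$1\" content=\"[^\"]*\"" |
    sed 's/.*content="//; s/"$//'
}

# stand-in for $(curl -s "$url"); a real page's <head> looks similar
html='<head><meta property="og:title" content="Example headline">
<meta property="og:image" content="https://example.com/img.jpg">
<meta property="og:description" content="Short summary"></head>'

title=$(printf '%s\n' "$html" | extract_meta title)
image=$(printf '%s\n' "$html" | extract_meta image)
summary=$(printf '%s\n' "$html" | extract_meta description)
echo "$title"     # Example headline
echo "$image"     # https://example.com/img.jpg
```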
In this script I got much help from this website:
<h2>gitrepo.sh</h2>
The gitrepo.sh script is an optional task and updates the local and the centralized git repository. In this script I run the git commands that push all the changes made by the other scripts: git add, git commit and git push. To add all the changes I pass a dot to git add, which stages every changed and new file. In the commit message I include the date, so it is possible to see when the push happened, and I state that it is an automatic push, so manual and automatic commits can be told apart. Since this script is also run automatically by systemd every 6 hours, it needs an ssh key or a token; otherwise git asks for the GitLab username and password. I chose a token for security reasons. An ssh key can be protected with a passphrase, but then the user would have to type it every time the script runs, which defeats the automation. Leaving the ssh key without a passphrase is possible, but not secure. Therefore I made a token in GitLab and clone the project over https. I made a project access token, so the token can only be used with this project. Since I expose the token inside the scripts, a project token is more appropriate than a personal access token, because a personal access token applies to the whole account and therefore grants more power. I gave the token the maintainer role and the api scope, which means the token has complete read and write access to the project api. To use the token you have to clone the project inside ~/git:
git clone https://exam:ZqPZxzPx4fCo3GXtTpHJ@gitlab.stud.idi.ntnu.no/ingring/exam.git
When the project is cloned with a token, git does not ask for a username or password on pull and push.
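The 6-hourly systemd automation mentioned above could be wired up with a timer unit roughly like this (the unit name, description and exact schedule syntax here are assumptions, not copied from the project's config folder):

```ini
# /etc/systemd/system/gitrepo.timer  (unit name is an assumption)
[Unit]
Description=Push the scraped results every 6 hours

[Timer]
OnCalendar=*-*-* 00/6:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling it with systemctl enable --now gitrepo.timer would then trigger a matching gitrepo.service on the schedule.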
<h2>deployment.sh</h2>
Finally I made the deployment script. deployment.sh is an optional script that configures a blank installation of Raspberry Pi OS with everything the project needs to work. First the script installs all the programs I have used to solve this assignment. Then it makes a git directory inside /home/pi and clones the git repository over https into it. This creates an exam directory containing all the scripts, as well as a config folder with all the configuration files. The script then copies the configuration files and overview.cgi into the right directories, and enables and starts them. Finally it makes all the scripts executable with the chmod +x command. However, you first have to make deployment.sh itself executable manually with chmod +x, since it cannot do that for itself.
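The steps above could look roughly like this. The package list, the copied unit names and the target paths are all assumptions for illustration, not the exact script; the sketch also refuses to act unless explicitly asked, so reading or testing it is harmless:

```shell
#!/bin/bash
# Hypothetical sketch of deployment.sh; package list, file layout and
# unit names are assumptions, not the real script from the repository.
deploy() {
  apt-get update && apt-get install -y git curl lighttpd  # assumed packages
  mkdir -p /home/pi/git && cd /home/pi/git
  git clone https://gitlab.stud.idi.ntnu.no/ingring/exam.git
  cp exam/config/* /etc/systemd/system/    # assumed config layout
  cp exam/overview.cgi /usr/lib/cgi-bin/   # assumed cgi directory
  systemctl daemon-reload
  chmod +x exam/*.sh                       # make every script executable
}

# only act when explicitly asked
if [ "${1:-}" = "--apply" ]; then
  deploy
  msg="deployed"
else
  msg="dry run: re-run with --apply to deploy"
fi
echo "$msg"
```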
#use the shared date in the commit message
date=$(date +"%Y-%m-%d-%H%M")
git pull
git add .
git commit -m "automatic push at ${date}"
git push
#!/bin/bash
cd ~/git/exam
#in case the local repository is not up to date
git pull
#runs the scripts consecutively since they depend on each other
./scrape.sh
./page.sh
./overview.sh
rm -r newsscraping
rm htmlfiles/htmllist.txt
#push every change to git with the project token
./gitrepo.sh