
Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark


Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they're to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, Elasticsearch, d3.js, scikit-learn, and Apache Airflow. You'll learn an iterative approach that lets you quickly change the kind of analysis you're doing, depending on what the data is telling you. Publish data science work as a web application, and effect meaningful change in your organization.

- Build value from your data in a series of agile sprints, using the data-value pyramid
- Extract features for statistical models from a single dataset
- Visualize data with charts, and expose different aspects through interactive reports
- Use historical data to predict the future via classification and regression
- Translate predictions into actions
- Get feedback from users after each sprint to keep your project on track



30 reviews for Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark

  1. 4 out of 5

    Kristian Edlund

    I really wanted to love this book. The concept of walking through a quite elaborate example is excellent. I think it shows many of the pitfalls and iterations you need to go through from start to finished data product. However, as one of the other reviews also points out, the code examples in it are unfinished and the instructions are hard to follow. For me it started in chapter 4, where you are asked to run the first piece of code to convert data. However, I wasn't sure where to run it, and it took me a while to figure out it was supposed to be executed in Spark. Probably just me being slow. Once I booted Spark up, it still would not run, as there was a piece of missing initialization data which I only found once I looked in another branch on Git. That was just the beginning. Once I got to connecting Flask and Mongo, it really went downhill, as the versions weren't compatible. So if the code gets some polish and a ready-to-use image for Vagrant and EC2 that works, it will be a 5-star book.

  2. 4 out of 5

    Netoasis

    It's a good book for beginner-level programmers or data scientists to learn data science concepts and the popular toolset in the industry. However, the book has not been kept up to date with the latest software versions, so some of the code in the book doesn't work as expected (too bad). That means the quality of the book is below standard. For example, it uses the pyElasticsearch package, which only supports Elasticsearch versions below 2.0. It's an OK book and needs more polish, in my opinion.

  3. 4 out of 5

    Rebecca Bilbro

    Favorite quotes:

    - "In data science, by contrast to software engineering, code shouldn't always be good; it should be eventually good."
    - "In Agile Data Science, we value generalists over specialists...Examples of good Agile Data Science team members include: Designers who deliver working CSS; Web developers who build entire applications and understand the user interface and user experience; Data scientists capable of both research and building web services and applications; Researchers who check in working source code, explain results, and share intermediate data; Product managers able to understand the nuances in all areas"
    - "In data products, the data is ruthlessly opinionated. Whatever we wish the data to say, it is unconcerned with our own opinions. It says what it says. This means the waterfall model has no application. It also means that mocks are an insufficient blueprint to establish consensus in software teams."
    - "Extracted features from unstructured data get cleaned only in the harsh light of day, as users consume them and complain; if you can't ship your features as you extract them, you're in a state of free fall. The hardest part of building data products is pegging entity and feature extraction to products smaller than your ultimate vision. This is why schemas must start as blobs of unstructured text and evolve into structured data only as features are extracted. Features must be exposed in some product form as they are created, or they will never achieve a product-ready state. Derived data that lives in the basement of your product is unlikely to shape up. It is better to create entity pages to bring entities up to a "consumer-grade" form, to incrementally improve these entities, and to progressively combine them than to try to expose myriad derived data in a grand vision from the get-go."
    - "Rare is the chart that tells a story. This is because most people make a chart and move on… when in reality, you have to iteratively create and improve charts to achieve useful visualizations... You can create charts in an ad hoc way at first, but as you progress, your workflow should become increasingly automated and reproducible."
    - "Agile Data Science is an approach to data science centered around web application development. It asserts that the most effective output of the data science process suitable for effecting change in an organization is the web application. It asserts that application development is a fundamental skill of a data scientist. Therefore, doing data science becomes about building applications that describe the applied research process: rapid prototyping, exploratory data analysis, interactive visualization, and applied machine learning."

  4. 4 out of 5

    Alex Galea

    This book describes Russell's perspective on a good data science workflow using an agile methodology. He walks through a project about airline flight data in great detail and shows off some really neat tricks for building web apps and doing predictive analytics at scale. I would describe the material as intermediate level, where the reader should already be familiar with the data science ecosystem. I loved chapter 2, which introduces the technology stack. It's awesome to see minimal working snippets from the whole lineup of open source tools that make up Russell's pipeline. In particular, the Airflow section here is quite nice. In the remainder of the book, we see how to pull these technologies together. As others have mentioned, the code is not 100% plug and play. This is hardly surprising given the quickly evolving nature of open source, and particularly how new his tools are. Sure, you could pick up a book on MySQL and run most of the code without issue, but Russell is working with much newer (and frankly more interesting) technologies. For my part, I am not running any code from the book, just reading through and noting code that will make a great reference later. A couple of issues that did bother me were the occasional typo, a repeated code block, and a lack of attention to detail in presentation. But don't let that stop you from checking out this excellent book.

  5. 5 out of 5

    Kyle Dinges

    I'm not sure anyone comes to Goodreads for textbook reviews, but I read it, so I'm reviewing it briefly... 3.5 stars. It's a good use case for getting a web app up and running that includes a machine learning model and that scales easily. The title isn't kidding when it says full-stack; the case here leverages MongoDB, Elasticsearch, Kafka, Airflow, numerous Python libraries (Flask, scikit-learn, etc.), and more, all with Spark (through PySpark) as the primary engine. It's probably most useful as a primer on building a Spark-based machine learning model and, most importantly, deploying it. I'd say it requires a cursory-to-intermediate understanding of most of the technologies included. It's getting a bit long in the tooth now that it's almost 5 years old, but if you know enough to follow along, you probably know where any potential deprecations lie. I thought it was helpful for those with an intermediate data science background.

  6. 5 out of 5

    Jose Manuel

    Impressive. Even though R is my main tool and this book uses Python, its approach, focusing on the "scientific" side of the data scientist's work, is clearly on target. The first chapters describe my day-to-day work so accurately that it genuinely moved me. Its approach of keeping things as simple and scalable as possible, focusing on people rather than processes, and releasing results quickly and continuously throughout the process, is advice I apply every day and recommend to anyone working in this field.

  7. 4 out of 5

    Joe

    Not bad at illustrating the concepts, but a bit too specific to the technology stack used in the book. I thought this was helpful for data scientists to understand different steps in the process that they don't always see (DevOps, etc.). The author's definition of "data science" (page 4) is closer to "big data" than to "statistics," so beware that you're not going to get a lot of stats out of this book.

  8. 4 out of 5

    Vaidas

    Interesting ideas and quite a detailed explanation of the implementation. I read this mainly for the description of the process and for hints on how one might actually go about implementing all the steps. The book is clear on these points, and therefore 5 stars. If one actually went on and ran the code, something probably wouldn't quite work, but that's software :) I am really into these ideas and will do my best to bring the DS process at my current employer as close to this as is practical.

  9. 4 out of 5

    Dan Ryan

  10. 5 out of 5

    ?

  11. 4 out of 5

    Gilbert

  12. 4 out of 5

    Sidra

  13. 4 out of 5

    Glen Ritschel

  14. 5 out of 5

    Chris Garnett

  15. 4 out of 5

    gramakri

  16. 5 out of 5

    Patrick

  17. 4 out of 5

    Daan Tor

  18. 4 out of 5

    Igor Vieira

  19. 5 out of 5

    Alexander

  20. 4 out of 5

    Eugene

  21. 4 out of 5

    Oscar

  22. 5 out of 5

    Emerson Hernandez

  23. 5 out of 5

    Iurii

  24. 4 out of 5

    Antje

  25. 4 out of 5

    Jason

  26. 4 out of 5

    David

  27. 5 out of 5

    Gustav Lindqvist

  28. 4 out of 5

    Peter Roelants

  29. 4 out of 5

    Syafiq Firdaus

  30. 5 out of 5

    Leandro
